SAM Architecture Variants
Evaluate architecture and transfer-learning method separately to avoid confounded results.
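One way to keep the two factors separate is a full factorial grid, where every architecture is paired with every transfer-learning method and comparisons are made with one factor held fixed. The sketch below is illustrative only: the architecture and method labels are placeholders, not identifiers from this project.

```python
from itertools import product

# Placeholder labels; substitute the variants and methods actually under test.
ARCHITECTURES = ["baseline", "hq_decoder", "mobile"]
TRANSFER_METHODS = ["frozen_encoder", "full_finetune"]


def build_experiment_grid(architectures, methods):
    """Return one run config per (architecture, method) pair."""
    return [
        {"run_id": f"{arch}__{method}", "architecture": arch, "transfer_method": method}
        for arch, method in product(architectures, methods)
    ]


def hold_fixed(grid, **fixed):
    """Select runs matching `fixed`, so only the remaining factor varies."""
    return [run for run in grid if all(run[k] == v for k, v in fixed.items())]


grid = build_experiment_grid(ARCHITECTURES, TRANSFER_METHODS)

# Architecture comparison with the transfer-learning method held constant.
arch_comparison = hold_fixed(grid, transfer_method="frozen_encoder")
```

Comparing runs only within a `hold_fixed` slice means a difference in score can be attributed to the factor that was allowed to vary.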
Reference architecture: SAM3 in this project
flowchart LR
RGB["RGB Tile"] --> PROC["Sam3Processor"]
PROMPT["Prompt Input (Text / Points / Box)"] --> PENC["Prompt Encoder"]
PROC --> IENC["Image Encoder"]
IENC --> MDEC["Mask Decoder"]
PENC --> MDEC
MDEC --> MASK["Instance Masks"]
This is the baseline structure used before trying architecture variants.
Variant A: Backbone scale sweep
- Compare small/base/large checkpoints (or nearest available sizes).
Expected tradeoff:
- Smaller models: faster inference.
- Larger models: potentially better boundary detail and small-object recall.
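The speed side of this tradeoff can be measured with a small timing harness that runs every checkpoint on the same tile. This is a minimal sketch under assumptions: `predict` stands for whatever callable wraps a checkpoint's inference, and the two lambdas are stand-ins, not real models.

```python
import statistics
import time


def measure_latency(predict, tile, warmup=2, repeats=10):
    """Return median seconds per call of predict(tile) after warm-up runs."""
    for _ in range(warmup):
        predict(tile)  # warm-up calls are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(tile)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def sweep(models, tile):
    """Map each checkpoint label to its median latency on the same tile."""
    return {name: measure_latency(fn, tile) for name, fn in models.items()}


# Stand-in callables; real entries would wrap each checkpoint's inference call.
dummy_models = {
    "small": lambda t: sum(t),
    "large": lambda t: sum(x * x for x in t),
}
latencies = sweep(dummy_models, tile=list(range(1000)))
```

When inference runs on a GPU, execution is typically asynchronous, so the device should be synchronised before each clock reading or the timings will understate the real cost.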
Variant B: High-quality decoder variants
- Use SAM-compatible HQ decoders (if available in the stack).
Expected tradeoff:
- Better boundary quality, moderate runtime increase.
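Boundary quality needs a metric that is sensitive to contour errors, since plain mask IoU is dominated by interior pixels. The sketch below is a simplified, one-pixel-wide boundary IoU written in plain Python for clarity; it is an assumption about how boundary quality could be scored here, not the project's evaluation code.

```python
def boundary_pixels(mask):
    """Return mask pixels that touch background or the image edge (4-connectivity)."""
    height, width = len(mask), len(mask[0])
    boundary = set()
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < height and 0 <= nc < width)
                if outside or not mask[nr][nc]:
                    boundary.add((r, c))
                    break
    return boundary


def boundary_iou(pred, target):
    """IoU restricted to the one-pixel boundary bands of two binary masks."""
    pred_b, target_b = boundary_pixels(pred), boundary_pixels(target)
    union = pred_b | target_b
    if not union:
        return 1.0  # both masks empty: treat as perfect agreement
    return len(pred_b & target_b) / len(union)
```

Scoring the baseline decoder and the HQ decoder with the same boundary metric makes the claimed gain measurable alongside the runtime increase.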
Variant C: Efficient or Mobile SAM variants
- Benchmark compact models for throughput-constrained use.
Expected tradeoff:
- Strong deployment speed, possible accuracy loss.
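Because compact models trade accuracy for speed, a useful summary is the set of models that are not beaten on both axes at once (the Pareto front). The following is a minimal sketch; the model names and numbers are made up for illustration and are not measurements.

```python
def pareto_front(results):
    """Keep models that no other model matches or beats on both axes.

    `results` maps a model label to (throughput, accuracy); higher is better
    for both values.
    """
    front = []
    for name, (throughput, accuracy) in results.items():
        dominated = any(
            o_tp >= throughput
            and o_acc >= accuracy
            and (o_tp > throughput or o_acc > accuracy)
            for other, (o_tp, o_acc) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Invented placeholder values: (tiles per second, mask accuracy score).
example_results = {
    "model_a": (10, 0.80),
    "model_b": (40, 0.70),
    "model_c": (35, 0.65),
    "model_d": (5, 0.80),
}
front = pareto_front(example_results)
```

Models off the front can be dropped from further tuning, since another candidate is at least as fast and at least as accurate.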
Variant D: Multispectral-ready front-end adapter
- If NIR or extra bands exist, compare:
  - N -> 3 projection adapter.
  - Dual-branch feature fusion.
Expected tradeoff:
- More input signal and better canopy separation, with added model complexity.
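An N -> 3 projection is a per-pixel linear map from N input bands to the three channels an RGB encoder expects, which is equivalent to a 1x1 convolution. The NumPy sketch below shows the arithmetic only. Initialising the weights so that RGB passes through unchanged is an assumed design choice, and in a real model the weights would be trainable parameters in the training framework.

```python
import numpy as np


def init_projection(num_bands, rgb_indices=(0, 1, 2)):
    """Build a (3, num_bands) weight matrix that passes RGB through unchanged.

    Extra bands (for example NIR) start at zero weight, so the adapter
    initially reproduces the RGB-only input.
    """
    weights = np.zeros((3, num_bands), dtype=np.float32)
    for out_channel, band in enumerate(rgb_indices):
        weights[out_channel, band] = 1.0
    return weights


def project_bands(tile, weights):
    """Apply a per-pixel linear map: (H, W, N) tile -> (H, W, 3) tile."""
    return np.einsum("hwn,cn->hwc", tile, weights)


# Random 4-band tile (RGB + one extra band) used purely as a shape check.
tile = np.random.default_rng(0).random((4, 5, 4), dtype=np.float32)
projected = project_bands(tile, init_projection(num_bands=4))
```

Starting from the identity on RGB means the pretrained encoder initially sees the input it was trained on, and the contribution of the extra bands is learned gradually.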
Variant E: Multi-task SAM wrapper
- Instance decoder plus semantic segmentation head trained jointly.
Expected tradeoff:
- Better coarse region consistency, at the cost of more complex loss balancing.
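The loss-balancing problem arises because the instance and semantic losses usually sit on different scales. One simple heuristic, sketched here as an assumption and not as the project's training code, weights each task by the inverse of its initial loss so that both tasks contribute equally at the start of training.

```python
def balance_weights(initial_losses):
    """Weight each task by the inverse of its initial loss value."""
    weights = {}
    for task, value in initial_losses.items():
        if value <= 0:
            raise ValueError(f"initial loss for {task!r} must be positive")
        weights[task] = 1.0 / value
    return weights


def combined_loss(losses, weights):
    """Weighted sum of per-task losses using fixed per-task weights."""
    return sum(weights[task] * losses[task] for task in losses)


# Illustrative values only, chosen to make the arithmetic easy to follow.
initial = {"instance": 2.0, "semantic": 0.5}
weights = balance_weights(initial)
total_at_start = combined_loss(initial, weights)
```

Fixed weights are the simplest option; if one task still dominates during training, the weights become another hyperparameter to tune.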
Example variant architecture: RGB + height input with multi-task output
flowchart LR
RGB["RGB Tile"] --> FUSE["Input Fusion Adapter"]
HGT["Heightmap / DSM"] --> FUSE
FUSE --> PROC["Sam3Processor-Compatible Tensor"]
PROMPT["Prompt Input (Text / Points / Box)"] --> PENC["Prompt Encoder"]
PROC --> IENC["Image Encoder"]
IENC --> MDEC["Mask Decoder"]
PENC --> MDEC
IENC --> SHEAD["Semantic Head"]
MDEC --> IMASK["Instance Masks"]
SHEAD --> SMASK["Semantic Canopy Mask"]
This variant combines:
- Multimodal adapter for RGB + height input.
- Multi-task training for instance and semantic outputs.
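The input side of the multimodal adapter can be sketched as scaling both inputs to a common range and stacking them into a four-channel tile. This is a minimal illustration under assumptions: 8-bit RGB, per-tile min-max scaling of height, and a hypothetical function name. The fused tensor would still need to be mapped to whatever channel layout the image encoder expects.

```python
import numpy as np


def fuse_rgb_height(rgb, height, eps=1e-6):
    """Stack an (H, W, 3) RGB tile and an (H, W) height map into (H, W, 4)."""
    if rgb.shape[:2] != height.shape:
        raise ValueError("RGB tile and height map must share H and W")
    rgb_scaled = rgb.astype(np.float32) / 255.0  # assumes 8-bit RGB input
    h = height.astype(np.float32)
    # Per-tile min-max scaling; eps avoids division by zero on flat tiles.
    h_scaled = (h - h.min()) / (h.max() - h.min() + eps)
    return np.concatenate([rgb_scaled, h_scaled[..., None]], axis=-1)


# Synthetic inputs used only to check shapes and value ranges.
rgb_tile = np.full((2, 3, 3), 255, dtype=np.uint8)
height_map = np.arange(6, dtype=np.float32).reshape(2, 3)
fused = fuse_rgb_height(rgb_tile, height_map)
```

Per-tile scaling discards absolute height, so a dataset-wide scale may be preferable when absolute canopy height matters.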
See also: