Transfer Learning Methods
The SAM family has three core parts: image encoder, prompt encoder, and mask decoder. Prefer minimal changes unless there is a clear bottleneck.
1) Decoder-only fine-tuning
- No structural changes.
- Train mask decoder only.
- Freeze the image encoder and most of the prompt encoder.
Why:
- Fast and often provides a strong first improvement.
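A minimal sketch of decoder-only tuning in PyTorch. The `encoder` and `decoder` modules here are small stand-ins, not the real SAM components; the pattern (freeze one, optimize only the other) is what carries over:

```python
import torch
import torch.nn as nn

# Stand-in modules; in practice these would be the SAM image encoder
# and mask decoder (names here are illustrative only).
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(8, 1, 1)

# Freeze the encoder; only decoder parameters receive gradients.
for p in encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in decoder.parameters() if p.requires_grad), lr=1e-4
)

x = torch.randn(2, 3, 32, 32)
mask = decoder(encoder(x))
loss = mask.mean()
loss.backward()

# Encoder gradients are None; decoder gradients are populated.
assert all(p.grad is None for p in encoder.parameters())
assert all(p.grad is not None for p in decoder.parameters())
```

Passing only the trainable parameters to the optimizer also keeps optimizer state (e.g. AdamW moments) small.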
2) LoRA or adapters in attention blocks
- Inject LoRA adapters in selected attention projections.
- Keep base weights mostly frozen.
Suggested start:
- LoRA rank: 8-16
- LoRA alpha: 16-32
- LoRA dropout: 0.05-0.1
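A sketch of the standard LoRA formulation applied to a single linear projection, using the starting values above. `LoRALinear` is an illustrative wrapper, not a SAM or PEFT-library class:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with B initialized to zero."""
    def __init__(self, base: nn.Linear, r=8, alpha=16, dropout=0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + self.dropout(x) @ self.A.T @ self.B.T * self.scale

layer = LoRALinear(nn.Linear(64, 64), r=8, alpha=16)
x = torch.randn(4, 64)
assert layer(x).shape == (4, 64)

# B starts at zero, so the wrapped layer initially matches the frozen base.
layer.eval()
assert torch.allclose(layer(x), layer.base(x))
```

In practice such wrappers replace the q/v (sometimes k and output) projections in the selected attention blocks.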
3) Multimodal adapter for RGB + heightmap/DSM
- Add a small input adapter for a fourth channel (heightmap).
Practical designs:
- 1x1 conv projection from 4 channels to 3 channels before SAM3 preprocessing.
- Dual-branch fusion with a small heightmap branch and feature fusion before the decoder.
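A sketch of the 1x1-conv projection variant: map RGB + heightmap (4 channels) down to 3 channels so the pretrained RGB encoder can consume the result. The RGB-identity initialization is one reasonable choice, not a requirement:

```python
import torch
import torch.nn as nn

# 1x1 conv that mixes 4 input channels (RGB + heightmap) down to 3.
proj = nn.Conv2d(4, 3, kernel_size=1)

# Initialize as identity on RGB with zero weight on the heightmap channel,
# so training starts from the pretrained model's input distribution.
with torch.no_grad():
    proj.weight.zero_()
    proj.bias.zero_()
    for c in range(3):
        proj.weight[c, c, 0, 0] = 1.0

rgb = torch.randn(2, 3, 16, 16)
height = torch.randn(2, 1, 16, 16)
x = torch.cat([rgb, height], dim=1)   # (2, 4, 16, 16)
out = proj(x)

assert out.shape == (2, 3, 16, 16)
assert torch.allclose(out, rgb, atol=1e-6)  # identity on RGB at init
```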
4) Multi-task head extension
- Keep SAM3 mask decoder for instance masks.
- Add a lightweight semantic head for canopy/background supervision.
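A sketch of the two-head layout. Both heads here are small stand-ins on shared encoder features; in practice the instance branch would be the actual SAM3 mask decoder:

```python
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Shared features feed an instance-mask head (placeholder for the SAM
    decoder) and a lightweight semantic head for canopy/background logits."""
    def __init__(self, feat_ch=8):
        super().__init__()
        self.instance_decoder = nn.Conv2d(feat_ch, 1, 1)
        self.semantic_head = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 2, 1),   # 2 classes: canopy vs background
        )

    def forward(self, feats):
        return self.instance_decoder(feats), self.semantic_head(feats)

heads = MultiTaskHeads()
feats = torch.randn(2, 8, 16, 16)
inst, sem = heads(feats)
assert inst.shape == (2, 1, 16, 16)
assert sem.shape == (2, 2, 16, 16)
```

The two losses (instance + semantic) are then summed, typically with a weighting factor on the auxiliary semantic term.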
5) Broad encoder unfreezing
- Progressively unfreeze upper encoder blocks.
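A minimal sketch of unfreezing only the top k blocks of a frozen encoder stack (the `Linear` blocks stand in for transformer blocks):

```python
import torch.nn as nn

# Stand-in encoder: a stack of 12 blocks, all frozen initially.
blocks = nn.ModuleList(nn.Linear(8, 8) for _ in range(12))
for p in blocks.parameters():
    p.requires_grad = False

k = 3  # unfreeze only the top (last) k blocks
for block in blocks[-k:]:
    for p in block.parameters():
        p.requires_grad = True

n_trainable = sum(
    any(p.requires_grad for p in b.parameters()) for b in blocks
)
assert n_trainable == 3
```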
6) Prompt tuning or learned prompts
- Keep encoder mostly frozen.
- Learn prompt tokens or prompt-encoder adapters.
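A sketch of learned prompt tokens with a frozen encoder. The shapes and the concatenation point are illustrative; in SAM the learned embeddings would enter through the prompt-encoder pathway:

```python
import torch
import torch.nn as nn

# A small set of trainable prompt tokens; the encoder stays frozen.
embed_dim, n_prompts = 32, 4
prompt_tokens = nn.Parameter(torch.zeros(n_prompts, embed_dim))

encoder = nn.Linear(32, 32)   # stand-in for the frozen image encoder
for p in encoder.parameters():
    p.requires_grad = False

image_tokens = encoder(torch.randn(2, 10, 32))   # frozen features
tokens = torch.cat([prompt_tokens.expand(2, -1, -1), image_tokens], dim=1)

assert tokens.shape == (2, 14, 32)
assert prompt_tokens.requires_grad and not encoder.weight.requires_grad
```

Only `prompt_tokens` goes to the optimizer, so the trainable footprint is tiny (here 4 x 32 values).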
7) BitFit or IA3
- BitFit: train selected bias terms.
- IA3: train multiplicative vectors in attention/MLP paths.
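Both ideas can be sketched in a few lines; the placement of the IA3 vector (on attention keys/values or MLP hidden units) is the usual choice, shown here on a generic activation:

```python
import torch
import torch.nn as nn

# BitFit sketch: freeze everything, then re-enable only bias terms.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
for name, p in model.named_parameters():
    p.requires_grad = name.endswith("bias")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
assert trainable == ["0.bias", "2.bias"]

# IA3 sketch: a learned multiplicative vector rescales an activation path.
scale = nn.Parameter(torch.ones(16))
h = torch.randn(2, 16)
out = h * scale
assert out.shape == (2, 16)
assert torch.allclose(out, h)  # identity at init, since scale starts at ones
```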
8) Progressive unfreezing with layer-wise LR decay
- Start from decoder or adapter tuning.
- Unfreeze top encoder blocks stepwise with smaller learning rates deeper in the encoder.
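The layer-wise decay can be expressed as optimizer parameter groups: the topmost block gets the base learning rate, and each step deeper in the encoder multiplies it by a decay factor (values here are illustrative):

```python
import torch
import torch.nn as nn

# Stand-in encoder blocks; deeper (earlier) blocks get smaller LRs.
blocks = nn.ModuleList(nn.Linear(8, 8) for _ in range(4))
base_lr, decay = 1e-4, 0.5

groups = [
    {"params": list(block.parameters()), "lr": base_lr * decay ** depth}
    for depth, block in enumerate(reversed(blocks))  # top block: depth 0
]
optimizer = torch.optim.AdamW(groups)

lrs = [g["lr"] for g in optimizer.param_groups]
expected = [1e-4, 5e-5, 2.5e-5, 1.25e-5]
assert all(abs(a - b) < 1e-12 for a, b in zip(lrs, expected))
```

Progressive unfreezing then amounts to adding new (deeper) groups over the course of training rather than all at once.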
9) Teacher-student self-training
- Generate pseudo-labels with the best current model (teacher).
- Filter by confidence and retrain student model.
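The confidence-filtering step can be sketched per pixel: keep only positions where the teacher's foreground probability is decisively high or low, and let only those contribute to the student's loss (threshold value is illustrative):

```python
import torch

# Teacher foreground probabilities for a tiny 2x2 tile.
teacher_probs = torch.tensor([[0.95, 0.60],
                              [0.03, 0.50]])
threshold = 0.9

# Confident pixels: probability above threshold or below 1 - threshold.
confident = (teacher_probs > threshold) | (teacher_probs < 1 - threshold)
pseudo_labels = (teacher_probs > 0.5).float()

assert confident.tolist() == [[True, False], [True, False]]
assert pseudo_labels.tolist() == [[1.0, 1.0], [0.0, 0.0]]
```

The `confident` mask then weights (or zeroes out) the per-pixel loss when retraining the student.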
10) Continual adaptation with replay
- Fine-tune on new region or season data.
- Replay representative old tiles to reduce forgetting.
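A sketch of batch-level replay: each fine-tuning batch mixes new-region tiles with a fixed fraction drawn from a buffer of representative old tiles (the ratio and tile names are placeholders):

```python
import random

random.seed(0)  # for reproducibility of this sketch
old_tiles = [f"old_{i}" for i in range(100)]   # replay buffer
new_tiles = [f"new_{i}" for i in range(100)]   # new region/season

replay_ratio = 0.25                    # fraction of each batch from old data
batch_size = 8
n_old = int(batch_size * replay_ratio) # 2 old tiles per batch

batch = (random.sample(old_tiles, n_old)
         + random.sample(new_tiles, batch_size - n_old))

assert len(batch) == 8
assert sum(t.startswith("old_") for t in batch) == 2
```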
See also: