G-VMamba: Contour-enhanced Visual State-Space Model for Remote Sensing Image Classification
The current branch has been tested on Linux with PyTorch 1.13.x and CUDA 11.6, and supports Python 3.8+.
- Class activation map (CAM) visualization of the final normalization layer for the VMamba and G-VMamba models when classifying UC Merced dataset images.
When classifying a scene, the G-VMamba model focuses on regions where the color (or brightness) of the image changes sharply (red areas), such as the lane intersections in the Overpass scene, the court edges in the Baseball Diamond scene, and the airplane shadow and lawn border in the Airplane scene. (The model size is ‘Small’.) A sketch of how such CAMs can be produced with pytorch-grad-cam follows the architecture caption below.
- The overall architecture: (a) overview of the G-VMamba model; (b) feature grouping in the G-VSS block.
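For reference, here is a minimal sketch of producing such CAM overlays with the pytorch-grad-cam library acknowledged at the end of this README. The checkpoint path, image path, and the `model.norm` attribute are placeholders (not from this repository), and the reshape step assumes the target layer emits (B, H, W, C) feature maps; adapt all of these to the actual G-VMamba code.

```python
# Hedged CAM sketch using pytorch-grad-cam; paths and layer names are placeholders.
import numpy as np
import torch
from PIL import Image
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from torchvision import transforms

# Placeholder checkpoint; assumes a fully serialized nn.Module was saved.
model = torch.load("gvmamba_small.pth", map_location="cpu")
model.eval()

# Assumption: the final normalization layer is reachable as `model.norm`;
# adjust to the real attribute name in the G-VMamba code.
target_layers = [model.norm]

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
img = Image.open("ucmerced/overpass/overpass01.tif").convert("RGB")  # placeholder image
input_tensor = preprocess(img).unsqueeze(0)

def reshape_transform(t):
    # Assumption: the layer outputs channel-last (B, H, W, C) tokens;
    # move channels first for the CAM computation. Remove if already (B, C, H, W).
    return t.permute(0, 3, 1, 2)

with GradCAM(model=model, target_layers=target_layers,
             reshape_transform=reshape_transform) as cam:
    grayscale_cam = cam(input_tensor=input_tensor)[0]  # highest-scoring class by default

rgb = np.float32(img.resize((224, 224))) / 255.0
overlay = show_cam_on_image(rgb, grayscale_cam, use_rgb=True)
Image.fromarray(overlay).save("cam_overlay.png")
```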
We describe how to prepare the remote sensing image classification datasets used in the paper; a folder-layout sketch follows the download links below.
- Image and annotation download link: UC Merced Dataset.
- Image and annotation download link: AID Dataset.
- Image and annotation download link: NWPU RESISC45 Dataset.
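The training entry point follows the Swin-Transformer codebase, which reads an ImageNet-style directory tree (train/&lt;class&gt;/*, val/&lt;class&gt;/*). The script below is a hypothetical helper that splits a class-per-folder download (e.g., UC Merced) into that layout; the paths and the 80/20 split ratio are illustrative, not taken from the paper.

```python
# Hypothetical helper: split a class-per-folder dataset into an
# ImageNet-style train/val layout for the Swin-style data loader.
import random
import shutil
from pathlib import Path

src = Path("UCMerced_LandUse/Images")   # placeholder source directory
dst = Path("ucmerced_split")            # placeholder output directory
val_ratio = 0.2                         # illustrative split, not from the paper
random.seed(0)

for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
    images = sorted(class_dir.glob("*"))
    random.shuffle(images)
    n_val = int(len(images) * val_ratio)
    for split, files in (("val", images[:n_val]), ("train", images[n_val:])):
        out = dst / split / class_dir.name
        out.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, out / f.name)
```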
Step 1. Create a conda environment and activate it.
conda create -n Gvmamba python=3.9
conda activate Gvmamba
Step 2. Install the requirements.
- Torch 1.13.1 + cu116
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch==1.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torchvision==0.14.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
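After installation, a quick sanity check (standard PyTorch calls, nothing repo-specific) confirms that the CUDA build was picked up:

```python
# Verify the CUDA-enabled builds were installed.
import torch
import torchvision

print(torch.__version__)          # expected: 1.13.1+cu116
print(torchvision.__version__)    # expected: 0.14.1+cu116
print(torch.cuda.is_available())  # expected: True on a CUDA 11.6-capable machine
```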
Step 3. Install VMamba.
Please follow the installation instructions in the VMamba code repository (this typically includes building its selective-scan CUDA kernels).
Step 4. Configure the G-VMamba core components.
Replace the contents of the models folder under the classification folder with the G-VMamba model files provided in this repository.
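For example, assuming the two repositories are cloned side by side (both paths below are placeholders), a small Python helper can perform the overwrite:

```python
# Hypothetical paths: copy the G-VMamba model files over the upstream
# VMamba classification/models folder, overwriting files with the same name.
import shutil
from pathlib import Path

src = Path("G-VMamba/models")               # this repository's model files
dst = Path("VMamba/classification/models")  # upstream VMamba models folder
shutil.copytree(src, dst, dirs_exist_ok=True)
```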
If you only want to test the performance of a pretrained checkpoint:
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_port=29500 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --pretrained </path/of/checkpoint>
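Note that `python -m torch.distributed.launch` is deprecated in recent PyTorch releases; it still works on PyTorch 1.13 but emits a warning, and the same arguments can be passed to `torchrun` as in the training commands below.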
Train:
Training with a single GPU:
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=1 main.py --cfg </path/to/config> --batch-size 16 --data-path </path/of/dataset> --output </path/of/output>
Training with multiple GPUs:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=5 --master_port=29500 --rdzv_id=12345 --rdzv_backend=c10d --rdzv_endpoint=localhost:29500 main.py --cfg </path/to/config> --batch-size 8 --data-path </path/of/dataset> --output </path/of/output>
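In the Swin-Transformer codebase that this entry point follows, `--batch-size` is the per-GPU batch size, so the multi-GPU command above yields an effective batch of 5 × 8 = 40; scale it to your GPU memory.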
@ARTICLE{10810482,
author={Yan, Liyue and Zhang, Xing and Wang, Kafeng and Zhang, Dejin},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Contour-enhanced Visual State-Space Model for Remote Sensing Image Classification},
year={2024}
}
This project is mainly based on VMamba (paper, code), Swin-Transformer (paper, code), pytorch-grad-cam (code), and other open-source projects; we thank the authors for their excellent work.