SODAWideNetPlusPlus [Link]

Combining Attention and Convolutions for Salient Object Detection

ABSTRACT

Salient Object Detection (SOD) has traditionally relied on feature refinement modules that utilize the features of an ImageNet pre-trained backbone. However, this approach limits the possibility of pre-training the entire network, because of the distinct nature of SOD and image classification. Additionally, the architecture of these backbones, originally built for image classification, is sub-optimal for a dense prediction task like SOD. To address these issues, we propose a novel encoder-decoder-style neural network called SODAWideNet++ that is designed explicitly for SOD. Inspired by the ability of vision transformers to attain a global receptive field from the initial stages, we introduce the Attention Guided Long Range Feature Extraction (AGLRFE) module, which combines large dilated convolutions and self-attention. Specifically, we use attention features to guide the long-range information extracted by multiple dilated convolutions, thus taking advantage of the inductive biases of the convolution operation and the input dependency brought by self-attention. In contrast to the current paradigm of ImageNet pre-training, we binarize the annotations of 118K images from the COCO semantic segmentation dataset and use them to pre-train the proposed model end-to-end. Further, we supervise the background predictions along with the foreground to push our model toward accurate saliency predictions. SODAWideNet++ performs competitively on five different datasets while containing only 35% of the trainable parameters of state-of-the-art models.
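
The AGLRFE idea can be illustrated with a short PyTorch sketch: several large dilated convolutions extract long-range features, and a self-attention map gates them. This is a minimal illustration under assumed names and shapes (AGLRFESketch, the dilation rates, and the sigmoid gating are our assumptions), not the authors' exact module.

import torch
import torch.nn as nn

class AGLRFESketch(nn.Module):
    def __init__(self, channels, dilations=(3, 6, 12), heads=4):
        super().__init__()
        # Parallel dilated convolutions capture long-range context
        # (the convolutional inductive bias).
        self.dilated = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=d, dilation=d)
            for d in dilations
        )
        # Self-attention over spatial positions (input dependency).
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        conv_feat = sum(conv(x) for conv in self.dilated)
        seq = x.flatten(2).transpose(1, 2)               # (B, HW, C)
        attn_feat, _ = self.attn(seq, seq, seq)
        attn_feat = attn_feat.transpose(1, 2).reshape(b, c, h, w)
        # Attention features gate the dilated-convolution features.
        return self.proj(conv_feat * torch.sigmoid(attn_feat)) + x

out = AGLRFESketch(64)(torch.randn(1, 64, 24, 24))       # -> (1, 64, 24, 24)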

Model Overview

| Model | Number of Parameters | Pre-computed Saliency Maps | Model Weights | Pre-trained Weights |
| --- | --- | --- | --- | --- |
| SODAWideNet++ | 26.58M | Saliency Maps | Weights | Pre-trained Weights |
| SODAWideNet++-M | 6.66M | Saliency Maps | Weights | Pre-trained Weights |
| SODAWideNet++-S | 1.67M | Saliency Maps | Weights | Pre-trained Weights |

COCO Pre-training

Download the dataset and unzip it. Then use the following command to pre-train the model. The model size can be L, M, or S.

python training.py \
    --lr 0.001 \
    --epochs 21 \
    --f_name "COCOSODAWideNet++L" \
    --n 4 \
    --b 20 \
    --sched 1 \
    --training_scheme "COCO" \
    --salient_loss_weight 1.0 \
    --use_pretrained 0 \
    --im_size 384 \
    --model_size 'L'
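
As a point of reference, the annotation binarization described in the abstract (merging every annotated COCO object into a single foreground/background map) might look roughly like the following pycocotools sketch; the annotation path is a placeholder, and the exact preprocessing in this repository may differ.

import numpy as np
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")  # placeholder path
img_id = coco.getImgIds()[0]
info = coco.loadImgs(img_id)[0]
mask = np.zeros((info["height"], info["width"]), dtype=np.uint8)
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    mask |= coco.annToMask(ann)        # union of all object masks
mask *= 255                            # binary saliency ground truth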

DUTS Finetuning

Download the dataset from link and unzip it. Also download the DUTS-TE dataset for evaluation. Create a folder named checkpoints and place the COCO pre-trained checkpoint in it. Then use the following command to fine-tune the model.

python training.py \
    --lr 0.001 \
    --epochs 11 \
    --f_name "DUTSSODAWideNet++L" \
    --n 4 \
    --b 20 \
    --sched 1 \
    --training_scheme "DUTS" \
    --salient_loss_weight 0.5 \
    --use_pretrained 1 \
    --checkpoint_name "COCOSODAWideNet++L" \
    --im_size 384 \
    --model_size 'L'
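
The --salient_loss_weight flag presumably weights the combined foreground/background supervision mentioned in the abstract. A hedged sketch of one such dual-supervision loss follows; the exact formulation in training.py may differ.

import torch.nn.functional as F

def dual_supervision_loss(pred_fg, pred_bg, gt, salient_loss_weight=0.5):
    # BCE on the foreground map against the ground truth, plus BCE on the
    # background map against the inverted ground truth. The weighting and
    # the exact combination are assumptions, not the repository's code.
    fg = F.binary_cross_entropy_with_logits(pred_fg, gt)
    bg = F.binary_cross_entropy_with_logits(pred_bg, 1.0 - gt)
    return salient_loss_weight * fg + bg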

Inference

We provide options to generate the saliency map for a single image or for multiple images in a folder. The script below displays the generated saliency map; model_size can be L, M, or S.

python inference.py \
    --mode single \
    --input_path /path/to/image.jpg \
    --display \
    --model_size L

The script below generates a saliency map and saves the result.

python inference.py \
    --mode single \
    --input_path /path/to/image.jpg \
    --model_size L

The script below generates saliency maps for a folder of images and saves them to the user-specified output directory.

python inference.py \
    --mode folder \
    --input_path /path/to/input/folder \
    --output_dir /path/to/output/folder \
    --model_size L
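
If a downstream task needs a hard mask rather than a soft saliency map, the saved output can be thresholded afterwards; a small OpenCV sketch (file names are placeholders):

import cv2

sal = cv2.imread("results/image.png", cv2.IMREAD_GRAYSCALE)  # saved saliency map
_, hard_mask = cv2.threshold(sal, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("results/image_mask.png", hard_mask)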

Citation

If you find our research useful, please cite our paper:

@inproceedings{dulam2025sodawidenet++,
  title={SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection},
  author={Dulam, Rohit Venkata Sai and Kambhamettu, Chandra},
  booktitle={International Conference on Pattern Recognition},
  pages={210--226},
  year={2025},
  organization={Springer}
}
