Adapting the segment anything model for multi-modal retinal anomaly detection and localization (Information Fusion 2025)
This is a PyTorch implementation of the MMRAD model:
@article{li2025adapting,
title={Adapting the segment anything model for multi-modal retinal anomaly detection and localization},
author={Li, Jingtao and Chen, Ting and Wang, Xinyu and Zhong, Yanfei and Xiao, Xuan},
journal={Information Fusion},
volume={113},
pages={102631},
year={2025},
publisher={Elsevier}
}
- The Segment Anything Model (SAM) is applied to multi-modal retinal disease diagnosis.
- Anomaly simulation and prompt-tuning strategies are used to fine-tune SAM (a rough sketch of the simulation idea follows this list).
- A multi-task decoder is proposed for joint anomaly detection and localization.
- The model outputs an anomaly score and localizes lesions for each patient.
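As a rough illustration of the anomaly simulation idea (the exact simulation strategy used in the paper may differ), the following sketch pastes a random noise patch into a normal image and records the corresponding pixel-level mask; the function name and parameters are illustrative assumptions, not part of this repository:

```python
import numpy as np

def simulate_anomaly(image, rng=None, max_frac=0.3):
    """Paste a random rectangular perturbation into a normal image.

    Returns the perturbed image and a binary mask marking the simulated
    lesion. This is only an illustrative stand-in for the paper's
    general anomaly simulation; the actual strategy may differ.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    ph = rng.integers(8, int(h * max_frac))
    pw = rng.integers(8, int(w * max_frac))
    y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)

    perturbed = image.copy().astype(np.float32)
    noise = rng.uniform(image.min(), image.max(), size=(ph, pw) + image.shape[2:])
    alpha = rng.uniform(0.5, 1.0)  # blending strength of the synthetic lesion
    perturbed[y:y + ph, x:x + pw] = (1 - alpha) * perturbed[y:y + ph, x:x + pw] + alpha * noise

    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y:y + ph, x:x + pw] = 1
    return perturbed, mask
```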
Fusing optical coherence tomography (OCT) and fundus modality information can provide a comprehensive diagnosis of retinal artery occlusion (RAO), where OCT offers a cross-sectional examination complementary to the fundus image. Given multi-modal retinal images, an anomaly diagnosis model can discriminate RAO without requiring real diseased samples. Despite this, previous studies have focused only on single-modal diagnosis because of: 1) the lack of paired-modality samples; and 2) the significant imaging differences between modalities, which make fusion difficult with small-scale medical data.
In this paper, we describe how we first built a multi-modal RAO dataset containing both OCT and fundus modalities, which supports both the anomaly detection and localization tasks with pixel-level annotation. Motivated by the powerful generalization ability of the recent visual foundation model known as the Segment Anything Model (SAM), we adapted it to our task, considering the small-scale nature of retinal samples. Specifically, a modality-shared decoder with task-specific tokens is introduced to make SAM support the multi-modal image setting; it includes a mask token for the pixel-level anomaly localization task and a fusion token for the case-level anomaly detection task. Since SAM has little medical knowledge and has not learned the concept of "normal", it cannot localize RAO anomalies in a zero-shot manner. To integrate expert retinal knowledge while retaining the general segmentation knowledge, a general anomaly simulation for both modalities and a low-level prompt-tuning strategy are introduced.
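To make the token-based design concrete, here is a minimal PyTorch sketch of a modality-shared decoder head with a mask token (pixel-level localization) and a fusion token (case-level detection). The tensor shapes, layer choices, and names are illustrative assumptions, not the paper's exact implementation, which is built on SAM's own mask decoder:

```python
import torch
import torch.nn as nn

class MultiTaskDecoderSketch(nn.Module):
    """Illustrative modality-shared decoder with task-specific tokens.

    A mask token attends to the image embeddings of both modalities to
    produce pixel-level anomaly maps, while a fusion token summarizes
    both modalities into a case-level anomaly score. Layer choices are
    assumptions for illustration only.
    """

    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        self.mask_token = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.fusion_token = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.mask_head = nn.Linear(embed_dim, embed_dim)
        self.score_head = nn.Linear(embed_dim, 1)

    def forward(self, oct_emb, fundus_emb):
        # oct_emb, fundus_emb: (B, C, H, W) image embeddings from the SAM encoder
        b, c, h, w = oct_emb.shape
        feats = torch.cat([oct_emb, fundus_emb], dim=-1)      # (B, C, H, 2W)
        tokens_kv = feats.flatten(2).transpose(1, 2)          # (B, 2HW, C)
        queries = torch.cat([self.mask_token, self.fusion_token], dim=1).expand(b, -1, -1)

        out, _ = self.attn(queries, tokens_kv, tokens_kv)     # (B, 2, C)
        mask_q, fusion_q = out[:, 0], out[:, 1]

        # Pixel-level anomaly maps: similarity between the mask query and each location
        maps = torch.einsum("bc,bchw->bhw", self.mask_head(mask_q), feats)
        oct_map, fundus_map = maps[..., :w], maps[..., w:]

        # Case-level anomaly score from the fusion token
        score = torch.sigmoid(self.score_head(fusion_q)).squeeze(-1)
        return oct_map, fundus_map, score
```

The sketch only illustrates how separate task tokens can route shared multi-modal features to the localization and detection heads; it omits SAM's prompt embeddings and upsampling stages.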
- Install the required packages according to requirements.txt.
- Organize the datasets following the provided structure in the data folder, where each case corresponds to a folder.
- Download the SAM-b checkpoint and put it in the weights folder.
- MMRAD is trained with normal samples only and can infer on anomalous samples directly.
- Start the training and testing process with the following single command:
python run.py
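As an optional sanity check before running run.py, the snippet below verifies that the SAM-b weights load correctly. It assumes the official segment-anything package is installed and that the checkpoint uses Meta's released filename; adjust the path to match your weights folder:

```python
import torch
from segment_anything import sam_model_registry

# Load the ViT-B SAM checkpoint placed in the weights folder.
# The filename below is Meta's released checkpoint name; adjust if yours differs.
sam = sam_model_registry["vit_b"](checkpoint="weights/sam_vit_b_01ec64.pth")
sam.eval()

# Quick sanity check: encode a dummy 1024x1024 input through the image encoder.
with torch.no_grad():
    emb = sam.image_encoder(torch.zeros(1, 3, 1024, 1024))
print(emb.shape)  # expected: torch.Size([1, 256, 64, 64])
```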
The following are example localization results of MMRAD on the OCT and fundus modalities.