🔥 Accepted to CVPR 2025!
Recent advancements in model merging have shown great potential in combining capabilities from multiple large language models (LLMs). However, existing methods primarily focus on merging homogeneous models with identical architectures, struggling when applied to heterogeneous Multimodal Large Language Models (MLLMs) that differ in both architecture and parameter space.
We propose AdaMMS: Adaptive Mapping, Merging, and Searching — a novel unsupervised model merging framework tailored for heterogeneous MLLMs. AdaMMS tackles the challenges in three steps:
- 🧠 Mapping: Establish a mapping function between different model architectures.
- ⚖️ Merging: Perform weighted linear interpolation to accommodate asymmetries in parameter space.
- 🔍 Searching: Introduce an unsupervised hyperparameter search method to determine optimal merging coefficients.
📊 Extensive experiments show that AdaMMS consistently outperforms previous model merging methods on various vision-language benchmarks.
Here is an illustration of the three steps in AdaMMS:
Here are the average results from different merging methods:
This is a visualization of the model outputs obtained with different alpha values:
⚠️ It's recommended to set up a separate environment for each model, then install the `lmms-eval` evaluation framework.
### ✅ Example: CogVLM
```bash
conda create -n lmms-cogvlm python=3.10
conda activate lmms-cogvlm
# Fetch the raw requirements file (the /blob/ URL returns an HTML page)
wget https://raw.githubusercontent.com/THUDM/CogVLM/main/requirements.txt --no-check-certificate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval && pip install -e .
conda install openjdk=8
```

### ✅ Example: mPLUG-Owl

```bash
conda create -n lmms-mplug python=3.10
conda activate lmms-mplug
git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl2
pip install --upgrade pip && pip install -e .
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval
cd lmms-eval && pip install -e .
conda install openjdk=8
pip install deepspeed  # Optional for inference acceleration
```
Naming convention: `xxx2yyy.py` indicates merging model `xxx` into architecture `yyy`.
Merging scripts:

| Source Model | Target Model | Script File |
|---|---|---|
| LLaVA | CogVLM | `llava2cogvlm.py` |
| mPLUG-Owl | CogVLM | `mplugowl2cogvlm.py` |
| LLaVA-OneVision-Qwen | QwenVL2 | `llava-qwen2qwenvl.py` |

TIES merging variants:

| Source Model | Target Model | Script File |
|---|---|---|
| LLaVA | CogVLM | `llava2cogvlm_ties_merging.py` |
| mPLUG-Owl | CogVLM | `mplugowl2cogvlm_ties_merging.py` |
| LLaVA-OneVision-Qwen | QwenVL2 | `llava-qwen2qwenvl_ties_merging.py` |
📝 Refer to `runs/` for example scripts. Logging the results of each run helps identify the best alpha. More details on inference are available at https://github.com/EvolvingLMMs-Lab/lmms-eval.
```bash
conda activate lmms-cogvlm
python "$MERGE_SCRIPT" --output "$ckpt_path" --alpha "$alpha" \
    --base "$BASE_MODEL_PATH" --base_llava "$LLAVA_PATH" \
    --interpolation
```
```bash
#!/bin/bash
for alpha in 1.0 0.9 0.8 0.7 0.6 0.5 0.4; do
    echo "===> Alpha: $alpha"

    # Merge: write a merged checkpoint for this alpha
    python3 "$MERGE_SCRIPT" --output "$ckpt_path" --alpha "$alpha" --interpolation \
        --base "$COGVLM_PATH" --llava_base "$LLAVA_PATH"

    # Evaluate the merged checkpoint on each benchmark
    for task in "mme" "mmmu_val" "nocaps_val" "vizwiz_vqa_val" "seedbench" "gqa" "ok_vqa" "refcoco_bbox_testA" "refcocog_bbox_test" "refcoco+_bbox_testA" "mmbench" "ocrbench"; do
        CUDA_VISIBLE_DEVICES=$GPU accelerate launch \
            --num_processes=1 \
            -m lmms_eval \
            --model cogvlm \
            --model_args pretrained=$ckpt_path,... \
            --tasks "$task" \
            --log_samples \
            --output_path "$output_path"
    done

    # Remove the merged checkpoint to free disk space before the next alpha
    rm -rf "$ckpt_path"
done
```
After evaluating different alphas, run the following script to auto-select the best one:
```bash
python search/view_log_delta_perdata_search_limit.py
```
This will output the best alpha and its performance logs.
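To give a feel for what a label-free alpha search can look like, here is a minimal sketch. It assumes one plausible criterion: prefer the alpha whose logged responses change least compared with the neighboring alpha candidates. This is an illustration only and not necessarily what `search/view_log_delta_perdata_search_limit.py` does; the function names `disagreement` and `select_alpha` and the toy responses are hypothetical.

```python
from typing import Dict, List


def disagreement(a: List[str], b: List[str]) -> float:
    """Fraction of samples whose responses differ between two runs."""
    if len(a) != len(b):
        raise ValueError("both runs must cover the same samples")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def select_alpha(responses: Dict[float, List[str]]) -> float:
    """Return the alpha whose outputs differ least from its neighbors.

    ASSUMPTION: stability across adjacent alphas is used as an
    unsupervised proxy for quality; no ground-truth labels are needed.
    """
    alphas = sorted(responses)
    if len(alphas) < 2:
        raise ValueError("need at least two alpha candidates")
    best_alpha, best_score = alphas[0], float("inf")
    for i, alpha in enumerate(alphas):
        # Neighbors are the adjacent candidates in sorted order.
        neighbors = [alphas[j] for j in (i - 1, i + 1) if 0 <= j < len(alphas)]
        score = sum(
            disagreement(responses[alpha], responses[n]) for n in neighbors
        ) / len(neighbors)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Toy logged responses (hypothetical), keyed by alpha.
logged = {
    0.4: ["a", "b", "c", "d"],
    0.5: ["a", "b", "c", "e"],
    0.6: ["a", "b", "c", "e"],
    0.7: ["a", "x", "y", "e"],
}
picked = select_alpha(logged)
```

In practice the responses would come from the per-sample logs written by `--log_samples`; how those logs are parsed is left out here.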
- Check if parameter should be merged: `need_merge(key)`
- Scale base model: `cogvlm_diff[key] = (cogvlm_chat[key] * alpha)`
- Linear: `cogvlm_diff['lm_head.weight'] += llava['lm_head.weight']`
- Non-linear: Call `do_merging()` or `do_merging_strategy()` from `ties_merging.py`.
- Compatible with both `torch` and `safetensors`.
- For `safetensors`, metadata is required.
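The linear merge logic above can be sketched in a few lines. This is a simplified illustration, not the repository's code: `example_need_merge`, `linear_merge`, and `key_map` are hypothetical names, and weighting the source model by `1 - alpha` is an assumption (the list above only shows the base model being scaled by `alpha`). Values only need to support `*` and `+`, so the same function works with plain floats, as in the demo, or with tensors.

```python
from typing import Any, Callable, Dict, Mapping


def example_need_merge(key: str) -> bool:
    """Hypothetical stand-in for need_merge(): skip vision-specific weights."""
    return "vision" not in key


def linear_merge(
    base: Mapping[str, Any],
    source: Mapping[str, Any],
    key_map: Mapping[str, str],
    alpha: float,
    should_merge: Callable[[str], bool] = example_need_merge,
) -> Dict[str, Any]:
    """Interpolate mapped parameters; copy everything else from the base."""
    merged: Dict[str, Any] = {}
    for key, base_value in base.items():
        src_key = key_map.get(key)  # mapping step: base name -> source name
        if should_merge(key) and src_key is not None and src_key in source:
            # ASSUMPTION: alpha weights the base, (1 - alpha) the source.
            merged[key] = base_value * alpha + source[src_key] * (1.0 - alpha)
        else:
            merged[key] = base_value
    return merged


# Toy "state dicts" with scalar weights (hypothetical parameter names).
base_weights = {"model.layers.0.mlp.weight": 2.0, "model.vision.proj.weight": 5.0}
source_weights = {"layers.0.mlp.weight": 4.0}
name_map = {"model.layers.0.mlp.weight": "layers.0.mlp.weight"}

merged_weights = linear_merge(base_weights, source_weights, name_map, alpha=0.75)
```

With `alpha=1.0` the result equals the base model, which matches the first value in the alpha sweep being the unmerged baseline under this assumed weighting.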
We welcome PRs and issues! 🌟 AdaMMS aims to improve the efficiency of heterogeneous multimodal model merging and support your research in MLLMs.
If you find this project helpful, please cite:
```bibtex
@misc{du2025adamms,
  title={AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization},
  author={Yiyang Du and Xiaochen Wang and Chi Chen and Jiabo Ye and Yiru Wang and Peng Li and Ming Yan and Ji Zhang and Fei Huang and Zhifang Sui and Maosong Sun and Yang Liu},
  year={2025},
  eprint={2503.23733},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.23733},
}
```
👉 Click this link to jump to the Chinese version of the README.