This repo contains scripts to reproduce and extend the results presented in the LingoQA paper. It builds on two external components:
- LingoQA Repository: provides the evaluation script used to score model predictions
- Unsloth: framework used for fine-tuning the models
The scripts are designed to run on a SLURM cluster, but instructions for both SLURM and non-SLURM environments are provided.
- If you're using a SLURM cluster, make sure to load the necessary modules before running the scripts. Example:

  ```bash
  module load cuda
  ```
To set up the main environment, run:

```bash
chmod +x ./install.sh && ./install.sh
```
Additionally, you may need to install other environments for specific tasks:

- LingoJudge:

  ```bash
  chmod +x ./install_judge.sh && ./install_judge.sh
  ```

- Unsloth for fine-tuning:

  ```bash
  chmod +x ./install_unsloth.sh && ./install_unsloth.sh
  ```
Before running the models, download the required datasets and models:

```bash
conda activate lingo_main
chmod +x ./download_eval_dataset.sh && ./download_eval_dataset.sh
chmod +x ./download_models.sh && ./download_models.sh
```
If you plan to fine-tune models, you also need to download the training dataset:

```bash
chmod +x ./download_training.sh && ./download_training.sh
```
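For reference, the download scripts essentially fetch the LingoQA data from the Hugging Face Hub. A minimal sketch of the equivalent step in Python, assuming the `wayveai/LingoQA` dataset repo and a local `./data` directory (the actual scripts may differ):

```python
# Sketch of the download step, assuming the LingoQA data is pulled from the
# Hugging Face Hub; the repo id and target directory are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="wayveai/LingoQA",   # assumed dataset repo
    repo_type="dataset",
    local_dir="./data/LingoQA",  # assumed local target directory
)
```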
If you only want to run inference with the pre-trained models, use the scripts in the `slurm/` directory. If you're not using a SLURM cluster, extract the Python command from the script and run it directly.
Example:

```bash
# SLURM
sbatch slurm/job_qwen2vl_instruct_7b.sh

# Non-SLURM
python inference/inference_qwen2vl_instruct_7b.py val.parquet predictions_qwen2vl.csv
```
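The inference scripts follow the standard `transformers` pattern: load the model and processor, read question/frame pairs from the parquet file, and write the predicted answers to a CSV. A rough sketch, assuming the parquet holds `question_id`, `question`, and a list of frame paths in `images` (the actual scripts may structure this differently):

```python
# Rough sketch of inference with Qwen2-VL; the parquet column names
# (question_id, question, images) are assumptions.
import pandas as pd
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

df = pd.read_parquet("val.parquet")
rows = []
for _, row in df.iterrows():
    frames = [Image.open(p) for p in row["images"]]  # assumed: paths to frames
    # One user turn containing all frames followed by the question text.
    content = [{"type": "image"} for _ in frames]
    content.append({"type": "text", "text": row["question"]})
    messages = [{"role": "user", "content": content}]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], images=frames, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    answer = processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    rows.append({"question_id": row["question_id"], "answer": answer})

pd.DataFrame(rows).to_csv("predictions_qwen2vl.csv", index=False)
```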
To fine-tune models, first ensure that the training dataset has been downloaded. Then, execute the appropriate training script:

```bash
# SLURM
sbatch slurm/job_finetune_qwen2vl.sh

# Non-SLURM
python training/finetune_qwen2vl.py
```
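In essence, the training script loads the base model through Unsloth and attaches LoRA adapters before running a supervised fine-tuning loop. A minimal sketch of that setup (the checkpoint name and hyperparameters here are illustrative, not necessarily what `training/finetune_qwen2vl.py` uses):

```python
# Illustrative Unsloth LoRA setup for Qwen2-VL; the settings below are
# assumptions, not necessarily those used by training/finetune_qwen2vl.py.
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-7B-Instruct",  # assumed base checkpoint
    load_in_4bit=True,               # 4-bit loading to reduce VRAM usage
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,     # adapt the vision encoder as well
    finetune_language_layers=True,
    r=16,                            # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,
)
FastVisionModel.for_training(model)  # enable training mode
# The rest of the script converts the LingoQA training set into chat-format
# samples and runs a standard SFT loop (e.g. TRL's SFTTrainer).
```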
To evaluate the fine-tuned models, run the evaluation script provided by the LingoQA repository:

```bash
conda activate lingo_judge
python ./LingoQA/benchmark/evaluate.py --predictions_path ./path_to_predictions/predictions.csv
```
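For context, the evaluation scores each prediction with Lingo-Judge, a text classifier that compares the model's answer against the ground truth. A minimal sketch of scoring a single prediction, following the usage example from the LingoQA repository (the exact prompt format used by `evaluate.py` may differ):

```python
# Minimal sketch of scoring one prediction with Lingo-Judge, adapted from
# the usage example in the LingoQA repository.
from transformers import pipeline

pipe = pipeline("text-classification", model="wayveai/Lingo-Judge")

question = "Are there any pedestrians crossing the road?"
answer = "Yes, there is a pedestrian crossing on the right."  # ground truth
student = "Yes, a pedestrian is crossing ahead."              # model prediction

text = f"[CLS]\nQuestion: {question}\nAnswer: {answer}\nStudent: {student}"
score = pipe(text)[0]["score"]  # scores above 0.5 count as correct
print(score)
```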
- You may need to adjust the `--nodelist` flag in the SLURM scripts to match your cluster's configuration.
Results on the LingoQA benchmark:

| Model | Fine-tuned | System Prompt | Size | Frames | Accuracy (%) |
|---|---|---|---|---|---|
| InternVL 2.5 | No | No | 8B | 5 | 49.0 |
| InternVL 2.5 | No | Yes | 8B | 5 | 47.4 |
| Qwen2-VL | No | No | 7B | 5 | 50.2 |
| Qwen2-VL | No | Yes | 7B | 5 | 52.2 |
| Uforms | No | No | 1.5B | 1 | 46.4 |
| GIT-base-textvqa | No | No | 177M | 5 | 31.2 |