This repo contains scripts to reproduce and extend the results presented in the LingoQA paper. It builds on two external components:
- LingoQA Repository: provides the evaluation script used to score model predictions
- Unsloth: framework used for fine-tuning the models
The scripts are designed to run on a SLURM cluster, but instructions for both SLURM and non-SLURM environments are provided.
- If you're using a SLURM cluster, make sure to load the necessary modules before running the scripts. Example:

  ```bash
  module load cuda
  ```
To set up the main environment, run:

```bash
chmod +x ./install.sh && ./install.sh
```
Additionally, you may need to install other environments for specific tasks:

- LingoJudge:

  ```bash
  chmod +x ./install_judge.sh && ./install_judge.sh
  ```

- Unsloth for fine-tuning:

  ```bash
  chmod +x ./install_unsloth.sh && ./install_unsloth.sh
  ```
Before running the models, download the required datasets and models:

```bash
conda activate lingo_main
chmod +x ./download_eval_dataset.sh && ./download_eval_dataset.sh
chmod +x ./download_models.sh && ./download_models.sh
```
If you plan to fine-tune models, you also need to download the training dataset:

```bash
chmod +x ./download_training.sh && ./download_training.sh
```
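For reference, the download scripts essentially fetch the LingoQA data from the Hugging Face Hub. A minimal sketch of the equivalent step in Python, assuming the `wayveai/LingoQA` dataset repo and a local `./data` directory (the actual scripts may differ):

```python
# Sketch of the download step, assuming the LingoQA data is pulled from the
# Hugging Face Hub; the repo id and target directory are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="wayveai/LingoQA",   # assumed dataset repo
    repo_type="dataset",
    local_dir="./data/LingoQA",  # assumed local target directory
)
```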
If you only want to run inference with the pre-trained models, use the scripts in the `slurm/` directory. If you're not using a SLURM cluster, extract the Python command from the script and run it directly.
Example:

```bash
# SLURM
sbatch slurm/job_qwen2vl_instruct_7b.sh

# Non-SLURM
python inference/inference_qwen2vl_instruct_7b.py val.parquet predictions_qwen2vl.csv
```
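The inference scripts follow the standard `transformers` pattern: load the model and processor, read question/frame pairs from the parquet file, and write the predicted answers to a CSV. A rough sketch, assuming the parquet holds `question_id`, `question`, and a list of frame paths in `images` (the actual scripts may structure this differently):

```python
# Rough sketch of inference with Qwen2-VL; the parquet column names
# (question_id, question, images) are assumptions.
import pandas as pd
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

df = pd.read_parquet("val.parquet")
rows = []
for _, row in df.iterrows():
    frames = [Image.open(p) for p in row["images"]]  # assumed: paths to frames
    # One user turn containing all frames followed by the question text.
    content = [{"type": "image"} for _ in frames]
    content.append({"type": "text", "text": row["question"]})
    messages = [{"role": "user", "content": content}]
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[prompt], images=frames, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    answer = processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
    rows.append({"question_id": row["question_id"], "answer": answer})

pd.DataFrame(rows).to_csv("predictions_qwen2vl.csv", index=False)
```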
To fine-tune models, first ensure that the training dataset has been downloaded. Then, execute the appropriate training script:

```bash
# SLURM
sbatch slurm/job_finetune_qwen2vl.sh

# Non-SLURM
python training/finetune_qwen2vl.py
```
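In essence, the training script loads the base model through Unsloth and attaches LoRA adapters before running a supervised fine-tuning loop. A minimal sketch of that setup (the checkpoint name and hyperparameters here are illustrative, not necessarily what `training/finetune_qwen2vl.py` uses):

```python
# Illustrative Unsloth LoRA setup for Qwen2-VL; the settings below are
# assumptions, not necessarily those used by training/finetune_qwen2vl.py.
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-7B-Instruct",  # assumed base checkpoint
    load_in_4bit=True,               # 4-bit loading to reduce VRAM usage
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,     # adapt the vision encoder as well
    finetune_language_layers=True,
    r=16,                            # LoRA rank
    lora_alpha=16,
    lora_dropout=0.0,
)
FastVisionModel.for_training(model)  # enable training mode
# The rest of the script converts the LingoQA training set into chat-format
# samples and runs a standard SFT loop (e.g. TRL's SFTTrainer).
```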
To evaluate the fine-tuned models, run the evaluation script provided by the LingoQA repository:

```bash
conda activate lingo_judge
python ./LingoQA/benchmark/evaluate.py --predictions_path ./path_to_predictions/predictions.csv
```
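For context, the evaluation scores each prediction with Lingo-Judge, a text classifier that compares the model's answer against the ground truth. A minimal sketch of scoring a single prediction, following the usage example from the LingoQA repository (the exact prompt format used by `evaluate.py` may differ):

```python
# Minimal sketch of scoring one prediction with Lingo-Judge, adapted from
# the usage example in the LingoQA repository.
from transformers import pipeline

pipe = pipeline("text-classification", model="wayveai/Lingo-Judge")

question = "Are there any pedestrians crossing the road?"
answer = "Yes, there is a pedestrian crossing on the right."  # ground truth
student = "Yes, a pedestrian is crossing ahead."              # model prediction

text = f"[CLS]\nQuestion: {question}\nAnswer: {answer}\nStudent: {student}"
score = pipe(text)[0]["score"]  # scores above 0.5 count as correct
print(score)
```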
- You may need to adjust the `--nodelist` flag in the SLURM scripts to match your cluster's configuration.
Results on the LingoQA benchmark:

| Model | Fine-tuned | System Prompt | Size | Frames | Accuracy (%) |
|---|---|---|---|---|---|
| InternVL 2.5 | No | No | 8B | 5 | 49.0 |
| InternVL 2.5 | No | Yes | 8B | 5 | 47.4 |
| Qwen2-VL | No | No | 7B | 5 | 50.2 |
| Qwen2-VL | No | Yes | 7B | 5 | 52.2 |
| Uforms | No | No | 1.5B | 1 | 46.4 |
| GIT-base-textvqa | No | No | 177M | 5 | 31.2 |