This repo wraps the many open-source packages needed to fine-tune LLMs and run them efficiently in inference mode.
Important: see the Makefile for useful commands.
The online documentation can be found at https://giotto-ai.github.io/giotto-llm/index.html
The following environment variables need to be set (update the credentials path below):
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.gloud/giotto-research-admin.json
export MLFLOW_TRACKING_URI=http://cluster-manager:5051
export PATH=$HOME/.local/bin:$PATH
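To make these variables persist across sessions, they can be written to a small env file that is sourced from your shell profile. A sketch (the file name `~/.giotto_llm_env` is an arbitrary choice, and the credentials path is the placeholder from above):

```shell
# Persist the required environment variables in a dedicated file
# (adjust the credentials path to your own service-account file).
ENV_FILE="$HOME/.giotto_llm_env"
cat > "$ENV_FILE" <<'EOF'
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.gloud/giotto-research-admin.json
export MLFLOW_TRACKING_URI=http://cluster-manager:5051
export PATH=$HOME/.local/bin:$PATH
EOF

# Load them into the current shell and confirm they took effect
. "$ENV_FILE"
echo "MLFLOW_TRACKING_URI=$MLFLOW_TRACKING_URI"
```

Adding `. "$HOME/.giotto_llm_env"` to `~/.bashrc` then loads them automatically in every new shell.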
Generate an SSH key with ssh-keygen and add it to GitLab.
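A non-interactive way to do this (the key filename, empty passphrase, and comment are illustrative choices, not repo requirements):

```shell
# Generate an ed25519 key without prompts; -N "" sets an empty passphrase
mkdir -p "$HOME/.ssh"
ssh-keygen -t ed25519 -f "$HOME/.ssh/id_ed25519_gitlab" -N "" -C "giotto-llm"

# Print the public key so it can be pasted into GitLab
# (Preferences -> SSH Keys)
cat "$HOME/.ssh/id_ed25519_gitlab.pub"
```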
make
Example of QwenVL-based model:
torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id Qwen/Qwen2-VL-2B-Instruct --wrapper QwenVL -o qwenvl --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4
Example of Molmo-based model:
torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id allenai/MolmoE-1B-0924 --wrapper Molmo -o molmo --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4
Example of Llama-based models:
torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id meta-llama/Llama-3.2-1B-Instruct --wrapper CausalLM -o llama --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4
Example of Qwen-based models:
torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id Qwen/Qwen2.5-0.5B --wrapper CausalLM -o qwen --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4
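The four invocations above differ only in `--model_id`, `--wrapper`, and the output directory, so they can be driven from a single loop. A sketch (the leading `echo` makes this a dry run that only prints each command; remove it to actually launch training):

```shell
# Hyperparameters shared by all four fine-tuning examples
COMMON="--batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4"

# model_id:wrapper:output_dir triples, one per tested model family
CONFIGS="
Qwen/Qwen2-VL-2B-Instruct:QwenVL:qwenvl
allenai/MolmoE-1B-0924:Molmo:molmo
meta-llama/Llama-3.2-1B-Instruct:CausalLM:llama
Qwen/Qwen2.5-0.5B:CausalLM:qwen
"

for cfg in $CONFIGS; do
  model_id=${cfg%%:*}; rest=${cfg#*:}
  wrapper=${rest%%:*}; out=${rest#*:}
  # 'echo' keeps this a dry run; drop it to run the real command
  echo torchrun --nproc-per-node=gpu -m llm_prompts.finetuning \
    -d re_arc_400x5 --model_id "$model_id" --wrapper "$wrapper" -o "$out" $COMMON
done
```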
See the full list of currently tested models in ./models/.
The validation script is single-GPU for now and requires a config entry in ./llm_prompts/validation/__main__.py.
# Only single gpu support for now
CUDA_VISIBLE_DEVICES=0 python -m llm_prompts.validation --dataset_type evaluation --finetuned_model_id <MODEL-ID> --max_num_tasks 400
where <MODEL-ID> is a fine-tuned model defined in the config.
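Since the script is single-GPU, several fine-tuned models can be validated back to back on one device. A sketch (the model ids below are the output directories from the fine-tuning examples and are assumed to have matching config entries; the leading `echo` makes this a dry run):

```shell
# Validate several fine-tuned models sequentially on GPU 0.
# Drop 'echo' to actually run each validation.
for model_id in qwenvl molmo llama qwen; do
  echo CUDA_VISIBLE_DEVICES=0 python -m llm_prompts.validation \
    --dataset_type evaluation --finetuned_model_id "$model_id" --max_num_tasks 400
done
```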
See:
- ARC technical guide: specs
--- CPU only ---
4 CPU Cores
30 Gigabytes of RAM
or
--- P100 GPU ---
1 Nvidia Tesla P100 GPU
4 CPU cores
29 Gigabytes of RAM
or
--- T4 2x GPU ---
2 Nvidia Tesla T4 GPUs
4 CPU cores
29 Gigabytes of RAM
- Awesome ARC: lots of information
- Lots of data: millions of synthetically generated tasks
- Unsloth: fine-tune LLMs. Also with quantization.
- Fine-tuning example: how to fine-tune and quantize an LLM from Hugging Face
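All three ARC runtimes listed above provide 4 CPU cores and 29-30 GB of RAM, which a script can sanity-check at startup. A stdlib-only sketch (the function names and the `/proc/meminfo` parsing approach are illustrative, not part of this repo):

```python
import os

def read_total_ram_gb(meminfo_text: str) -> float:
    """Parse MemTotal (reported in kB) out of /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / 1024**2  # kB -> GB
    raise ValueError("MemTotal not found")

def meets_arc_specs(cpu_cores: int, ram_gb: float) -> bool:
    """The ARC runtimes above all offer 4 CPU cores and 29-30 GB of RAM."""
    return cpu_cores >= 4 and ram_gb >= 29

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        total = read_total_ram_gb(f.read())
    cores = os.cpu_count() or 0
    print(f"cores={cores} ram={total:.1f}GB ok={meets_arc_specs(cores, total)}")
```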
- The current version of this repository uses version 4.43.2 of the transformers package, which differs from version 4.42.3 in the Kaggle environment.
- The current version of this repository uses version 1.7.1 of the polars package, which differs from version 1.1.0 in the Kaggle environment.
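Given those pins, a startup check can catch version skew before a long run fails. A stdlib-only sketch (the expected versions are the ones listed above; the helper name is illustrative):

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from this README; the Kaggle image ships older versions.
EXPECTED = {"transformers": "4.43.2", "polars": "1.7.1"}

def version_mismatches(expected: dict[str, str]) -> dict[str, str]:
    """Return {package: installed_version} for every package whose
    installed version differs from the pin; missing packages map
    to 'not installed'."""
    bad = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = "not installed"
        if have != want:
            bad[pkg] = have
    return bad

if __name__ == "__main__":
    print(version_mismatches(EXPECTED))  # empty dict means versions match
```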