giotto-llm

Overview

This repository is a wrapper around several open-source packages needed to fine-tune LLMs and run them efficiently in inference mode.

Important: See the Makefile for useful commands.

Documentation

The online documentation can be found at https://giotto-ai.github.io/giotto-llm/index.html

Setup

Environment variables

The following environment variables need to be set (update the credential path below):

export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.gloud/giotto-research-admin.json
export MLFLOW_TRACKING_URI=http://cluster-manager:5051
export PATH=$HOME/.local/bin:$PATH
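One way to set them persistently (a sketch assuming a bash shell; adjust the credential path to your own setup):

cat >> ~/.bashrc <<'EOF'
export GOOGLE_APPLICATION_CREDENTIALS=$HOME/.gloud/giotto-research-admin.json
export MLFLOW_TRACKING_URI=http://cluster-manager:5051
export PATH=$HOME/.local/bin:$PATH
EOF
source ~/.bashrc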

GitLab SSH key

Generate a key with ssh-keygen and add it to GitLab.
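A typical invocation (standard ssh-keygen usage; the email comment is a placeholder):

ssh-keygen -t ed25519 -C "you@example.com"   # accept the default file location or pick your own
cat ~/.ssh/id_ed25519.pub                    # paste this public key into your GitLab SSH keys settings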

Install packages and download the data and models

make

Fine-tune models on multiple GPUs

Example with a QwenVL-based model:

torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id Qwen/Qwen2-VL-2B-Instruct --wrapper QwenVL -o qwenvl --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4

Example with a Molmo-based model:

torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id allenai/MolmoE-1B-0924 --wrapper Molmo -o molmo --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4

Example with a Llama-based model:

torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id meta-llama/Llama-3.2-1B-Instruct --wrapper CausalLM -o llama --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4

Example with a Qwen-based model:

torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id Qwen/Qwen2.5-0.5B --wrapper CausalLM -o qwen --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4

See the full list of currently tested models in ./models/.
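The commands above spawn one process per visible GPU (torchrun's --nproc-per-node=gpu). A minimal sketch for inspecting and restricting the GPUs used, reusing the Qwen flags above (CUDA_VISIBLE_DEVICES is the standard CUDA mechanism, not a repo-specific option):

nvidia-smi -L   # list the GPUs torchrun will see
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node=gpu -m llm_prompts.finetuning -d re_arc_400x5 --model_id Qwen/Qwen2.5-0.5B --wrapper CausalLM -o qwen --batch_size 1 --gradient_accumulation_steps 16 --quantization 8bit-4 --neftune_noise_alpha 10.0 --num_train_epochs 15 --learning_rate 2e-4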

Validate fine-tuned models

The validation script is single-GPU for now and requires a config entry in ./llm_prompts/validation/__main__.py.

# Only single gpu support for now
CUDA_VISIBLE_DEVICES=0 python -m llm_prompts.validation --dataset_type evaluation --finetuned_model_id <MODEL-ID> --max_num_tasks 400

where <MODEL-ID> is a fine-tuned model defined in the config.
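For example, assuming the QwenVL model fine-tuned above has been registered in the config under the (hypothetical) id qwenvl:

CUDA_VISIBLE_DEVICES=0 python -m llm_prompts.validation --dataset_type evaluation --finetuned_model_id qwenvl --max_num_tasks 400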

Info

The supported hardware configurations are:

--- CPU only ---
4 CPU Cores
30 Gigabytes of RAM

or

--- P100 GPU ---
1 Nvidia Tesla P100 GPU
4 CPU cores
29 Gigabytes of RAM

or

--- T4 2x GPU ---
2 Nvidia Tesla T4 GPUs
4 CPU cores
29 Gigabytes of RAM

Notes

  • The current version of this repository uses version 4.43.2 of the transformers package, which differs from version 4.42.3 in the Kaggle environment.
  • The current version of this repository uses version 1.7.1 of the polars package, which differs from version 1.1.0 in the Kaggle environment.
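If you need to match the Kaggle environment locally, a minimal sketch is to pin those two packages to the Kaggle versions with pip (note this downgrades from the versions this repository is developed against):

pip install "transformers==4.42.3" "polars==1.1.0"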
