
changes within Repllama training and data script #105

Merged May 19, 2024

Conversation

thakur-nandan

Hi @MXueguang,

I was running RepLLaMA on the refactor branch, but the RepLLaMA training and data scripts seem out of date, so I opened this PR as a starting point.

I have already updated the old arguments wherever possible.
Feel free to edit, as there may be places where further changes are required.

This is the fine-tuning command for RepLLaMA on the refactor branch:

export NCCL_P2P_DISABLE=1
export CACHE_DIR=<your_cache_dir>

deepspeed --include localhost:0,1,2 train.py \
  --deepspeed ds_config.json \
  --cache_dir $CACHE_DIR \
  --dataset_cache_dir $CACHE_DIR \
  --output_dir model_repllama \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --save_steps 200 \
  --dataset_name Tevatron/msmarco-passage \
  --fp16 \
  --train_group_size 8 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing \
  --pad_to_multiple_of 16 \
  --lora \
  --lora_target_modules "q_proj,v_proj,o_proj,down_proj,up_proj,gate_proj" \
  --learning_rate 1e-4 \
  --query_max_len 32 \
  --passage_max_len 196 \
  --num_train_epochs 1 \
  --logging_steps 10 \
  --overwrite_output_dir \
  --warmup_steps 100 \
  --dataset_number_of_shards 32
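
The command above points to a DeepSpeed configuration file, ds_config.json, which is not included in this PR. Below is a minimal, hypothetical sketch of such a file, assuming ZeRO stage 2 and the HuggingFace Trainer integration (where "auto" values are filled in from the command-line arguments above, e.g. --fp16 and --gradient_accumulation_steps); adjust it to your hardware and setup.

{
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}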

@MXueguang MXueguang merged commit cb0c37b into texttron:refactor May 19, 2024