Move neuron_parallel_compile outside of bash script #706

jgray-aws · 2024-09-26T19:28:05Z

Currently, the tutorial calls neuron_parallel_compile inside the bash script. Because neuron_parallel_compile is responsible for setting $NEURON_EXTRACT_GRAPHS_ONLY, the variable is not yet set when the script evaluates the check below. MAX_STEPS is therefore always set to -1, and compilation runs for more than an hour.

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

optimum-neuron/docs/source/training_tutorials/sft_lora_finetune_llm.mdx

Lines 215 to 262 in 3748a06

```bash
#!/bin/bash
set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cache_dir=/home/ubuntu/cache_dir_neuron/"

PROCESSES_PER_NODE=8

NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE docs/source/training_tutorials/sft_lora_finetune_llm.py \
  --model_id $MODEL_NAME \
  --num_train_epochs $NUM_EPOCHS \
  --do_train \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --max_steps $MAX_STEPS \
  --per_device_train_batch_size $BS \
  --per_device_eval_batch_size $BS \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --gradient_checkpointing true \
  --bf16 \
  --zero_1 false \
  --tensor_parallel_size $TP_DEGREE \
  --pipeline_parallel_size $PP_DEGREE \
  --logging_steps $LOGGING_STEPS \
  --save_total_limit 1 \
  --output_dir $OUTPUT_DIR \
  --lr_scheduler_type "constant" \
  --overwrite_output_dir
```

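We need to refactor the tutorial so that neuron_parallel_compile is called on the training script from the outside, rather than from inside it. That way $NEURON_EXTRACT_GRAPHS_ONLY is already set when the script picks MAX_STEPS.

An example can be found here: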

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html
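The difference between the two placements can be reproduced without Neuron hardware. The sketch below is only an illustration: `fake_parallel_compile` is a hypothetical stand-in that sets `NEURON_EXTRACT_GRAPHS_ONLY=1` for the command it launches (which is what this issue assumes the real wrapper does), and `pick_max_steps` is a hypothetical function holding the same check as the tutorial script.

```bash
#!/bin/bash

# Hypothetical stand-in for neuron_parallel_compile: it sets the flag
# only in the environment of the command it is asked to run.
fake_parallel_compile() {
    NEURON_EXTRACT_GRAPHS_ONLY=1 "$@"
}

# Same MAX_STEPS selection as the tutorial script, wrapped in a function
# (hypothetical name) so it can be evaluated in both settings.
pick_max_steps() {
    local logging_steps=1
    if [ "${NEURON_EXTRACT_GRAPHS_ONLY:-}" = "1" ]; then
        echo $((logging_steps + 5))
    else
        echo -1
    fi
}

# Wrapper inside the script: the check runs before the wrapper is ever
# invoked, so the flag is unset and the full-training branch is taken.
inside=$(pick_max_steps)

# Wrapper outside the script: the check runs in the environment the
# wrapper prepared, so the short compilation run is selected.
outside=$(fake_parallel_compile pick_max_steps)

echo "inside=$inside outside=$outside"   # prints: inside=-1 outside=6
```

On a Neuron instance the same pattern would presumably look like `neuron_parallel_compile bash train.sh` for the compilation pass, followed by plain `bash train.sh` for the full run, where `train.sh` is a hypothetical name for the tutorial script with the `neuron_parallel_compile` prefix removed from its `torchrun` line.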
