
Error when training llama3-8b-it #256

Open
wx971025 opened this issue May 15, 2024 · 2 comments


wx971025 commented May 15, 2024

I installed the required packages strictly following the README:
pip install -r requirements.txt
pip install git+https://github.com/unslothai/unsloth.git
pip install bitsandbytes==0.43.1
pip install peft==0.10.0
pip install torch==2.2.2
pip install xformers==0.0.25.post1
Launch command:

export CUDA_VISIBLE_DEVICES=2,3,4,5,6,7
torchrun --nproc_per_node=6 train.py \
               --train_args_file train_args/sft/qlora/llama3-8b-sft-qlora.json

I'm using 6x A800 GPUs.
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'': torch.cuda.current_device()}` or `device_map={'': torch.xpu.current_device()}`
Why does this error occur?
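For context, here is a minimal sketch of the fix the error message points to, assuming the usual transformers + bitsandbytes QLoRA load path (not necessarily Firefly's exact code): under torchrun, each rank should load its own full copy of the quantized model onto its own GPU via the LOCAL_RANK environment variable, rather than letting accelerate spread a single copy across all visible devices.

```python
# Hedged sketch, not Firefly's actual code: place the quantized model on this
# rank's GPU. torchrun sets LOCAL_RANK for every process it spawns.
import os
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

local_rank = int(os.environ.get("LOCAL_RANK", "0"))

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA-style 4-bit quantization
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,   # matches "fp16": true in the config
)

model = AutoModelForCausalLM.from_pretrained(
    "/data1/models/llms/llama3_8b_it",      # model path from this issue
    quantization_config=bnb_config,
    device_map={"": local_rank},            # whole model on one device per rank
)
```

With `device_map={"": local_rank}`, every process holds a complete copy of the model on its own GPU, which is what accelerate's check expects for quantized training under DDP.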

@Parasolation

Could you share your train_args?

@wx971025
Author

> Could you share your train_args?

{
    "output_dir": "output/firefly-llama3-8b-sft-qlora",
    "model_name_or_path": "/data1/models/llms/llama3_8b_it",
    "train_file": "./data/llama3/dummy_data.jsonl",
    "template_name": "llama3",
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": 2e-4,
    "max_seq_length": 1024,
    "logging_steps": 100,
    "save_steps": 100,
    "save_total_limit": 1,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_steps": 100,
    "lora_rank": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "use_unsloth": true,

    "gradient_checkpointing": true,
    "disable_tqdm": false,
    "optim": "paged_adamw_32bit",
    "seed": 42,
    "fp16": true,
    "report_to": "tensorboard",
    "dataloader_num_workers": 10,
    "save_strategy": "steps",
    "weight_decay": 0,
    "max_grad_norm": 0.3,
    "remove_unused_columns": false
}

It seems to be a problem in accelerate itself; if I don't use unsloth, the error goes away.
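A plausible explanation, hedged since it depends on unsloth's internals at the time: the open-source unsloth release targeted single-GPU training, and FastLanguageModel places the 4-bit weights on the current device itself, which trips accelerate's multi-device check once torchrun spawns several ranks. A minimal sketch of that load path, with arguments taken from the config above:

```python
# Hedged sketch of the unsloth path, for comparison. from_pretrained loads the
# 4-bit model onto the current CUDA device; with six torchrun ranks this can
# disagree with the device accelerate expects to train on.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="/data1/models/llms/llama3_8b_it",  # path from the config above
    max_seq_length=1024,                           # "max_seq_length" above
    load_in_4bit=True,
)
```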
