
The following error occurs when merging llama3; the same problem also appears when using zero3 #686

Closed

1518630367 opened this issue May 14, 2024 · 5 comments · Fixed by #697

Comments

@1518630367

Traceback (most recent call last):
  File "/home/wumao/xtuner-main/xtuner/tools/model_converters/pth_to_hf.py", line 158, in <module>
    main()
  File "/home/wumao/xtuner-main/xtuner/tools/model_converters/pth_to_hf.py", line 78, in main
    model = BUILDER.build(cfg.model)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/wumao/xtuner-main/xtuner/model/sft.py", line 115, in __init__
    self._prepare_for_lora(peft_model, use_activation_checkpointing)
  File "/home/wumao/xtuner-main/xtuner/model/sft.py", line 144, in _prepare_for_lora
    self.llm = get_peft_model(self.llm, self.lora)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/mapping.py", line 136, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/peft_model.py", line 1094, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/peft_model.py", line 129, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 295, in _create_new_module
    new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 506, in dispatch_bnb_4bit
    new_module = Linear4bit(target, adapter_name, **fourbit_kwargs)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 293, in __init__
    self.update_layer(
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 122, in update_layer
    self.to(weight.device)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 779, in _apply
    module._apply(fn)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Process finished with exit code 1
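For context, this NotImplementedError is PyTorch's standard complaint when a module whose parameters live on the meta device (shape and dtype only, no storage) is moved with .to(). A minimal sketch, independent of xtuner:

```python
import torch.nn as nn

# Parameters created on the "meta" device carry shapes and dtypes but no data.
layer = nn.Linear(8, 8, device="meta")

try:
    layer.to("cpu")  # raises the same NotImplementedError as in the traceback above
except NotImplementedError as err:
    print(err)

# .to_empty() allocates fresh (uninitialized) storage on the target device instead
# of copying, which is why the error message suggests it.
layer = layer.to_empty(device="cpu")
print(layer.weight.device)  # cpu
```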

@1518630367 1518630367 changed the title "The following error occurs when merging llama3" to "The following error occurs when merging llama3; the same problem also appears when using zero3" May 14, 2024
@pppppM
Collaborator

pppppM commented May 14, 2024

Insufficient GPU memory has left some of the model's parameters as meta tensors; append --device cpu to the command.
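As a rough illustration of how parameters end up on the meta device (this uses accelerate's init_empty_weights for brevity, not necessarily the actual xtuner/DeepSpeed code path): weights that are never materialized onto a real device trigger exactly the error above as soon as something tries to copy them, e.g. when get_peft_model moves the new LoRA layers.

```python
import torch.nn as nn
from accelerate import init_empty_weights

# Inside this context, parameters are created on the meta device: no memory is allocated.
with init_empty_weights():
    model = nn.Linear(1024, 1024)

print(model.weight.device)  # meta
# Calling model.to("cpu") here would raise "Cannot copy out of meta tensor; no data!";
# the weights first have to be materialized (to_empty() plus loading real values).
```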

@1518630367
Author

> Insufficient GPU memory has left some of the model's parameters as meta tensors; append --device cpu to the command.

In principle, an A100 80G should not run out of GPU memory. I checked, and the error is raised before even 1/8 of the GPU memory is in use. Also, if I use cpu, the following error appears: pth_to_hf.py: error: unrecognized arguments: --device cpu

@1518630367
Author

> Insufficient GPU memory has left some of the model's parameters as meta tensors; append --device cpu to the command.

With the old version of xtuner the merge finishes in seconds; with the new version this problem appears.

@pppppM
Collaborator

pppppM commented May 16, 2024

@1518630367 Which versions are the old and new xtuner that you used?
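A quick way to report the installed version, assuming xtuner was installed with pip:

```python
from importlib.metadata import version

# Prints the version of the installed xtuner package, e.g. for comparing the two environments.
print(version("xtuner"))
```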

@pppppM
Collaborator

pppppM commented May 17, 2024

@1518630367
This problem has been located and fixed in #697.
