RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #116

Open
SparkJiao opened this issue Aug 11, 2023 · 0 comments


Hi, I was recently running LLaMA-2 with tensor-parallel inference through the generate method and encountered this problem.

Here is the error message:

[2023-08-11 23:43:09,855][FK.general_util.evaluator][INFO] - ***** Running evaluation test.test *****
[2023-08-11 23:43:09,855][FK.general_util.evaluator][INFO] -   Num examples = 1569
[2023-08-11 23:43:09,856][FK.general_util.evaluator][INFO] -   Batch size = 1
Evaluating:   0%|          | 0/1569 [00:00<?, ?it/s]
Error executing job with overrides: ['ddp_eval=False']
Traceback (most recent call last):
  File "/export/home2/fangkai/merit-v2/trainer_base_fsdp_v4.py", line 464, in <module>
    main()
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/export/home2/fangkai/merit-v2/trainer_base_fsdp_v4.py", line 436, in main
    result = evaluate(cfg, model, tokenizer, prefix=prefix, _split=split)
  File "/export/home2/fangkai/merit-v2/general_util/evaluator.py", line 227, in evaluate_fn
    outputs, pred_res = eval_forward_fn(batch)
  File "/export/home2/fangkai/merit-v2/general_util/evaluator.py", line 470, in __call__
    decoding_outputs = self.model.generate(**batch, generation_config=self.generation_config)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/merit-v2/models/llama.py", line 75, in _forward
    query_states = self.q_proj(hidden_states)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/export/home2/fangkai/anaconda3/envs/torch2.0/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1346893) of binary: /export/home2/fangkai/anaconda3/envs/torch2.0/bin/python

nvidia-smi output from the adjacent terminal pane, captured in the same paste:

|    0   N/A  N/A     53265      C   ...avishankar1/tc/bin/python    16531MiB |
|    0   N/A  N/A   4088900      C   ...avishankar1/tc/bin/python    32031MiB |
|    1   N/A  N/A     53265      C   ...avishankar1/tc/bin/python      331MiB |
|    1   N/A  N/A   1331281      C   .../envs/torch2.0/bin/python    14639MiB |
|    1   N/A  N/A   4099212      C   ...avishankar1/tc/bin/python     2087MiB |
|    2   N/A  N/A   1331282      C   .../envs/torch2.0/bin/python    14667MiB |
|    2   N/A  N/A   4099212      C   ...avishankar1/tc/bin/python     3915MiB |
|    3   N/A  N/A   1546448      C   ...da3/envs/torch/bin/python    11611MiB |
|    3   N/A  N/A   4099212      C   ...avishankar1/tc/bin/python    24771MiB |
|    4   N/A  N/A   1060990      C   ...da3/envs/torch/bin/python    18553MiB |
|    5   N/A  N/A   2376820      C   ...da3/envs/torch/bin/python    42357MiB |
|    5   N/A  N/A   4099212      C   ...avishankar1/tc/bin/python      349MiB |
|    6   N/A  N/A   4099212      C   ...avishankar1/tc/bin/python    32031MiB |
|    7   N/A  N/A    310278      C   ...nvs/retrieval/bin/python3     1333MiB |
+-----------------------------------------------------------------------------+
(base) fangkai@scsehg:~$ nvi8
Fri Aug 11 23:03:01 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:01:00.0 Off |                  Off |
| 37%   67C    P2   124W / 300W |  48572MiB / 49140MiB |     43%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    On   | 00000000:24:00.0 Off |                  Off |
| 30%   38C    P8    22W / 300W |   2438MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    On   | 00000000:41:00.0 Off |                  Off |
| 30%   41C    P8    29W / 300W |   3931MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:61:00.0 Off |                  Off |
| 30%   38C    P2    69W / 300W |  36398MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA RTX A6000    On   | 00000000:81:00.0 Off |                  Off |
| 30%   33C    P2    73W / 300W |  18555MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA RTX A6000    On   | 00000000:A1:00.0 Off |                  Off |
| 30%   44C    P2    99W / 300W |  42718MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA RTX A6000    On   | 00000000:C1:00.0 Off |                  Off |
| 36%   66C    P2   188W / 300W |  32047MiB / 49140MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA RTX A6000    On   | 00000000:E1:00.0 Off |                  Off |
| 30%   38C    P2    73W / 300W |   1335MiB / 49140MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name

The error appears to occur at the query projection (the self.q_proj call in models/llama.py, line 75).
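
One way to narrow this down is to list which parameters are still on the CPU after the model has been wrapped. The following is a minimal sketch, assuming the wrapped model exposes PyTorch's named_parameters(); the helper names parameters_by_device and report_cpu_parameters are hypothetical and are not part of tensor_parallel or transformers.

```python
from collections import defaultdict


def parameters_by_device(model):
    """Group parameter names by the device they currently live on.

    Works with any object that exposes PyTorch's named_parameters() API.
    """
    grouped = defaultdict(list)
    for name, param in model.named_parameters():
        # str(torch.device("cuda:0")) is "cuda:0"; str(torch.device("cpu")) is "cpu"
        grouped[str(param.device)].append(name)
    return dict(grouped)


def report_cpu_parameters(model, limit=10):
    """Print up to `limit` names of parameters that are still on the CPU."""
    cpu_params = parameters_by_device(model).get("cpu", [])
    print(f"{len(cpu_params)} parameter(s) on cpu")
    for name in cpu_params[:limit]:
        print("  ", name)
    return cpu_params
```

Calling report_cpu_parameters(model) right before generate would show whether any q_proj weights were left on the CPU. Buffers are not covered by this sketch.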

The following is my initialization wrapper:

import logging

import tensor_parallel as tp
import torch
import torch.distributed as dist
from transformers import LlamaForCausalLM

logger = logging.getLogger(__name__)

# llama_fast_attention_wrap is defined elsewhere in my project (not shown here).


def load_model_from_pretrained_tp(pretrained_model_name_or_path: str, *args, **kwargs):
    # Pop the custom options so they are not forwarded to from_pretrained.
    tp_sharded = kwargs.pop("tp_sharded", None)
    enable_flash_attention = kwargs.pop("enable_flash_attention", False)
    flash_attention_vanilla_torch = kwargs.pop("flash_attention_vanilla_torch", False)
    flash_attention_var_len = kwargs.pop("flash_attention_var_len", False)

    model = LlamaForCausalLM.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)

    if enable_flash_attention:
        logger.info("⚡⚡⚡ enable llama flash attention.")

        # Patch the self-attention module of every decoder layer.
        layers = model.model.layers
        for layer in layers:
            llama_fast_attention_wrap(layer.self_attn, vanilla_torch=flash_attention_vanilla_torch, var_len=flash_attention_var_len)

    n_gpus = torch.cuda.device_count()
    if not dist.is_initialized():
        # No process group: pass every visible GPU explicitly.
        model = tp.tensor_parallel(model, [torch.device(f"cuda:{i}") for i in range(n_gpus)], sharded=tp_sharded)
    else:
        # Process group already initialized: no device list, keep the first element of the result.
        model = tp.tensor_parallel(model, sharded=False)[0]
    return model

I noticed that you do not call batch["input_ids"].to(device). When I remove this call from my code, I get a different error saying that the inputs are on cpu.
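
For reference, the input-moving step mentioned above can be written as a small helper. This is a minimal sketch, assuming the batch is a dict whose tensor values support .to(device); the name move_batch_to_device is hypothetical, and which device tensor_parallel expects the inputs on is not confirmed in this issue.

```python
def move_batch_to_device(batch, device):
    """Return a new dict with every tensor-like value moved to `device`.

    Values without a .to() method (ints, strings, lists) are passed through
    unchanged. The original dict is not modified.
    """
    return {
        key: value.to(device) if hasattr(value, "to") else value
        for key, value in batch.items()
    }
```

Usage would look like batch = move_batch_to_device(batch, "cuda:0") before calling model.generate(**batch, ...), with "cuda:0" chosen here only because it is the CUDA device named in the error message.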

Version information:

transformers==4.31.0
torch==2.0.0
tensor-parallel==2.0.0

Thank you very much for your help!
