
Problem with internVL-1.5-Int8 inference on V100 #890

Closed

rTrQqgH74lc2PT5k opened this issue May 9, 2024 · 19 comments
Comments

@rTrQqgH74lc2PT5k

Describe the bug

  1. After pulling the latest code, inference with the int8 model fails with RuntimeError: probability tensor contains either inf, nan or element < 0
  2. If I use the inference script instead of the command line, the response comes back garbled:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Unused kwargs: ['quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|██████████████████████| 6/6 [00:58<00:00,  9.77s/it]
[INFO:swift] model.max_model_len: None
[INFO:swift] Global seed set to 42
query: How far is it from each city?
response:  <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
query: Which city is the farthest?
response: The data</s>
history: [['How far is it from each city?', ' <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>], ['Which city is the farthest?', 'The data</s>']]

Your hardware and system info

PyTorch 2.2, CUDA 11.8
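Note: this RuntimeError is raised by torch.multinomial during sampled generation when the probability tensor contains non-finite values, which typically means the float16 logits overflowed to inf/nan. A minimal, self-contained illustration of the failure mode (not the model code itself):

import torch

# inf in the logits turns the whole softmax output into nan ...
logits = torch.tensor([float('inf'), 1.0, 2.0])
probs = torch.softmax(logits, dim=-1)
print(probs)  # tensor([nan, nan, nan])

# ... and sampling from nan probabilities raises the exact error above.
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0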

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

Please paste the command you ran.

@rTrQqgH74lc2PT5k
Author

rTrQqgH74lc2PT5k commented May 9, 2024

Command 1: CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5-int8 --use_flash_attn false
The model loads successfully, but after entering a prompt plus an image URL it fails with RuntimeError: probability tensor contains either inf, nan or element < 0

Command 2: python test_int8.py

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.internvl_chat_v1_5_int8
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                        model_kwargs={'device_map': 'auto'},
                                        use_flash_attn=False)

model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)  # chat with image
print(f'query: {query}')
print(f'response: {response}')

query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)  # chat without image
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')

The output after running it is abnormal:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Unused kwargs: ['quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|██████████████████████| 6/6 [00:58<00:00,  9.77s/it]
[INFO:swift] model.max_model_len: None
[INFO:swift] Global seed set to 42
query: How far is it from each city?
response:  <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
query: Which city is the farthest?
response: The data</s>
history: [['How far is it from each city?', ' <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>], ['Which city is the farthest?', 'The data</s>']]

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

@rTrQqgH74lc2PT5k Hi, with swift infer, do you also get the error using the sample image URL, e.g. http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

Is there any error message alongside the abnormal output? I couldn't reproduce it.
(screenshot attached)

@rTrQqgH74lc2PT5k
Author

rTrQqgH74lc2PT5k commented May 9, 2024

I used the int8 model downloaded from Hugging Face, not the one from ModelScope. Could that be the reason?

ms-swift                      2.1.0.dev0
transformers                  4.40.0
bitsandbytes                  0.43.1
timm                          0.9.16

@rTrQqgH74lc2PT5k
Author

(screenshot attached)

Without an image the output is normal;
as soon as an image is added, the output is garbled.

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

With swift infer, the normal flow is to type the query and press Enter; it then prompts "Input a media path or URL", where you enter the image path or link.
(screenshot attached)

My command:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5-int8  --use_flash_attn false

@rTrQqgH74lc2PT5k
Author

(screenshot attached)

I did the same thing, but it still errors.

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

@rTrQqgH74lc2PT5k My guess is that it hit OOM at runtime. Could you try a smaller image, or multi-GPU inference? Keep an eye on GPU memory usage while it runs.
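A minimal sketch for tracking peak GPU memory around a single inference call (plain PyTorch, independent of swift):

import torch

# Reset the peak-memory counters on every visible device before the call.
for i in range(torch.cuda.device_count()):
    torch.cuda.reset_peak_memory_stats(i)

# ... run one inference call here ...

# Report the per-device peaks; watching `nvidia-smi` works as well.
for i in range(torch.cuda.device_count()):
    peak_gib = torch.cuda.max_memory_allocated(i) / 1024**3
    print(f'cuda:{i} peak allocated: {peak_gib:.2f} GiB')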

@NLP-Learning

@rTrQqgH74lc2PT5k My guess is that it hit OOM at runtime. Could you try a smaller image, or multi-GPU inference? Keep an eye on GPU memory usage while it runs.

With int8 I hit the same error as the OP. As you suggested, I loaded the int8 model on two 32 GB cards and confirmed there was no OOM.

@hjh0119
Collaborator

hjh0119 commented May 10, 2024

@NLP-Learning @rTrQqgH74lc2PT5k Does it run normally if you pass --dtype bf16? ref: OpenGVLab/InternVL#144

@NLP-Learning

@NLP-Learning @rTrQqgH74lc2PT5k Does it run normally if you pass --dtype bf16? ref: OpenGVLab/InternVL#144

Yes! With that flag, int8 runs normally. Thanks! It works both single-GPU and multi-GPU; a single 32 GB V100's memory is close to fully used.
(screenshot attached)
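For reference, the script-based equivalent of --dtype bf16 should be passing torch.bfloat16 instead of torch.float16 in the repro script above (a sketch, untested here):

import torch
from swift.llm import get_model_tokenizer, ModelType

# Same call as in the repro script, with bfloat16 mirroring the --dtype bf16 flag.
model, tokenizer = get_model_tokenizer(ModelType.internvl_chat_v1_5_int8,
                                       torch.bfloat16,
                                       model_kwargs={'device_map': 'auto'},
                                       use_flash_attn=False)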

@hjh0119
Collaborator

hjh0119 commented May 10, 2024

@NLP-Learning Could you also test whether sft runs normally with --dtype bf16?

@BIGBALLON

BIGBALLON commented May 10, 2024

@NLP-Learning Could you also test whether sft runs normally with --dtype bf16?

bf16 is only supported on Ampere GPUs; --dtype bf16 will cause an error on a V100 GPU.
But if --dtype bf16 is not set, it may cause other problems (e.g. the garbled inference output mentioned above).
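A quick capability check (torch.cuda.is_bf16_supported() returns False on V100, which is compute capability 7.0; Ampere is 8.0+):

import torch

print(torch.cuda.get_device_capability())  # (7, 0) on V100, (8, 0) on A100
print(torch.cuda.is_bf16_supported())      # False on V100, True on Ampere and newer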

@hjh0119
Collaborator

hjh0119 commented May 10, 2024

bf16 is only supported on Ampere GPUs; --dtype bf16 will cause an error on a V100 GPU. But if --dtype bf16 is not set, it may cause other problems (e.g. the garbled inference output mentioned above).

That is a bit strange; have you figured out the reason?

@NLP-Learning

@NLP-Learning Could you also test whether sft runs normally with --dtype bf16?

I just pulled the latest code and installed the latest dependencies, then tried the following:
(1) Using five 32 GB V100s (with four cards it OOMs):

CUDA_VISIBLE_DEVICES=1,2,4,5,6 swift sft \
    --model_type  internvl-chat-v1_5 \
    --dataset coco-mini-en-2 \
    --model_id_or_path /data/InternVL-Chat-V1-5 \
    --use_flash_attn false

It eventually fails with:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
Train:   0%|                                                                                                                                            | 0/1250 [00:04<?, ?it/s]

(2)

CUDA_VISIBLE_DEVICES=1,2,5,6 swift sft \
    --model_type  internvl-chat-v1_5 \
    --dataset coco-mini-en-2 \
    --model_id_or_path /data/InternVL-Chat-V1-5 \
    --use_flash_attn false \
    --deepspeed default-zero2

It fails with:

ValueError: DeepSpeed is not compatible with MP. n_gpu: 4, local_world_size: 1.

(3)

CUDA_VISIBLE_DEVICES=1,2,4,5,6 swift sft \
    --model_type  internvl-chat-v1_5 \
    --dataset coco-mini-en-2 \
    --model_id_or_path /data/InternVL-Chat-V1-5 \
    --use_flash_attn false \
    --dtype bf16

It fails with:

ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0

That's as far as I got. The first two errors should be easy to fix; and while the V100 indeed doesn't support bf16, inference works fine when --dtype bf16 is passed. I don't know why.

@hjh0119
Collaborator

hjh0119 commented May 13, 2024

That's as far as I got. The first two errors should be easy to fix; and while the V100 indeed doesn't support bf16, inference works fine when --dtype bf16 is passed. I don't know why.

Thanks for testing!
The first error is because the model's device map doesn't handle five cards well; 2 or 4 cards should be fine.
The second is because device map and DeepSpeed cannot be used together.
As for the third, it seems V100 fine-tuning really can't use bf16, so I'll drop the default dtype; sft with float16 should be fine.
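A dtype fallback along those lines might look like this (a sketch, not the actual swift code):

import torch

# Prefer bf16 where the hardware supports it, otherwise fall back to fp16 (e.g. on V100).
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16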

@MVP-D77

MVP-D77 commented May 15, 2024

@hjh0119 Hi, I pulled the latest code; are there any extra dependencies that need installing? My command for fine-tuning 1.5-int8 still fails, and my GPUs are A6000s:

CUDA_VISIBLE_DEVICES=4,5,6,7 swift sft --model_type  internvl-chat-v1_5-int8 --dataset coco-mini-en-2 --model_id_or_path  /xxxx/InternVL/pretrained/InternVL-Chat-V1-5-Int8 

The error is RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
Both the 2-GPU and the 4-GPU setup report this same error.
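For context, this error occurs when an index tensor lives on a different GPU than the tensor it indexes, which can happen when the device map shards the model across cards. A minimal two-GPU illustration (hypothetical tensors, not the actual model code):

import torch

table = torch.randn(10, 4, device='cuda:1')  # weight placed on cuda:1 by the device map
idx = torch.tensor([1, 2], device='cuda:0')  # indices produced on cuda:0

# table[idx] would raise:
# RuntimeError: indices should be either on cpu or on the same device
# as the indexed tensor (cuda:1)

rows = table[idx.to(table.device)]  # fix: move the indices to the indexed tensor's device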

@hjh0119
Collaborator

hjh0119 commented May 15, 2024

@MVP-D77 fixed #937

@hjh0119 hjh0119 closed this as completed May 15, 2024