
Problem with internVL-1.5-Int8 inference on V100 #890

Closed

rTrQqgH74lc2PT5k opened this issue May 9, 2024 · 19 comments
Comments

@rTrQqgH74lc2PT5k

Describe the bug

  1. After pulling the latest code, inference with the int8 model fails with RuntimeError: probability tensor contains either inf, nan or element < 0
  2. If I use the inference script instead of the command line, the response comes back garbled:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Unused kwargs: ['quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|██████████████████████| 6/6 [00:58<00:00,  9.77s/it]
[INFO:swift] model.max_model_len: None
[INFO:swift] Global seed set to 42
query: How far is it from each city?
response:  <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
query: Which city is the farthest?
response: The data</s>
history: [['How far is it from each city?', ' <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>], ['Which city is the farthest?', 'The data</s>']]

Your hardware and system info

PyTorch 2.2, CUDA 11.8
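Note: this RuntimeError is raised by torch.multinomial during sampled generation when the probability tensor contains non-finite values, which typically means the float16 logits overflowed to inf/nan. A minimal, self-contained illustration of the failure mode (not the model code itself):

import torch

# inf in the logits turns the whole softmax output into nan ...
logits = torch.tensor([float('inf'), 1.0, 2.0])
probs = torch.softmax(logits, dim=-1)
print(probs)  # tensor([nan, nan, nan])

# ... and sampling from nan probabilities raises the exact error above.
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0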

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

Please paste the command you ran.

@rTrQqgH74lc2PT5k
Author

rTrQqgH74lc2PT5k commented May 9, 2024

Command 1: CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5-int8 --use_flash_attn false
The model loads successfully, but after entering a prompt plus an image URL it fails with RuntimeError: probability tensor contains either inf, nan or element < 0

Command 2: python test_int8.py

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.internvl_chat_v1_5_int8
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                        model_kwargs={'device_map': 'auto'},
                                        use_flash_attn=False)

model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)  # chat with image
print(f'query: {query}')
print(f'response: {response}')

query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)  # chat without image
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')

The output after running it is abnormal:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Unused kwargs: ['quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|██████████████████████| 6/6 [00:58<00:00,  9.77s/it]
[INFO:swift] model.max_model_len: None
[INFO:swift] Global seed set to 42
query: How far is it from each city?
response:  <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
query: Which city is the farthest?
response: The data</s>
history: [['How far is it from each city?', ' <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>], ['Which city is the farthest?', 'The data</s>']]

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

@rTrQqgH74lc2PT5k Hi, with swift infer, do you also get the error using the sample image URL, e.g. http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

Is there any error message alongside the abnormal output? I couldn't reproduce it.
(screenshot attached)

@rTrQqgH74lc2PT5k
Author

rTrQqgH74lc2PT5k commented May 9, 2024

I used the int8 model downloaded from Hugging Face, not the one from ModelScope. Could that be the reason?

ms-swift                      2.1.0.dev0
transformers                  4.40.0
bitsandbytes                  0.43.1
timm                          0.9.16

@rTrQqgH74lc2PT5k
Author

(screenshot attached)

Without an image the output is normal;
as soon as an image is added, the output is garbled.

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

With swift infer, the normal flow is to type the query and press Enter; it then prompts "Input a media path or URL", where you enter the image path or link.
(screenshot attached)

My command:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type internvl-chat-v1_5-int8  --use_flash_attn false

@rTrQqgH74lc2PT5k
Author

(screenshot attached)

I did the same thing, but it still errors.

@hjh0119
Collaborator

hjh0119 commented May 9, 2024

@rTrQqgH74lc2PT5k My guess is that it hit OOM at runtime. Could you try a smaller image, or multi-GPU inference? Keep an eye on GPU memory usage while it runs.
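A minimal sketch for tracking peak GPU memory around a single inference call (plain PyTorch, independent of swift):

import torch

# Reset the peak-memory counters on every visible device before the call.
for i in range(torch.cuda.device_count()):
    torch.cuda.reset_peak_memory_stats(i)

# ... run one inference call here ...

# Report the per-device peaks; watching `nvidia-smi` works as well.
for i in range(torch.cuda.device_count()):
    peak_gib = torch.cuda.max_memory_allocated(i) / 1024**3
    print(f'cuda:{i} peak allocated: {peak_gib:.2f} GiB')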

@NLP-Learning

@rTrQqgH74lc2PT5k My guess is that it hit OOM at runtime. Could you try a smaller image, or multi-GPU inference? Keep an eye on GPU memory usage while it runs.

With int8 I hit the same error as the OP. As you suggested, I loaded the int8 model on two 32 GB cards and confirmed there was no OOM.

@hjh0119
Collaborator

hjh0119 commented May 10, 2024

@NLP-Learning @rTrQqgH74lc2PT5k Does it run normally if you pass --dtype bf16? ref: OpenGVLab/InternVL#144

@NLP-Learning

@NLP-Learning @rTrQqgH74lc2PT5k Does it run normally if you pass --dtype bf16? ref: OpenGVLab/InternVL#144

Yes! With that flag, int8 runs normally. Thanks! It works both single-GPU and multi-GPU; a single 32 GB V100's memory is close to fully used.
(screenshot attached)
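For reference, the script-based equivalent of --dtype bf16 should be passing torch.bfloat16 instead of torch.float16 in the repro script above (a sketch, untested here):

import torch
from swift.llm import get_model_tokenizer, ModelType

# Same call as in the repro script, with bfloat16 mirroring the --dtype bf16 flag.
model, tokenizer = get_model_tokenizer(ModelType.internvl_chat_v1_5_int8,
                                       torch.bfloat16,
                                       model_kwargs={'device_map': 'auto'},
                                       use_flash_attn=False)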

@hjh0119
Collaborator

hjh0119 commented May 10, 2024

@NLP-Learning Could you also test whether sft runs normally with --dtype bf16?

@BIGBALLON

BIGBALLON commented May 10, 2024

@NLP-Learning Could you also test whether sft runs normally with --dtype bf16?

bf16 is only supported on Ampere GPUs; --dtype bf16 will cause an error on a V100 GPU.
But if --dtype bf16 is not set, it may cause other problems (e.g. the garbled inference output mentioned above).
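A quick capability check (torch.cuda.is_bf16_supported() returns False on V100, which is compute capability 7.0; Ampere is 8.0+):

import torch

print(torch.cuda.get_device_capability())  # (7, 0) on V100, (8, 0) on A100
print(torch.cuda.is_bf16_supported())      # False on V100, True on Ampere and newer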

@hjh0119
Collaborator

hjh0119 commented May 10, 2024

bf16 is only supported on Ampere GPUs; --dtype bf16 will cause an error on a V100 GPU. But if --dtype bf16 is not set, it may cause other problems (e.g. the garbled inference output mentioned above).

That is a bit strange; have you figured out the reason?

@NLP-Learning

@NLP-Learning Could you also test whether sft runs normally with --dtype bf16?

I just pulled the latest code and installed the latest dependencies, then tried the following:
(1) Using five 32 GB V100s (with four cards it OOMs):

CUDA_VISIBLE_DEVICES=1,2,4,5,6 swift sft \
    --model_type  internvl-chat-v1_5 \
    --dataset coco-mini-en-2 \
    --model_id_or_path /data/InternVL-Chat-V1-5 \
    --use_flash_attn false

It eventually fails with:

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
Train:   0%|                                                                                                                                            | 0/1250 [00:04<?, ?it/s]

(2)

CUDA_VISIBLE_DEVICES=1,2,5,6 swift sft \
    --model_type  internvl-chat-v1_5 \
    --dataset coco-mini-en-2 \
    --model_id_or_path /data/InternVL-Chat-V1-5 \
    --use_flash_attn false \
    --deepspeed default-zero2

It fails with:

ValueError: DeepSpeed is not compatible with MP. n_gpu: 4, local_world_size: 1.

(3)

CUDA_VISIBLE_DEVICES=1,2,4,5,6 swift sft \
    --model_type  internvl-chat-v1_5 \
    --dataset coco-mini-en-2 \
    --model_id_or_path /data/InternVL-Chat-V1-5 \
    --use_flash_attn false \
    --dtype bf16

It fails with:

ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0

That's as far as I got. The first two errors should be easy to fix; and while the V100 indeed doesn't support bf16, inference works fine when --dtype bf16 is passed. I don't know why.

@hjh0119
Collaborator

hjh0119 commented May 13, 2024

That's as far as I got. The first two errors should be easy to fix; and while the V100 indeed doesn't support bf16, inference works fine when --dtype bf16 is passed. I don't know why.

Thanks for testing!
The first error is because the model's device map doesn't handle five cards well; 2 or 4 cards should be fine.
The second is because device map and DeepSpeed cannot be used together.
As for the third, it seems V100 fine-tuning really can't use bf16, so I'll drop the default dtype; sft with float16 should be fine.
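A dtype fallback along those lines might look like this (a sketch, not the actual swift code):

import torch

# Prefer bf16 where the hardware supports it, otherwise fall back to fp16 (e.g. on V100).
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16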

@MVP-D77

MVP-D77 commented May 15, 2024

@hjh0119 Hi, I pulled the latest code; are there any extra dependencies that need installing? My command for fine-tuning 1.5-int8 still fails, and my GPUs are A6000s:

CUDA_VISIBLE_DEVICES=4,5,6,7 swift sft --model_type  internvl-chat-v1_5-int8 --dataset coco-mini-en-2 --model_id_or_path  /xxxx/InternVL/pretrained/InternVL-Chat-V1-5-Int8 

The error is RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
Both the 2-GPU and the 4-GPU setup report this same error.
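For context, this error occurs when an index tensor lives on a different GPU than the tensor it indexes, which can happen when the device map shards the model across cards. A minimal two-GPU illustration (hypothetical tensors, not the actual model code):

import torch

table = torch.randn(10, 4, device='cuda:1')  # weight placed on cuda:1 by the device map
idx = torch.tensor([1, 2], device='cuda:0')  # indices produced on cuda:0

# table[idx] would raise:
# RuntimeError: indices should be either on cpu or on the same device
# as the indexed tensor (cuda:1)

rows = table[idx.to(table.device)]  # fix: move the indices to the indexed tensor's device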

@hjh0119
Collaborator

hjh0119 commented May 15, 2024

@MVP-D77 fixed #937

@hjh0119 hjh0119 closed this as completed May 15, 2024