InternVL-1.5-Int8 inference problem on V100 #890
Comments
Please paste the command you ran. |
Command 1: Command 2: `python test_int8.py`

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything

model_type = ModelType.internvl_chat_v1_5_int8
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_kwargs={'device_map': 'auto'},
                                       use_flash_attn=False)
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = 'How far is it from each city?'
response, history = inference(model, template, query, images=images)  # chat with image
print(f'query: {query}')
print(f'response: {response}')

query = 'Which city is the farthest?'
gen = inference_stream(model, template, query, history)  # chat without image
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
```

The output after running it looks abnormal:

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
FlashAttention is not installed.
Unused kwargs: ['quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|██████████████████████| 6/6 [00:58<00:00,  9.77s/it]
[INFO:swift] model.max_model_len: None
[INFO:swift] Global seed set to 42
query: How far is it from each city?
response: <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>
query: Which city is the farthest?
response: The data</s>
history: [['How far is it from each city?', ' <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>], ['Which city is the farthest?', 'The data</s>']]
``` |
@rTrQqgH74lc2PT5k Hi, does `swift infer` also error out with the example image URL, e.g. http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png ? |
I used the int8 model downloaded from Hugging Face, not the one from ModelScope. Could that be the cause? |
@rTrQqgH74lc2PT5k My guess is that it runs out of memory (OOM) at runtime. Could you try a smaller image, or multi-GPU inference? Keep an eye on GPU memory usage while it runs. |
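The memory check suggested above can be scripted instead of eyeballed. This is a small sketch (assuming only that PyTorch is installed; the helper name `gpu_mem_report` is made up for illustration) that prints the peak allocator usage per visible card after a run:

```python
import torch

def gpu_mem_report() -> str:
    """Summarize peak allocated CUDA memory per visible device."""
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    parts = []
    for i in range(torch.cuda.device_count()):
        # max_memory_allocated tracks the high-water mark of the caching allocator
        peak_gib = torch.cuda.max_memory_allocated(i) / 2**30
        parts.append(f"cuda:{i} peak {peak_gib:.2f} GiB")
    return "; ".join(parts)

# Call this after inference() to see how close each card came to its limit.
print(gpu_mem_report())
```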
I hit the same error as the OP with int8. As you suggested, I loaded the int8 model on two 32 GB cards and confirmed there was no OOM. |
@NLP-Learning @rTrQqgH74lc2PT5k Does it run correctly if you pass `--dtype bf16`? ref: OpenGVLab/InternVL#144 |
Yes! With that flag added, int8 runs correctly. Thanks! It works both single-GPU and multi-GPU; a single 32 GB V100 ends up almost fully utilized. |
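One plausible reason `--dtype bf16` helps (this is my own CPU-only illustration, not something stated in the thread): float16's largest finite value is 65504, so large intermediate activations overflow to `inf`, while bfloat16 keeps float32's exponent range at the cost of precision:

```python
import torch

x = 70000.0  # larger than float16's max finite value (65504)
as_fp16 = torch.tensor([x], dtype=torch.float16)
as_bf16 = torch.tensor([x], dtype=torch.bfloat16)

print(as_fp16)  # overflows to inf
print(as_bf16)  # stays finite: bf16 trades mantissa bits for fp32's range
```

Once an activation is `inf`, downstream softmax/sampling produces garbage, which would match the `<unk>` output above.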
@NLP-Learning Could you also test whether SFT runs correctly with `--dtype bf16`? |
bf16 is only supported on Ampere GPUs. It is a bit strange; have you figured out the reason?
I just pulled the latest code and installed the latest dependencies, then tried the following:

(1)

```shell
CUDA_VISIBLE_DEVICES=1,2,4,5,6 swift sft \
  --model_type internvl-chat-v1_5 \
  --dataset coco-mini-en-2 \
  --model_id_or_path /data/InternVL-Chat-V1-5 \
  --use_flash_attn false
```

which eventually fails with an error:

(2)

```shell
CUDA_VISIBLE_DEVICES=1,2,5,6 swift sft \
  --model_type internvl-chat-v1_5 \
  --dataset coco-mini-en-2 \
  --model_id_or_path /data/InternVL-Chat-V1-5 \
  --use_flash_attn false \
  --deepspeed default-zero2
```

which fails with an error:

(3)

```shell
CUDA_VISIBLE_DEVICES=1,2,4,5,6 swift sft \
  --model_type internvl-chat-v1_5 \
  --dataset coco-mini-en-2 \
  --model_id_or_path /data/InternVL-Chat-V1-5 \
  --use_flash_attn false \
  --dtype bf16
```

which fails with an error:

That is all I have tried. The first two errors should be easy to resolve; also, the V100 indeed does not support bf16, yet adding it at inference time |
Thanks for testing! |
@hjh0119 Hi, I pulled the latest code. Are there any extra dependencies that need to be installed? The command I tested for fine-tuning 1.5-int8 still fails. My GPU is an A6000:
The error is |
Describe the bug
What the bug is, and how to reproduce, better with screenshots.
The error mentions `inf`, `nan`, or element < 0, and the response produced by inference is garbled.
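For context, the quoted `inf`/`nan`/element < 0 condition is the validation that `torch.multinomial` performs on its probability tensor during sampling. A minimal CPU repro of that failure mode (my own illustration, not the actual swift code path) is:

```python
import torch

# A logit that overflows float16 becomes inf; softmax over a vector
# containing inf yields nan, and sampling from that tensor then fails.
logits = torch.tensor([70000.0, 1.0]).to(torch.float16).float()  # [inf, 1.0]
probs = torch.softmax(logits, dim=-1)  # contains nan

try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as err:
    print(f"sampling failed: {err}")
```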
Your hardware and system info
PyTorch 2.2, CUDA 11.8