
llama-2-7b-hf output almost OOOO #1078

Open
SXxinxiaosong opened this issue Mar 22, 2024 · 0 comments

My code:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import torch

from accelerate import Accelerator
from accelerate.utils import set_seed
from transformers import LlamaForCausalLM, LlamaTokenizer, LlamaConfig

set_seed(1234)

# Load the local Llama-2-7b-hf checkpoint onto GPU 0 (default fp32; fp16 was also tried).
prefix_path = '/home/xsong/llama/Llama-2-7b-hf'
accelerator = Accelerator()
tokenizer = LlamaTokenizer.from_pretrained(prefix_path)
model = LlamaForCausalLM.from_pretrained(prefix_path,
                                         # torch_dtype=torch.float16,
                                         device_map=0)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = 'left'

# Tokenize a single prompt with left padding.
input = 'What is the capital of China?'
batch = tokenizer.batch_encode_plus([input], padding=True, return_tensors='pt')

# Sampling-based generation settings.
generation_config = {'do_sample': True,
                     'num_beams': 1,
                     'temperature': 0.6,
                     'top_p': 0.9,
                     'use_cache': True,
                     'num_return_sequences': 1,
                     'max_length': 200,
                     'eos_token_id': [2]}

b_out = model.generate(batch['input_ids'].cuda(),
                       attention_mask=batch['attention_mask'].cuda(),
                       **generation_config)
print(tokenizer.decode(b_out[0]))

Output:
What is the capital of China?OOOOOOOOOOOOOOOOOOOOOOO2OOOOOOOOOOOOOOOOOOOO0OOOOOOOOOOOOOOOOOOOOOO0OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO0OOOOOtOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO0OOOOOOOOOOOOO0OOOOOOOtOOOOOOOOOOOOO

Why is the output almost entirely "O"s?
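For reference, here is a minimal diagnostic sketch of my own (not from this issue) that removes sampling from the picture: it loads the same local checkpoint with an explicit dtype (bfloat16 here is an assumption, not something the issue confirms works) and runs deterministic greedy decoding. If the "O"s still appear under greedy decoding, the problem is more likely in the checkpoint or dtype than in the generation settings.

# Hypothetical diagnostic sketch, not part of the original report:
# greedy decoding with an explicit dtype to rule out the sampling config.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

prefix_path = '/home/xsong/llama/Llama-2-7b-hf'  # same local checkpoint as above

tokenizer = LlamaTokenizer.from_pretrained(prefix_path)
model = LlamaForCausalLM.from_pretrained(
    prefix_path,
    torch_dtype=torch.bfloat16,  # assumption: explicit dtype instead of the default fp32
    device_map=0,
)

inputs = tokenizer('What is the capital of China?', return_tensors='pt').to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=50)  # deterministic greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))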
