ValueError: please provide at least one prompt #95

Open
TikaToka opened this issue Mar 11, 2025 · 7 comments

TikaToka commented Mar 11, 2025

Hello, thank you for sharing this amazing work.

I am trying to evaluate my model with lm_eval --model vllm --model_args pretrained=ckpts/s1-20250310_141828,dtype=bfloat16,tensor_parallel_size=2 --tasks aime25_nofigures --batch_size auto --apply_chat_template --output_path s1.1forcingignore1wait --log_samples --gen_kwargs "max_gen_toks=20000,temperature=0,temperature_thinking=0,max_tokens_thinking=20000,thinking_n_ignore=1,thinking_n_ignore_str=Wait"

However, I get the following error:

Processed prompts: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [05:40<00:00, 22.69s/it, est. speed input: 6.14 toks/s, output: 881.59 toks/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/hslim/miniforge3/envs/s1/bin/lm_eval", line 8, in <module>
[rank0]:     sys.exit(cli_evaluate())
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/__main__.py", line 394, in cli_evaluate
[rank0]:     results = evaluator.simple_evaluate(
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
[rank0]:     results = evaluate(
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/evaluator.py", line 506, in evaluate
[rank0]:     resps = getattr(lm, reqtype)(cloned_reqs)
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 576, in generate_until
[rank0]:     cont = self._model_generate(
[rank0]:   File "/home/hslim/data/models/s1/eval/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 339, in _model_generate
[rank0]:     outputs_tmp = self.model.generate(
[rank0]:   File "/home/hslim/miniforge3/envs/s1/lib/python3.10/site-packages/vllm/utils.py", line 1063, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/hslim/miniforge3/envs/s1/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 378, in generate
[rank0]:     parsed_prompts = self._convert_v1_inputs(
[rank0]:   File "/home/hslim/miniforge3/envs/s1/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 803, in _convert_v1_inputs
[rank0]:     p["content"] for p in parse_and_batch_prompt(prompt_token_ids)
[rank0]:   File "/home/hslim/miniforge3/envs/s1/lib/python3.10/site-packages/vllm/inputs/parse.py", line 43, in parse_and_batch_prompt
[rank0]:     raise ValueError("please provide at least one prompt")
[rank0]: ValueError: please provide at least one prompt

I've tried changing max_gen_toks and max_tokens_thinking, but it does not help.

The error goes away when I try a different model, and it also works without the "Wait" budget forcing:
lm_eval --model vllm --model_args pretrained=ckpts/s1-20250310_141828,dtype=bfloat16,tensor_parallel_size=2 --tasks aime24_figures,aime24_nofigures --batch_size auto --output_path dummy --log_samples --gen_kwargs "max_gen_toks=20000"

However, budget forcing keeps producing the error.

I've also tried the inference code from the README:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Decide on a token limit for thinking; As the model's max tokens is 32768, 32000 usually ensures there is enough space for the model to still answer
MAX_TOKENS_THINKING = 32000 
# Decide how often to ignore end-of-thinking token
NUM_IGNORE = 1

model = LLM(
    "/home/hslim/data/models/s1/ckpts/s1-20250310_141828", # s1 originally gets this prompt wrong but with budget forcing it fixes it
    tensor_parallel_size=2,
)
tok = AutoTokenizer.from_pretrained(
    "/home/hslim/data/models/s1/ckpts/s1-20250310_141828"
)

stop_token_ids = tok("<|im_end|>")["input_ids"]
sampling_params = SamplingParams(
    max_tokens=32768,
    min_tokens=0,
    stop_token_ids=stop_token_ids,
    skip_special_tokens=False,
    temperature=0.0,
)

# For the exact raspberry sample in the paper see
prompts = [
    "How many r in raspberry",
]

for i, p in enumerate(prompts):
    prompt = "<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\n" + p + "<|im_end|>\n<|im_start|>assistant\n"
    stop_token_ids = tok("<|im_start|><|im_end|>")["input_ids"]
    sampling_params = SamplingParams(
        max_tokens=MAX_TOKENS_THINKING,
        min_tokens=0,
        stop_token_ids=stop_token_ids,
        skip_special_tokens=False,
        temperature=0.0,
    )
    prompt += "<|im_start|>think"
    o = model.generate(
        prompt,
        sampling_params=sampling_params
    )
    
    print(o[0].outputs[0].text)
    
    ignore_str = "Wait"
    max_tokens_thinking_tmp = MAX_TOKENS_THINKING
    if max_tokens_thinking_tmp > 0:
        for i in range(NUM_IGNORE): # Num of times to skip stop token
            max_tokens_thinking_tmp -= len(o[0].outputs[0].token_ids)
            prompt += o[0].outputs[0].text + ignore_str
            sampling_params = SamplingParams(
                max_tokens=max_tokens_thinking_tmp,
                min_tokens=1,
                stop_token_ids=stop_token_ids,
                skip_special_tokens=False,
                temperature=0.0,
            )
            o = model.generate(
                prompt,
                sampling_params=sampling_params
            )
    ### Final answer ###
    prompt += o[0].outputs[0].text # You can also append "Final Answer:" here like we do for some evaluations to prevent the model from just continuing to reason in its answer when early exiting
    stop_token_ids = tok("<|im_end|>")["input_ids"]
    sampling_params = SamplingParams(
        max_tokens=32768,
        min_tokens=0,
        stop_token_ids=stop_token_ids,
        skip_special_tokens=False,
        temperature=0.0,
    )
    o = model.generate(
        prompt,
        sampling_params=sampling_params,
    )
    print("With budget forcing:") # You will see that after the "Wait" in the reasoning trace it fixes its answer
    print(prompt + o[0].outputs[0].text)

The first output prints without a problem, but the error appears during budget forcing:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 55
     53         max_tokens_thinking_tmp -= len(o[0].outputs[0].token_ids)
     54         prompt += o[0].outputs[0].text + ignore_str
---> 55         sampling_params = SamplingParams(
     56             max_tokens=max_tokens_thinking_tmp,
     57             min_tokens=1,
     58             stop_token_ids=stop_token_ids,
     59             skip_special_tokens=False,
     60             temperature=0.0,
     61         )
     62         o = model.generate(
     63             prompt,
     64             sampling_params=sampling_params
     65         )
     66 ### Final answer ###

File ~/miniforge3/envs/s1/lib/python3.10/site-packages/vllm/sampling_params.py:337, in SamplingParams.__post_init__(self)
    334 if self.stop and not self.include_stop_str_in_output:
    335     self.output_text_buffer_length = max(len(s) for s in self.stop) - 1
--> 337 self._verify_args()
    339 if self.temperature < _SAMPLING_EPS:
    340     # Zero temperature means greedy sampling.
    341     self.top_p = 1.0

File ~/miniforge3/envs/s1/lib/python3.10/site-packages/vllm/sampling_params.py:378, in SamplingParams._verify_args(self)
    375     raise ValueError("min_p must be in [0, 1], got "
    376                      f"{self.min_p}.")
    377 if self.max_tokens is not None and self.max_tokens < 1:
--> 378     raise ValueError(
    379         f"max_tokens must be at least 1, got {self.max_tokens}.")
    380 if self.min_tokens < 0:
    381     raise ValueError(f"min_tokens must be greater than or equal to 0, "
    382                      f"got {self.min_tokens}.")

ValueError: max_tokens must be at least 1, got 0.

How can I solve it? Thank you in advance.

@Muennighoff
Contributor

I think this issue may be helpful: #35

@lixin2002cn

I ran into the same error. Did you manage to solve it?

@Muennighoff
Contributor

I think you probably need to decrease max_gen_toks

@lixin2002cn

I think you probably need to decrease max_gen_toks

I tried this, but it didn't work; the error just occurs earlier.

TikaToka (Author) commented Apr 6, 2025

I ran into the same error. Did you manage to solve it?

There seems to be a specific condition that triggers this, but I don't have time to investigate it for now.

So I gave up on fixing the problem directly and worked around it by re-training the model.

@RohollahHS

Same problem here.

RohollahHS commented Apr 13, 2025

I think the error stems from here

If the first part of the condition is True but the second part is not, then at the end of the for loop requests_tmp ends up as an empty list and vLLM is called with no prompts. So the generation should be ended with a break or something similar; a sketch of that idea is below.
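
To make that concrete, here is a guarded version of the standalone budget-forcing loop from the issue body (a minimal sketch of the break-when-exhausted idea, not a patch to the harness itself; the checkpoint path is a placeholder, the thinking_params helper is my own, and the separate final-answer phase from the README is omitted for brevity):

# Hedged sketch of a guarded budget-forcing loop, based on the standalone
# snippet in the issue body. It clamps the remaining thinking budget and
# breaks out once it is exhausted, which avoids constructing SamplingParams
# with max_tokens <= 0 and avoids an extra generate call with nothing to ask.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

MODEL_PATH = "ckpts/s1-20250310_141828"  # placeholder: point at your checkpoint
MAX_TOKENS_THINKING = 32000
NUM_IGNORE = 1

model = LLM(MODEL_PATH, tensor_parallel_size=2)
tok = AutoTokenizer.from_pretrained(MODEL_PATH)
stop_token_ids = tok("<|im_start|><|im_end|>")["input_ids"]

def thinking_params(budget: int) -> SamplingParams:
    # Sampling parameters for one thinking round, capped at the remaining budget.
    return SamplingParams(
        max_tokens=budget,
        min_tokens=1,
        stop_token_ids=stop_token_ids,
        skip_special_tokens=False,
        temperature=0.0,
    )

prompt = (
    "<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. "
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHow many r in raspberry<|im_end|>\n"
    "<|im_start|>assistant\n<|im_start|>think"
)

budget = MAX_TOKENS_THINKING
o = model.generate(prompt, sampling_params=thinking_params(budget))

for _ in range(NUM_IGNORE):
    budget -= len(o[0].outputs[0].token_ids)
    if budget < 1:
        # Thinking budget exhausted: stop forcing "Wait" instead of building
        # SamplingParams with max_tokens=0 or generating with an empty request.
        break
    prompt += o[0].outputs[0].text + "Wait"
    o = model.generate(prompt, sampling_params=thinking_params(budget))

print(prompt + o[0].outputs[0].text)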
