Error in Evaluating on Other Dataset (AssertionError: min_tokens_thinking only supports until_thinking tokens that are 1 token long) #102

Open
jd730 opened this issue Mar 19, 2025 · 3 comments

jd730 commented Mar 19, 2025

Hi, I am trying to evaluate S1 on MGSM, but I am running into an error: AssertionError: min_tokens_thinking only supports until_thinking tokens that are 1 token long. Could you help me run S1 on other datasets?

I ran the following script:

```bash
model={trained_S1_model_path}
lm_eval --model vllm --model_args pretrained=$model,dtype=float32,tensor_parallel_size=4,gpu_memory_utilization=0.9 --tasks afrimgsm_en_cot --batch_size auto --output_path ckpts/results/wait1 --log_samples --apply_chat_template --gen_kwargs "max_gen_toks=32768,max_tokens_thinking=auto,thinking_n_ignore=1,thinking_n_ignore_str=Wait"
```

Here is the full error log:

```
Running generate_until requests:   0%|          | 0/5500 [00:00<?, ?it/s]
Separating thinking and answering generation.
Thinking ignore string: Wait
Thinking start: <|im_start|>think, Thinking end: <|im_start|>answer, Stop: ['<|im_start|>', 'Swali:', '</s>', '<|im_end|>', '<|im_end|>']
Auto setting max_tokens_thinking to 19576
[rank0]: Traceback (most recent call last):
[rank0]:   File "/orcd/home/001/jdhwang/.conda/envs/llm/bin/lm_eval", line 8, in <module>
[rank0]:     sys.exit(cli_evaluate())
[rank0]:              ^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/__main__.py", line 394, in cli_evaluate
[rank0]:     results = evaluator.simple_evaluate(
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
[rank0]:     results = evaluate(
[rank0]:               ^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/evaluator.py", line 506, in evaluate
[rank0]:     resps = getattr(lm, reqtype)(cloned_reqs)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 576, in generate_until
[rank0]:     cont = self._model_generate(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 278, in _model_generate
[rank0]:     assert all([len(x) == 1 for x in until_thinking_tok]), "min_tokens_thinking only supports until_thinking tokens that are 1 token long"
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: min_tokens_thinking only supports until_thinking tokens that are 1 token long
```

jd730 commented Mar 24, 2025

@Muennighoff Do you have any idea what might be causing this?

Muennighoff (Contributor) commented

It is as the message says: min_tokens_thinking only supports until_thinking tokens that are 1 token long. I think until_thinking, which is the stop token for the thinking phase, defaults to something like <|im_start|>, which is normally 1 token. If it is not 1 token in your case, it doesn't work, because vLLM only supports minimum-token ignoring for stop tokens that are 1 token long. You can pass the until_thinking kwarg to change it, e.g. to a different string that marks the end of thinking.
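
A quick way to see which marker is tripping the assertion is to tokenize the candidate stop strings yourself. Here is a minimal sketch, assuming a Hugging Face tokenizer for the trained checkpoint (the model path is a placeholder, and the string list is taken from the log above):

```python
from transformers import AutoTokenizer

# Placeholder path to the fine-tuned S1 checkpoint used in the command above.
tok = AutoTokenizer.from_pretrained("path/to/trained_S1_model")

# The assertion fires when any of the until_thinking strings encodes to more than one token.
for s in ["<|im_start|>", "<|im_start|>answer", "Swali:", "</s>", "<|im_end|>"]:
    ids = tok.encode(s, add_special_tokens=False)
    print(f"{s!r} -> {len(ids)} token(s): {ids}")
```

If one of these turns out to be multi-token for your tokenizer, passing a single-token alternative via until_thinking (presumably through the same --gen_kwargs string as the other thinking options) should avoid the assertion.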


jd730 commented Mar 26, 2025

Hi @Muennighoff, thank you for your response. I will look into until_thinking for the various datasets.
