Error in Evaluating on Other Dataset (AssertionError: min_tokens_thinking only supports until_thinking tokens that are 1 token long) #102

Open
jd730 opened this issue Mar 19, 2025 · 3 comments

jd730 commented Mar 19, 2025

Hi, I am trying to evaluate S1 on MGSM, but I am running into an error: AssertionError: min_tokens_thinking only supports until_thinking tokens that are 1 token long. Could you help me run S1 on other datasets?

I ran the following script:

```bash
model={trained_S1_model_path}
lm_eval --model vllm --model_args pretrained=$model,dtype=float32,tensor_parallel_size=4,gpu_memory_utilization=0.9 --tasks afrimgsm_en_cot --batch_size auto --output_path ckpts/results/wait1 --log_samples --apply_chat_template --gen_kwargs "max_gen_toks=32768,max_tokens_thinking=auto,thinking_n_ignore=1,thinking_n_ignore_str=Wait"
```

Here is the full error log:

```
Running generate_until requests:   0%|          | 0/5500 [00:00<?, ?it/s]
Separating thinking and answering generation.
Thinking ignore string: Wait
Thinking start: <|im_start|>think, Thinking end: <|im_start|>answer, Stop: ['<|im_start|>', 'Swali:', '</s>', '<|im_end|>', '<|im_end|>']
Auto setting max_tokens_thinking to 19576
[rank0]: Traceback (most recent call last):
[rank0]:   File "/orcd/home/001/jdhwang/.conda/envs/llm/bin/lm_eval", line 8, in <module>
[rank0]:     sys.exit(cli_evaluate())
[rank0]:              ^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/__main__.py", line 394, in cli_evaluate
[rank0]:     results = evaluator.simple_evaluate(
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
[rank0]:     results = evaluate(
[rank0]:               ^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/evaluator.py", line 506, in evaluate
[rank0]:     resps = getattr(lm, reqtype)(cloned_reqs)
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 576, in generate_until
[rank0]:     cont = self._model_generate(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/orcd/home/001/jdhwang/s1/eval/lm-evaluation-harness/lm_eval/models/vllm_causallms.py", line 278, in _model_generate
[rank0]:     assert all([len(x) == 1 for x in until_thinking_tok]), "min_tokens_thinking only supports until_thinking tokens that are 1 token long"
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: min_tokens_thinking only supports until_thinking tokens that are 1 token long
```

jd730 commented Mar 24, 2025

@Muennighoff Do you have any idea what might be causing this?

Muennighoff (Contributor) commented

It is as the message says: min_tokens_thinking only supports until_thinking tokens that are 1 token long. I think until_thinking, which is the stop token for the thinking phase, defaults to something like <|im_start|>, which is normally 1 token. If it is not 1 token in your case, it doesn't work, because vLLM only supports minimum-token ignoring for stop tokens that are 1 token long. You can pass the until_thinking kwarg to change it, e.g. to a different string that marks the end of thinking.
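
A quick way to see which marker is tripping the assertion is to tokenize the candidate stop strings yourself. Here is a minimal sketch, assuming a Hugging Face tokenizer for the trained checkpoint (the model path is a placeholder, and the string list is taken from the log above):

```python
from transformers import AutoTokenizer

# Placeholder path to the fine-tuned S1 checkpoint used in the command above.
tok = AutoTokenizer.from_pretrained("path/to/trained_S1_model")

# The assertion fires when any of the until_thinking strings encodes to more than one token.
for s in ["<|im_start|>", "<|im_start|>answer", "Swali:", "</s>", "<|im_end|>"]:
    ids = tok.encode(s, add_special_tokens=False)
    print(f"{s!r} -> {len(ids)} token(s): {ids}")
```

If one of these turns out to be multi-token for your tokenizer, passing a single-token alternative via until_thinking (presumably through the same --gen_kwargs string as the other thinking options) should avoid the assertion.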


jd730 commented Mar 26, 2025

Hi @Muennighoff, thank you for your response. I will look into until_thinking for the various datasets.
