
Eval fails for SGLang-deployed Qwen3-VL-Thinking with separate reasoning enabled #847

@JustinTong0323

Description


NOTE: The same eval command works for the Instruct model.
Eval command:

python3 -m lmms_eval --model openai_compatible --model_args "model_version=Qwen/Qwen3-VL-30B-A3B-Thinking" --tasks mmmu_val --batch_size 128 --log_samples --log_samples_suffix "openai_compatible" --output_path ./logs --gen_kwargs "max_new_tokens=4096"
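To narrow this down, here is a minimal sketch for inspecting what the server actually returns for the Thinking model, bypassing lmms_eval. The base URL/port, API key, and the reasoning_content field are assumptions about a local SGLang setup, not taken from this report:

# Minimal reproduction sketch (assumptions: SGLang serving the model on
# localhost:30000 via its OpenAI-compatible API, separate reasoning enabled).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-30B-A3B-Thinking",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    max_tokens=256,
)
msg = resp.choices[0].message
# If the answer lands in a separate reasoning field while message.content
# stays None, that would match the NoneType error shown below.
print("content:", msg.content)
print("reasoning_content:", getattr(msg, "reasoning_content", None))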

Error message:

2025-10-04 09:38:49 | INFO     | lmms_eval.models.model_utils.gen_metrics:log_metrics:48 - Metric summary - Total time: 6321.931s, Total tokens: 1400994, Avg speed: 221.6 tokens/s
Model Responding: 100%|██████████| 900/900 [1:46:50<00:00,  7.12s/it]
Postprocessing:   0%|          | 0/900 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/.python/sglang/lib/python3.10/site-packages/tenacity/__init__.py", line 470, in __call__
    result = fn(*args, **kwargs)
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/api/task.py", line 1470, in process_results
    results = [res.strip() for res in results]
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/api/task.py", line 1470, in <listcomp>
    results = [res.strip() for res in results]
AttributeError: 'NoneType' object has no attribute 'strip'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/__main__.py", line 347, in cli_evaluate
    results, samples = cli_evaluate_single(args)
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/__main__.py", line 474, in cli_evaluate_single
    results = evaluator.simple_evaluate(
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/evaluator.py", line 268, in simple_evaluate
    results = evaluate(
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/root/.python/sglang/lib/python3.10/site-packages/lmms_eval/evaluator.py", line 555, in evaluate
    metrics = task.process_results(doc, [req.filtered_resps[filter_key] for req in requests])
  File "/root/.python/sglang/lib/python3.10/site-packages/tenacity/__init__.py", line 330, in wrapped_f
    return self(f, *args, **kw)
  File "/root/.python/sglang/lib/python3.10/site-packages/tenacity/__init__.py", line 467, in __call__
    do = self.iter(retry_state=retry_state)
  File "/root/.python/sglang/lib/python3.10/site-packages/tenacity/__init__.py", line 368, in iter
    result = action(retry_state)
  File "/root/.python/sglang/lib/python3.10/site-packages/tenacity/__init__.py", line 411, in exc_check
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f19595cbc40 state=finished raised AttributeError>]
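The crash happens in process_results at lmms_eval/api/task.py:1470, where one of the filtered responses is None instead of a string. A local band-aid (a sketch only, since it hides the empty responses rather than fixing them) would be to coalesce None before stripping:

# Guarded version of the failing line from the traceback (sketch only)
results = [(res or "").strip() for res in results]

The real question is why the openai_compatible model returns None here: presumably, with separate reasoning enabled, the Thinking model's answer ends up in the reasoning field and message.content comes back empty or None.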
