[BUG] IndexError: list index out of range #1162

@2niuhe

Description

Describe the bug

IndexError: list index out of range

To Reproduce

(eval_venv) root@b2c98f779d6b:~/workshop# cat qwen3_nothink.yaml
model_parameters:
  provider: "openai"
  model_name: "openai/qwen3-1.7b"
  base_url: "http://192.168.5.39:9001/v1"
  api_key: "EMPTY"
  generation_parameters:
    temperature: 0.7
    top_k: 20
    top_p: 0.8
    min_p: 0

(eval_venv) root@b2c98f779d6b:~/workshop# lighteval endpoint litellm ./qwen3_nothink.yaml 'ifbench_multiturn,lcb:codegeneration_release_latest,narrativeqa' --max

Generating train split: 10 examples [00:00, 3347.68 examples/s]
Generating train split: 10 examples [00:00, 3121.46 examples/s]
[2026-01-31 17:30:06,855] [    INFO]: --- POST-PROCESSING MODEL RESPONSES --- (pipeline.py:349)
[2026-01-31 17:30:06,855] [    INFO]: --- COMPUTING METRICS --- (pipeline.py:376)
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.80s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.76s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.89s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.86s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.81s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.80s/it]
╭──────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/main_endpoint.py:312 in litellm    │
│                                                                                                    │
│   309 │   │   metric_options=metric_options,                                                       │
│   310 │   )                                                                                        │
│   311 │                                                                                            │
│ ❱ 312 │   pipeline.evaluate()                                                                      │
│   313 │                                                                                            │
│   314 │   pipeline.show_results()                                                                  │
│   315                                                                                              │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:291 in evaluate        │
│                                                                                                    │
│   288 │   │                                                                                        │
│   289 │   │   if self.is_main_process():                                                           │
│   290 │   │   │   self._post_process_outputs(outputs)                                              │
│ ❱ 291 │   │   │   self._compute_metrics(outputs)                                                   │
│   292 │   │   │                                                                                    │
│   293 │   │   │   self.evaluation_tracker.general_config_logger.log_end_time()                     │
│   294 │   │   │   self.evaluation_tracker.metrics_logger.aggregate(                                │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:391 in                 │
│ _compute_metrics                                                                                   │
│                                                                                                    │
│   388 │   │   │   │   docs = [doc for doc, _ in samples]                                           │
│   389 │   │   │   │   responses = [response for _, response in samples]                            │
│   390 │   │   │   │                                                                                │
│ ❱ 391 │   │   │   │   outputs = apply_metric(                                                      │
│   392 │   │   │   │   │   docs=docs,                                                               │
│   393 │   │   │   │   │   responses=responses,                                                     │
│   394 │   │   │   │   │   metrics=metric_category_metrics,                                         │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/__init__.py:54 in          │
│ apply_metric                                                                                       │
│                                                                                                    │
│   51 │   │   # Add non-batched metric results for this sample                                      │
│   52 │   │   for metric in non_batched_metrics:                                                    │
│   53 │   │   │   output.update(                                                                    │
│ ❱ 54 │   │   │   │   metric.compute_sample(                                                        │
│   55 │   │   │   │   │   model_response=responses[i],                                              │
│   56 │   │   │   │   │   doc=docs[i],                                                              │
│   57 │   │   │   │   )                                                                             │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/utils/metric_utils.py:59   │
│ in compute_sample                                                                                  │
│                                                                                                    │
│    56 │   │                                                                                        │
│    57 │   │   if isinstance(self, MetricGrouping):                                                 │
│    58 │   │   │   return sample_level_fn(**kwargs)                                                 │
│ ❱  59 │   │   return {self.metric_name: sample_level_fn(**kwargs)}                                 │
│    60 │                                                                                            │
│    61 │   def get_corpus_aggregations(self) -> dict:                                               │
│    62 │   │   if isinstance(self, MetricGrouping):                                                 │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/metrics_sample.py:131 in   │
│ compute                                                                                            │
│                                                                                                    │
│    128 │   │   """
│    129 │   │   results = []                                                                        │
│    130 │   │   # We might need to flatten golds if they are a list of lists                        │
│ ❱  131 │   │   golds = doc.get_golds()                                                             │
│    132 │   │   for gold in golds:                                                                  │
│    133 │   │   │   for pred in model_response.final_text:                                          │
│    134 │   │   │   │   results.append(self.compute_one_item(gold=gold, pred=pred))                 │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/requests.py:222 in get_golds │
│                                                                                                    │
│   219 │   │   gold_indices = as_list(self.gold_index)                                              │
│   220 │   │   golds = []                                                                           │
│   221 │   │   for gold_ix in gold_indices:                                                         │
│ ❱ 222 │   │   │   golds.extend(as_list(self.choices[gold_ix]))                                     │
│   223 │   │   return golds                                                                         │
│   224 │                                                                                            │
│   225 │   def __repr__(self):                                                                      │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
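For context, the crash originates in `Doc.get_golds` (lighteval/tasks/requests.py:222), where `self.choices[gold_ix]` is indexed without a bounds check. A minimal sketch of that failing pattern (simplified standalone functions, not the actual library code) shows how a doc whose `gold_index` points past its `choices` list, e.g. an empty `choices` on a generative task, triggers exactly this IndexError:

```python
def as_list(x):
    """Wrap a scalar in a list, mirroring lighteval's as_list helper."""
    return x if isinstance(x, list) else [x]

def get_golds(choices, gold_index):
    """Simplified stand-in for Doc.get_golds: collect gold answers by index."""
    golds = []
    for gold_ix in as_list(gold_index):
        # Raises IndexError when gold_ix >= len(choices),
        # e.g. choices == [] with gold_index == 0.
        golds.extend(as_list(choices[gold_ix]))
    return golds

print(get_golds(["answer a", "answer b"], 1))  # ['answer b']

try:
    get_golds([], 0)  # empty choices -> same "list index out of range"
except IndexError as e:
    print("IndexError:", e)
```

If this is the cause, the bug would be in how one of the three tasks (`ifbench_multiturn`, `lcb:codegeneration_release_latest`, `narrativeqa`) builds its docs rather than in the metric itself.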

Expected behavior

The evaluation should run to completion and report metrics for all three tasks without crashing.

Version info

lighteval 0.13.0
