
Commit aa5a114

Merge branch 'main' into feat/asian-lang-translate-metic
2 parents 1b1a3b4 + f6fee3a

28 files changed (+919 −142 lines)

.github/workflows/tests.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ jobs:
           cache: 'pip'
       - name: Install lighteval in editable mode
         run: |
-          pip install -e .[dev,extended_tasks,multilingual]
+          pip install -e .[dev,extended_tasks,multilingual,litellm]
      - name: Get cached files
        uses: actions/cache@v4
        id: get-cache
```

README.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -44,12 +44,12 @@ Hub, S3, or locally.
 
 ## 🔑 Key Features
 
-- **Speed**: [Use vllm as backend for fast evals](https://github.com/huggingface/lighteval/wiki/Use-VLLM-as-backend).
-- **Completeness**: [Use the accelerate backend to launch any models hosted on Hugging Face](https://github.com/huggingface/lighteval/wiki/Quicktour#accelerate).
-- **Seamless Storage**: [Save results in S3 or Hugging Face Datasets](https://github.com/huggingface/lighteval/wiki/Saving-and-reading-results).
-- **Python API**: [Simple integration with the Python API](https://github.com/huggingface/lighteval/wiki/Using-the-Python-API).
-- **Custom Tasks**: [Easily add custom tasks](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task).
-- **Versatility**: Tons of [metrics](https://github.com/huggingface/lighteval/wiki/Metric-List) and [tasks](https://github.com/huggingface/lighteval/wiki/Available-Tasks) ready to go.
+- **Speed**: [Use vllm as backend for fast evals](https://huggingface.co/docs/lighteval/use-vllm-as-backend).
+- **Completeness**: [Use the accelerate backend to launch any models hosted on Hugging Face](https://huggingface.co/docs/lighteval/quicktour#accelerate).
+- **Seamless Storage**: [Save results in S3 or Hugging Face Datasets](https://huggingface.co/docs/lighteval/saving-and-reading-results).
+- **Python API**: [Simple integration with the Python API](https://huggingface.co/docs/lighteval/using-the-python-api).
+- **Custom Tasks**: [Easily add custom tasks](https://huggingface.co/docs/lighteval/adding-a-custom-task).
+- **Versatility**: Tons of [metrics](https://huggingface.co/docs/lighteval/metric-list) and [tasks](https://huggingface.co/docs/lighteval/available-tasks) ready to go.
 
 
 ## ⚡️ Installation
@@ -58,7 +58,7 @@ Hub, S3, or locally.
 pip install lighteval
 ```
 
-Lighteval allows for many extras when installing, see [here](https://github.com/huggingface/lighteval/wiki/Installation) for a complete list.
+Lighteval allows for many extras when installing, see [here](https://huggingface.co/docs/lighteval/installation) for a complete list.
 
 If you want to push results to the Hugging Face Hub, add your access token as
 an environment variable:
@@ -106,8 +106,8 @@ Harness and HELM teams for their pioneering work on LLM evaluations.
 ## 🌟 Contributions Welcome 💙💚💛💜🧡
 
 Got ideas? Found a bug? Want to add a
-[task](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) or
-[metric](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric)?
+[task](https://huggingface.co/docs/lighteval/adding-a-custom-task) or
+[metric](https://huggingface.co/docs/lighteval/adding-a-new-metric)?
 Contributions are warmly welcomed!
 
 If you're adding a new feature, please open an issue first.
````

docs/source/package_reference/models.mdx

Lines changed: 3 additions & 3 deletions
```diff
@@ -6,9 +6,9 @@
 
 
 ## Accelerate and Transformers Models
-### BaseModel
-[[autodoc]] models.transformers.base_model.BaseModelConfig
-[[autodoc]] models.transformers.base_model.BaseModel
+### TransformersModel
+[[autodoc]] models.transformers.transformers_model.TransformersModelConfig
+[[autodoc]] models.transformers.transformers_model.TransformersModel
 
 ### AdapterModel
 [[autodoc]] models.transformers.adapter_model.AdapterModelConfig
```
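For code that tracked this rename, the import path moves with it. A minimal sketch, assuming only the module paths shown in the autodoc entries above:

```python
# Old import path (removed in this commit):
# from lighteval.models.transformers.base_model import BaseModel, BaseModelConfig

# New import path, per the autodoc entries above:
from lighteval.models.transformers.transformers_model import (
    TransformersModel,
    TransformersModelConfig,
)
```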
Lines changed: 2 additions & 1 deletion
```diff
@@ -1,6 +1,6 @@
 model:
   base_params:
-    model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
+    model_args: "pretrained=HuggingFaceTB/SmolLM-1.7B,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
     dtype: "bfloat16"
     compile: true
   merged_weights: # Ignore this section if you are not using PEFT models
@@ -9,3 +9,4 @@ model:
     base_model: null # path to the base_model
   generation:
     multichoice_continuations_start_space: null # If true/false, will force multiple choice continuations to start/not start with a space. If none, will do nothing
+    temperature: 0.5
```
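A hedged sketch of how the new `generation:` section is consumed, mirroring the `GenerationParameters.from_dict` pattern added in `main_accelerate.py` and `main_vllm.py` below. The file path here is an assumption taken from the help strings elsewhere in this commit (this page did not capture the file's name):

```python
import yaml

from lighteval.models.model_input import GenerationParameters

# Path is illustrative; the help strings in this commit point to this file.
with open("examples/model_configs/transformers_model.yaml", "r") as f:
    config = yaml.safe_load(f)["model"]

# from_dict appears in the hunks below; it presumably lifts keys such as
# generation.temperature out of the parsed config dict.
generation_parameters = GenerationParameters.from_dict(config)
```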

pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -82,6 +82,7 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
+litellm = ["litellm", "diskcache"]
 tgi = ["text-generation==0.6.0"]
 optimum = ["optimum==1.12.0"]
 quantization = ["bitsandbytes>=0.41.0", "auto-gptq>=0.4.2"]
```
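With this extra declared, the new backend's dependencies can be pulled in via standard pip extras syntax, e.g. `pip install "lighteval[litellm]"`, or `pip install -e .[dev,extended_tasks,multilingual,litellm]` for a development install matching the CI change above. `diskcache` is bundled alongside `litellm`, presumably for litellm's disk-based response caching.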

src/lighteval/__main__.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -20,7 +20,7 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE.
 import logging
-from logging.config import dictConfig
+import logging.config
 
 import colorlog
 import typer
@@ -57,7 +57,8 @@
     },
 )
 
-dictConfig(logging_config)
+logging.config.dictConfig(logging_config)
+logging.captureWarnings(capture=True)
 
 app.command(rich_help_panel="Evaluation Backends")(lighteval.main_accelerate.accelerate)
 app.command(rich_help_panel="Evaluation Utils")(lighteval.main_baseline.baseline)
```
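The net effect of the added `captureWarnings` call is that `warnings.warn(...)` output is rerouted through the logging system (and hence through whatever handlers `logging_config` sets up) instead of going straight to stderr. A standalone sketch of the stdlib behavior, independent of lighteval's actual config:

```python
import logging
import logging.config
import warnings

# Minimal dictConfig: route the "py.warnings" logger (the one captureWarnings
# emits to) through a plain stream handler.
logging.config.dictConfig(
    {
        "version": 1,
        "handlers": {"console": {"class": "logging.StreamHandler"}},
        "loggers": {"py.warnings": {"handlers": ["console"], "level": "WARNING"}},
    }
)
logging.captureWarnings(capture=True)

# This now arrives as a WARNING record on the "py.warnings" logger
# rather than as raw text on stderr.
warnings.warn("deprecated codepath")
```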

src/lighteval/main_accelerate.py

Lines changed: 7 additions & 4 deletions
```diff
@@ -44,7 +44,7 @@ def accelerate( # noqa C901
     model_args: Annotated[
         str,
         Argument(
-            help="Model arguments in the form key1=value1,key2=value2,... or path to yaml config file (see examples/model_configs/base_model.yaml)"
+            help="Model arguments in the form key1=value1,key2=value2,... or path to yaml config file (see examples/model_configs/transformers_model.yaml)"
         ),
     ],
     tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
@@ -107,9 +107,10 @@ def accelerate( # noqa C901
     from accelerate import Accelerator, InitProcessGroupKwargs
 
     from lighteval.logging.evaluation_tracker import EvaluationTracker
+    from lighteval.models.model_input import GenerationParameters
     from lighteval.models.transformers.adapter_model import AdapterModelConfig
-    from lighteval.models.transformers.base_model import BaseModelConfig, BitsAndBytesConfig
     from lighteval.models.transformers.delta_model import DeltaModelConfig
+    from lighteval.models.transformers.transformers_model import BitsAndBytesConfig, TransformersModelConfig
     from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
 
     accelerator = Accelerator(kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(seconds=3000))])
@@ -154,6 +155,8 @@ def accelerate( # noqa C901
         # We extract the model args
         args_dict = {k.split("=")[0]: k.split("=")[1] for k in config["base_params"]["model_args"].split(",")}
 
+        args_dict["generation_parameters"] = GenerationParameters.from_dict(config)
+
         # We store the relevant other args
         args_dict["base_model"] = config["merged_weights"]["base_model"]
         args_dict["compile"] = bool(config["base_params"]["compile"])
@@ -180,13 +183,13 @@ def accelerate( # noqa C901
         elif config["merged_weights"]["base_model"] not in ["", None]:
             raise ValueError("You can't specify a base model if you are not using delta/adapter weights")
         else:
-            model_config = BaseModelConfig(**args_dict)
+            model_config = TransformersModelConfig(**args_dict)
     else:
         model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
         model_args_dict["accelerator"] = accelerator
         model_args_dict["use_chat_template"] = use_chat_template
         model_args_dict["compile"] = bool(model_args_dict["compile"]) if "compile" in model_args_dict else False
-        model_config = BaseModelConfig(**model_args_dict)
+        model_config = TransformersModelConfig(**model_args_dict)
 
     pipeline = Pipeline(
         tasks=tasks,
```
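The comma-separated form keeps the same `key=value` parsing as before; only the yaml path gains generation parameters here. A standalone illustration of that dict comprehension, with an illustrative argument string:

```python
# Same parsing expression as in the hunk above, run on a sample string.
model_args = "pretrained=HuggingFaceTB/SmolLM-1.7B,revision=main,trust_remote_code"
model_args_dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
print(model_args_dict)
# {'pretrained': 'HuggingFaceTB/SmolLM-1.7B', 'revision': 'main', 'trust_remote_code': True}
```

Note that every `key=value` pair arrives as a string; only bare flags without an `=` become the boolean `True`.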

src/lighteval/main_endpoint.py

Lines changed: 119 additions & 5 deletions
```diff
@@ -42,8 +42,11 @@
 @app.command(rich_help_panel="Evaluation Backends")
 def openai(
     # === general ===
-    model_name: Annotated[
-        str, Argument(help="The model name to evaluate (has to be available through the openai API.")
+    model_args: Annotated[
+        str,
+        Argument(
+            help="Model name as a string (has to be available through the openai API) or path to yaml config file (see examples/model_configs/transformers_model.yaml)"
+        ),
     ],
     tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
     # === Common parameters ===
@@ -96,6 +99,11 @@ def openai(
     from lighteval.models.endpoints.openai_model import OpenAIModelConfig
     from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
 
+    if model_args.endswith(".yaml"):
+        model_config = OpenAIModelConfig.from_path(model_args)
+    else:
+        model_config = OpenAIModelConfig(model=model_args)
+
     env_config = EnvConfig(token=TOKEN, cache_dir=cache_dir)
     evaluation_tracker = EvaluationTracker(
         output_dir=output_dir,
@@ -107,7 +115,6 @@ def openai(
     )
 
     parallelism_manager = ParallelismManager.OPENAI
-    model_config = OpenAIModelConfig(model=model_name)
 
     pipeline_params = PipelineParameters(
         launcher_type=parallelism_manager,
@@ -205,7 +212,6 @@ def inference_endpoint(
     """
     Evaluate models using inference-endpoints as backend.
     """
-
     from lighteval.logging.evaluation_tracker import EvaluationTracker
     from lighteval.models.endpoints.endpoint_model import InferenceEndpointModelConfig, ServerlessEndpointModelConfig
     from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
@@ -319,7 +325,6 @@ def tgi(
     """
     Evaluate models using TGI as backend.
     """
-
     from lighteval.logging.evaluation_tracker import EvaluationTracker
     from lighteval.models.endpoints.tgi_model import TGIModelConfig
     from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
@@ -367,3 +372,112 @@ def tgi(
     pipeline.save_and_push_results()
 
     return results
+
+
+@app.command(rich_help_panel="Evaluation Backends")
+def litellm(
+    # === general ===
+    model_name: Annotated[
+        str, Argument(help="The model name to evaluate (has to be available through the litellm API.")
+    ],
+    tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
+    # === Common parameters ===
+    use_chat_template: Annotated[
+        bool, Option(help="Use chat template for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
+    ] = False,
+    system_prompt: Annotated[
+        Optional[str], Option(help="Use system prompt for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
+    ] = None,
+    dataset_loading_processes: Annotated[
+        int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = 1,
+    custom_tasks: Annotated[
+        Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = None,
+    cache_dir: Annotated[
+        str, Option(help="Cache directory for datasets and models.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = CACHE_DIR,
+    num_fewshot_seeds: Annotated[
+        int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = 1,
+    # === saving ===
+    output_dir: Annotated[
+        str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = "results",
+    push_to_hub: Annotated[
+        bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    push_to_tensorboard: Annotated[
+        bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    public_run: Annotated[
+        bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    results_org: Annotated[
+        Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = None,
+    save_details: Annotated[
+        bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    # === debug ===
+    max_samples: Annotated[
+        Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = None,
+    override_batch_size: Annotated[
+        int, Option(help="Override batch size for evaluation.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = -1,
+    job_id: Annotated[
+        int, Option(help="Optional job id for future refenrence.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = 0,
+):
+    """
+    Evaluate models using LiteLLM as backend.
+    """
+
+    from lighteval.logging.evaluation_tracker import EvaluationTracker
+    from lighteval.models.litellm_model import LiteLLMModelConfig
+    from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
+
+    env_config = EnvConfig(token=TOKEN, cache_dir=cache_dir)
+    evaluation_tracker = EvaluationTracker(
+        output_dir=output_dir,
+        save_details=save_details,
+        push_to_hub=push_to_hub,
+        push_to_tensorboard=push_to_tensorboard,
+        public=public_run,
+        hub_results_org=results_org,
+    )
+
+    # TODO (nathan): better handling of model_args
+    parallelism_manager = ParallelismManager.NONE
+
+    model_config = LiteLLMModelConfig(model=model_name)
+
+    pipeline_params = PipelineParameters(
+        launcher_type=parallelism_manager,
+        env_config=env_config,
+        job_id=job_id,
+        dataset_loading_processes=dataset_loading_processes,
+        custom_tasks_directory=custom_tasks,
+        override_batch_size=override_batch_size,
+        num_fewshot_seeds=num_fewshot_seeds,
+        max_samples=max_samples,
+        use_chat_template=use_chat_template,
+        system_prompt=system_prompt,
+    )
+    pipeline = Pipeline(
+        tasks=tasks,
+        pipeline_parameters=pipeline_params,
+        evaluation_tracker=evaluation_tracker,
+        model_config=model_config,
+    )
+
+    pipeline.evaluate()
+
+    pipeline.show_results()
+
+    results = pipeline.get_results()
+
+    pipeline.save_and_push_results()
+
+    return results
```
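For reference, a hedged sketch of driving the new backend from Python rather than through typer, using only names that appear in the hunk above; the model name and task string are illustrative, and defaults for the omitted tracker and parameter fields are assumed:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.litellm_model import LiteLLMModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Mirrors the body of litellm() above, with the CLI defaults filled in by hand.
evaluation_tracker = EvaluationTracker(output_dir="results", save_details=False)
pipeline_params = PipelineParameters(launcher_type=ParallelismManager.NONE)
model_config = LiteLLMModelConfig(model="gpt-4o-mini")  # any litellm-routable model

pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",  # illustrative task string
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)
pipeline.evaluate()
pipeline.show_results()
results = pipeline.get_results()
```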

src/lighteval/main_vllm.py

Lines changed: 18 additions & 3 deletions
```diff
@@ -37,7 +37,12 @@
 
 def vllm(
     # === general ===
-    model_args: Annotated[str, Argument(help="Model arguments in the form key1=value1,key2=value2,...")],
+    model_args: Annotated[
+        str,
+        Argument(
+            help="Model arguments in the form key1=value1,key2=value2,... or path to yaml config file (see examples/model_configs/transformers_model.yaml)"
+        ),
+    ],
     tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
     # === Common parameters ===
     use_chat_template: Annotated[
@@ -88,7 +93,10 @@ def vllm(
     """
     Evaluate models using vllm as backend.
     """
+    import yaml
+
     from lighteval.logging.evaluation_tracker import EvaluationTracker
+    from lighteval.models.model_input import GenerationParameters
     from lighteval.models.vllm.vllm_model import VLLMModelConfig
     from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
 
@@ -118,8 +126,15 @@ def vllm(
         system_prompt=system_prompt,
     )
 
-    model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
-    model_config = VLLMModelConfig(**model_args_dict)
+    if model_args.endswith(".yaml"):
+        with open(model_args, "r") as f:
+            config = yaml.safe_load(f)["model"]
+        generation_parameters = GenerationParameters.from_dict(config)
+        model_config = VLLMModelConfig(config, generation_parameters=generation_parameters)
+
+    else:
+        model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
+        model_config = VLLMModelConfig(**model_args_dict)
 
     pipeline = Pipeline(
         tasks=tasks,
```
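With this change, the vllm entry point accepts either form for `model_args`: the familiar `key1=value1,key2=value2` string, or a path ending in `.yaml`, in which case the file's `model:` section is read with `yaml.safe_load` and its generation settings are lifted into a `GenerationParameters` object. Assuming the usual typer registration pattern from `__main__.py`, that presumably looks like `lighteval vllm examples/model_configs/transformers_model.yaml "leaderboard|truthfulqa:mc|0|0"` (command name and task string illustrative).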
