Commit 6e2754e

Nathan refacto cli (#407)
Complete revamp of the cli. See documentation for more details. `lighteval --help`
1 parent 8e977cb · commit 6e2754e

28 files changed: +981 −587 lines
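
As a quick illustration of the revamp (a sketch assembled from the documentation updates in this diff, with `lighteval --help` as the authoritative reference), model arguments and tasks move from flags to positional arguments, and task discovery becomes a subcommand:

```bash
# Old invocation style (before this commit): flag-based
lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "leaderboard|truthfulqa:mc|0|0" \
    --override_batch_size 1 \
    --output_dir="./evals/"

# New invocation style (this commit): model args and tasks are positional
lighteval accelerate \
    "pretrained=gpt2" \
    "leaderboard|truthfulqa:mc|0|0"

# Task discovery via subcommands
lighteval tasks list
lighteval tasks inspect <task_name>
```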

.github/workflows/tests.yaml

Lines changed: 2 additions & 0 deletions

@@ -36,6 +36,8 @@ jobs:
       - name: Test
         env:
           HF_TEST_TOKEN: ${{ secrets.HF_TEST_TOKEN }}
+          HF_HOME: "cache/models"
+          HF_DATASETS_CACHE: "cache/datasets"
         run: | # PYTHONPATH="${PYTHONPATH}:src" HF_DATASETS_CACHE="cache/datasets" HF_HOME="cache/models"
           python -m pytest --disable-pytest-warnings
       - name: Write cache

docs/source/adding-a-custom-task.mdx

Lines changed: 3 additions & 4 deletions

@@ -191,8 +191,7 @@ Once your file is created you can then run the evaluation with the following command

 ```bash
 lighteval accelerate \
-    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
-    --tasks "community|{custom_task}|{fewshots}|{truncate_few_shot}" \
-    --custom_tasks {path_to_your_custom_task_file} \
-    --output_dir "./evals"
+    "pretrained=HuggingFaceH4/zephyr-7b-beta" \
+    "community|{custom_task}|{fewshots}|{truncate_few_shot}" \
+    --custom-tasks {path_to_your_custom_task_file}
 ```

docs/source/available-tasks.mdx

Lines changed: 7 additions & 1 deletion

@@ -3,7 +3,13 @@
 You can get a list of all the available tasks by running:

 ```bash
-lighteval tasks --list
+lighteval tasks list
+```
+
+You can also inspect a specific task by running:
+
+```bash
+lighteval tasks inspect <task_name>
 ```

 ## List of tasks

docs/source/evaluate-the-model-on-a-server-or-container.mdx

Lines changed: 19 additions & 4 deletions

@@ -6,10 +6,9 @@ to the server. The command is the same as before, except you specify a path to
 a yaml config file (detailed below):

 ```bash
-lighteval accelerate \
-    --model_config_path="/path/to/config/file"\
-    --tasks <task parameters> \
-    --output_dir output_dir
+lighteval endpoint {tgi,inference-endpoint} \
+    "/path/to/config/file"\
+    <task parameters>
 ```

 There are two types of configuration files that can be provided for running on
@@ -65,3 +64,19 @@ model:
   inference_server_auth: null
   model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
 ```
+
+### OpenAI API
+
+Lighteval also supports evaluating models on the OpenAI API. To do so you need to set your OpenAI API key in the environment variable.
+
+```bash
+export OPENAI_API_KEY={your_key}
+```
+
+And then run the following command:
+
+```bash
+lighteval endpoint openai \
+    {model-name} \
+    <task parameters>
+```

docs/source/index.mdx

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@ backends—whether it's
 [transformers](https://github.com/huggingface/transformers),
 [tgi](https://github.com/huggingface/text-generation-inference),
 [vllm](https://github.com/vllm-project/vllm), or
-[nanotron](https://github.com/huggingface/nanotron)with
+[nanotron](https://github.com/huggingface/nanotron)-with
 ease. Dive deep into your model’s performance by saving and exploring detailed,
 sample-by-sample results to debug and see how your models stack-up.


docs/source/package_reference/model_config.mdx

Lines changed: 0 additions & 2 deletions

@@ -8,5 +8,3 @@
 [[autodoc]] models.model_config.InferenceModelConfig
 [[autodoc]] models.model_config.TGIModelConfig
 [[autodoc]] models.model_config.VLLMModelConfig
-
-[[autodoc]] models.model_config.create_model_config

docs/source/quicktour.mdx

Lines changed: 23 additions & 16 deletions

@@ -1,11 +1,24 @@
 # Quicktour

-We provide two main entry points to evaluate models:
+
+> [!TIP]
+> We recommend using the `--help` flag to get more information about the
+> available options for each command.
+> `lighteval --help`
+
+Lighteval can be used with a few different commands.

 - `lighteval accelerate` : evaluate models on CPU or one or more GPUs using [🤗
   Accelerate](https://github.com/huggingface/accelerate)
 - `lighteval nanotron`: evaluate models in distributed settings using [⚡️
   Nanotron](https://github.com/huggingface/nanotron)
+- `lighteval vllm`: evaluate models on one or more GPUs using [🚀
+  VLLM](https://github.com/vllm-project/vllm)
+- `lighteval endpoint`
+    - `inference-endpoint`: evaluate models on one or more GPUs using [🔗
+      Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated)
+    - `tgi`: evaluate models on one or more GPUs using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index)
+    - `openai`: evaluate models on one or more GPUs using [🔗 OpenAI API](https://platform.openai.com/)

 ## Accelerate

@@ -15,10 +28,8 @@ To evaluate `GPT-2` on the Truthful QA benchmark, run:

 ```bash
 lighteval accelerate \
-    --model_args "pretrained=gpt2" \
-    --tasks "leaderboard|truthfulqa:mc|0|0" \
-    --override_batch_size 1 \
-    --output_dir="./evals/"
+    "pretrained=gpt2" \
+    "leaderboard|truthfulqa:mc|0|0"
 ```

 Here, `--tasks` refers to either a comma-separated list of supported tasks from
@@ -51,10 +62,8 @@ You can then evaluate a model using data parallelism on 8 GPUs like follows:
 ```bash
 accelerate launch --multi_gpu --num_processes=8 -m \
     lighteval accelerate \
-    --model_args "pretrained=gpt2" \
-    --tasks "leaderboard|truthfulqa:mc|0|0" \
-    --override_batch_size 1 \
-    --output_dir="./evals/"
+    "pretrained=gpt2" \
+    "leaderboard|truthfulqa:mc|0|0"
 ```

 Here, `--override_batch_size` defines the batch size per device, so the effective
@@ -66,10 +75,8 @@ To evaluate a model using pipeline parallelism on 2 or more GPUs, run:

 ```bash
 lighteval accelerate \
-    --model_args "pretrained=gpt2,model_parallel=True" \
-    --tasks "leaderboard|truthfulqa:mc|0|0" \
-    --override_batch_size 1 \
-    --output_dir="./evals/"
+    "pretrained=gpt2,model_parallel=True" \
+    "leaderboard|truthfulqa:mc|0|0"
 ```

 This will automatically use accelerate to distribute the model across the GPUs.
@@ -81,7 +88,7 @@ GPUs.

 ### Model Arguments

-The `--model_args` argument takes a string representing a list of model
+The `model-args` argument takes a string representing a list of model
 argument. The arguments allowed vary depending on the backend you use (vllm or
 accelerate).

@@ -150,8 +157,8 @@ To evaluate a model trained with nanotron on a single gpu.
 ```bash
 torchrun --standalone --nnodes=1 --nproc-per-node=1 \
     src/lighteval/__main__.py nanotron \
-    --checkpoint_config_path ../nanotron/checkpoints/10/config.yaml \
-    --lighteval_config_path examples/nanotron/lighteval_config_override_template.yaml
+    --checkpoint-config-path ../nanotron/checkpoints/10/config.yaml \
+    --lighteval-config-path examples/nanotron/lighteval_config_override_template.yaml
 ```

 The `nproc-per-node` argument should match the data, tensor and pipeline

docs/source/saving-and-reading-results.mdx

Lines changed: 10 additions & 8 deletions

@@ -3,30 +3,32 @@
 ## Saving results locally

 Lighteval will automatically save results and evaluation details in the
-directory set with the `--output_dir` argument. The results will be saved in
+directory set with the `--output-dir` option. The results will be saved in
 `{output_dir}/results/{model_name}/results_{timestamp}.json`. [Here is an
 example of a result file](#example-of-a-result-file). The output path can be
 any [fsspec](https://filesystem-spec.readthedocs.io/en/latest/index.html)
 compliant path (local, s3, hf hub, gdrive, ftp, etc).

-To save the details of the evaluation, you can use the `--save_details`
-argument. The details will be saved in a parquet file
+To save the details of the evaluation, you can use the `--save-details`
+option. The details will be saved in a parquet file
 `{output_dir}/details/{model_name}/{timestamp}/details_{task}_{timestamp}.parquet`.

 ## Pushing results to the HuggingFace hub

 You can push the results and evaluation details to the HuggingFace hub. To do
-so, you need to set the `--push_to_hub` as well as the `--results_org`
-argument. The results will be saved in a dataset with the name at
+so, you need to set the `--push-to-hub` as well as the `--results-org`
+option. The results will be saved in a dataset with the name at
 `{results_org}/{model_org}/{model_name}`. To push the details, you need to set
-the `--save_details` argument.
+the `--save-details` option.
 The dataset created will be private by default, you can make it public by
-setting the `--public_run` argument.
+setting the `--public-run` option.


 ## Pushing results to Tensorboard

-You can push the results to Tensorboard by setting `--push_to_tensorboard`.
+You can push the results to Tensorboard by setting `--push-to-tensorboard`.
+This will create a Tensorboard dashboard in a HF org set with the `--results-org`
+option.


 ## How to load and investigate details

docs/source/use-vllm-as-backend.mdx

Lines changed: 9 additions & 13 deletions

@@ -4,10 +4,9 @@ Lighteval allows you to use `vllm` as backend allowing great speedups.
 To use, simply change the `model_args` to reflect the arguments you want to pass to vllm.

 ```bash
-lighteval accelerate \
-    --model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
-    --tasks "leaderboard|truthfulqa:mc|0|0" \
-    --output_dir="./evals/"
+lighteval vllm \
+    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
+    "leaderboard|truthfulqa:mc|0|0"
 ```

 `vllm` is able to distribute the model across multiple GPUs using data
@@ -17,19 +16,17 @@ You can choose the parallelism method by setting in the the `model_args`.
 For example if you have 4 GPUs you can split it across using `tensor_parallelism`:

 ```bash
-export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval accelerate \
-    --model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
-    --tasks "leaderboard|truthfulqa:mc|0|0" \
-    --output_dir="./evals/"
+export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval vllm \
+    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
+    "leaderboard|truthfulqa:mc|0|0"
 ```

 Or, if your model fits on a single GPU, you can use `data_parallelism` to speed up the evaluation:

 ```bash
-lighteval accelerate \
-    --model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
-    --tasks "leaderboard|truthfulqa:mc|0|0" \
-    --output_dir="./evals/"
+lighteval vllm \
+    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
+    "leaderboard|truthfulqa:mc|0|0"
 ```

 Available arguments for `vllm` can be found in the `VLLMModelConfig`:
@@ -50,4 +47,3 @@ Available arguments for `vllm` can be found in the `VLLMModelConfig`:
 > [!WARNING]
 > In the case of OOM issues, you might need to reduce the context size of the
 > model as well as reduce the `gpu_memory_utilisation` parameter.
-

examples/model_configs/base_model.yaml

Lines changed: 0 additions & 1 deletion

@@ -1,5 +1,4 @@
 model:
-  type: "base" # can be base, tgi, or endpoint
   base_params:
     model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
     dtype: "bfloat16"
