
Commit 9bfa1ea

Add docstring docs (#413)
* Add Reference docs with Pipeline docs
* Pin numpy<2
* Add Tasks docs
* Add more Tasks docs
* Add Models docs
* Fix Models docs
* Remove AdapterModel that requires peft
* Remove NanotronLightevalModel and VLLMModel that require nanotron and vllm
* Fix markdown comment syntax
* Add Metrics docs
* Fix typo
* Remove Main classes section
* Add Datasets docs
* Create Main classes section with Pipeline
* Add EvaluationTracker docs
* Add ModelConfig docs
* Add ParallelismManager to Pipeline docs
* Add inter-links from using-the-python-api
* Fix inter-links
* Add more Metrics docs
* Comment Metrics enum
* Fix typo
* Add explanation and GH issue to comment in Metrics enum
* Add inter-link to Metrics
* Add subsection titles to LightevalTask
* Add inter-link to LightevalTaskConfig
* Add inter-link to section heading anchor
* Add more Metrics docs
* Add inter-link to SampleLevelMetric and Grouping
* Add inter-link to LightevalTaskConfig
* Fix section title with trailing colon
* Add sections to Models docs
* Move Models docs to Main classes section
* Document you can pass either model or model config to Pipeline
* Move Datasets docs to Tasks docs
* Add logging docs
1 parent 0c80801 commit 9bfa1ea

14 files changed: +216 -14 lines changed

docs/source/_toctree.yml

Lines changed: 18 additions & 0 deletions
@@ -28,3 +28,21 @@
   - local: available-tasks
     title: Available Tasks
   title: API
+- sections:
+  - sections:
+    - local: package_reference/evaluation_tracker
+      title: EvaluationTracker
+    - local: package_reference/models
+      title: Models
+    - local: package_reference/model_config
+      title: ModelConfig
+    - local: package_reference/pipeline
+      title: Pipeline
+    title: Main classes
+  - local: package_reference/metrics
+    title: Metrics
+  - local: package_reference/tasks
+    title: Tasks
+  - local: package_reference/logging
+    title: Logging
+  title: Reference

docs/source/adding-a-custom-task.mdx

Lines changed: 5 additions & 3 deletions
@@ -45,8 +45,9 @@ def prompt_fn(line, task_name: str = None):
 )
 ```
 
-Then, you need to choose a metric, you can either use an existing one (defined
-in `lighteval/metrics/metrics.py`) or [create a custom one](adding-a-new-metric)).
+Then, you need to choose a metric: you can either use an existing one (defined
+in [`lighteval.metrics.metrics.Metrics`]) or [create a custom one](adding-a-new-metric)).
+[//]: # (TODO: Replace lighteval.metrics.metrics.Metrics with ~metrics.metrics.Metrics once its autodoc is added)
 
 ```python
 custom_metric = SampleLevelMetric(
@@ -59,7 +60,8 @@ custom_metric = SampleLevelMetric(
 )
 ```
 
-Then, you need to define your task. You can define a task with or without subsets.
+Then, you need to define your task using [`~tasks.lighteval_task.LightevalTaskConfig`].
+You can define a task with or without subsets.
 To define a task with no subsets:
 
 ```python
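
For context, the task definition this guide builds toward looks roughly like the sketch below. `prompt_fn` and `custom_metric` are the objects defined earlier in the guide; the dataset fields and keyword names here are assumptions based on typical `LightevalTaskConfig` usage, not the exact snippet from the docs:

```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig

# Hypothetical sketch: dataset fields are placeholders, field names are assumed.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,          # the prompt function defined earlier in the guide
    suite=["community"],
    hf_repo="your_org/your_dataset",    # assumption: dataset hosted on the Hugging Face Hub
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    metric=[custom_metric],             # or an entry from the Metrics enum
)
```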

docs/source/adding-a-new-metric.mdx

Lines changed: 6 additions & 4 deletions
@@ -1,8 +1,8 @@
 # Adding a New Metric
 
 First, check if you can use one of the parametrized functions in
-[src.lighteval.metrics.metrics_corpus]() or
-[src.lighteval.metrics.metrics_sample]().
+[Corpus Metrics](package_reference/metrics#corpus-metrics) or
+[Sample Metrics](package_reference/metrics#sample-metrics).
 
 If not, you can use the `custom_task` system to register your new metric:
 
@@ -49,7 +49,8 @@ def agg_function(items):
     return score
 ```
 
-Finally, you can define your metric. If it's a sample level metric, you can use the following code:
+Finally, you can define your metric. If it's a sample level metric, you can use the following code
+with [`~metrics.utils.metric_utils.SampleLevelMetric`]:
 
 ```python
 my_custom_metric = SampleLevelMetric(
@@ -62,7 +63,8 @@ my_custom_metric = SampleLevelMetric(
 )
 ```
 
-If your metric defines multiple metrics per sample, you can use the following code:
+If your metric defines multiple metrics per sample, you can use the following code
+with [`~metrics.utils.metric_utils.SampleLevelMetricGrouping`]:
 
 ```python
 custom_metric = SampleLevelMetricGrouping(
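
As a rough sketch of the sample-level case, a filled-in `SampleLevelMetric` might look like the following. The import path for `MetricCategory` and `MetricUseCase`, the enum members chosen, and the name `custom_metric_fn` (the per-sample scoring function from earlier in the guide) are assumptions, not verbatim from these docs:

```python
from lighteval.metrics.utils.metric_utils import (  # assumed import path
    MetricCategory,
    MetricUseCase,
    SampleLevelMetric,
)

my_custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric",      # name reported in the results
    higher_is_better=True,
    category=MetricCategory.GENERATIVE,  # assumption: pick the category matching your task
    use_case=MetricUseCase.NONE,
    sample_level_fn=custom_metric_fn,    # hypothetical name for the per-sample function above
    corpus_level_fn=agg_function,        # the aggregation function defined above
)
```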

docs/source/contributing-to-multilingual-evaluations.mdx

Lines changed: 3 additions & 3 deletions
@@ -51,7 +51,7 @@ Browse the list of all templates [here](https://github.com/huggingface/lighteval
 Then, when ready, to define your own task, you should:
 1. create a Python file as indicated in the above guide
 2. import the relevant templates for your task type (XNLI, Copa, Multiple choice, Question Answering, etc)
-3. define one or a list of tasks for each relevant language and evaluation formulation (for multichoice) using our parametrizable `LightevalTaskConfig` class
+3. define one or a list of tasks for each relevant language and evaluation formulation (for multichoice) using our parametrizable [`~tasks.lighteval_task.LightevalTaskConfig`] class
 
 ```python
 your_tasks = [
@@ -101,7 +101,7 @@ your_tasks = [
 4. then, you can go back to the guide to test if your task is correctly implemented!
 
 > [!TIP]
-> All `LightevalTaskConfig` parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's functionality to make it easier to correctly fill these parameters.
+> All [`~tasks.lighteval_task.LightevalTaskConfig`] parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's functionality to make it easier to correctly fill these parameters.
 
 
-Once everything is good, open a PR, and we'll be happy to review it!
+Once everything is good, open a PR, and we'll be happy to review it!
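
For step 3, a per-language list of configs might look like the sketch below. The language codes, dataset names, and the `build_prompt_fn` helper (standing in for one of lighteval's template functions) are hypothetical placeholders:

```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig

LANGS = ["fra", "deu", "swa"]  # placeholder language codes

your_tasks = [
    LightevalTaskConfig(
        name=f"community_mytask_{lang}",
        prompt_function=build_prompt_fn(lang),  # hypothetical helper wrapping a template
        suite=["community"],
        hf_repo="your_org/your_multilingual_dataset",  # placeholder dataset
        hf_subset=lang,
        evaluation_splits=["test"],
        metric=[],  # fill in with the metrics matching the chosen formulation
    )
    for lang in LANGS
]
```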

docs/source/metric-list.mdx

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ These metrics need the model to generate an output. They are therefore slower.
 - `quasi_exact_match_gsm8k`: Fraction of instances where the normalized prediction matches the normalized gold (normalization done for gsm8k, where latex symbols, units, etc are removed)
 - `maj_at_8_gsm8k`: Majority choice evaluation, using the gsm8k normalisation for the predictions and gold
 
-## LLM-as-Judge:
+## LLM-as-Judge
 - `llm_judge_gpt3p5`: Can be used for any generative task, the model will be scored by a GPT3.5 model using the OpenAI API
 - `llm_judge_llama_3_405b`: Can be used for any generative task, the model will be scored by a Llama 3.405B model using the HuggingFace API
 - `llm_judge_multi_turn_gpt3p5`: Can be used for any generative task, the model will be scored by a GPT3.5 model using the OpenAI API. It is used for multiturn tasks like mt-bench.
docs/source/package_reference/evaluation_tracker.mdx

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# EvaluationTracker
+
+[[autodoc]] logging.evaluation_tracker.EvaluationTracker
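
For a sense of where this class fits, a minimal sketch of constructing a tracker for a local run; the parameter names follow common lighteval examples but are assumptions to be checked against the autodoc above:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker

# Assumed constructor parameters; see the EvaluationTracker autodoc for the full list.
evaluation_tracker = EvaluationTracker(
    output_dir="./results",  # where result files are written
    save_details=True,       # also dump per-sample details
)
```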
docs/source/package_reference/logging.mdx

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# Loggers
+
+## GeneralConfigLogger
+[[autodoc]] logging.info_loggers.GeneralConfigLogger
+## DetailsLogger
+[[autodoc]] logging.info_loggers.DetailsLogger
+## MetricsLogger
+[[autodoc]] logging.info_loggers.MetricsLogger
+## VersionsLogger
+[[autodoc]] logging.info_loggers.VersionsLogger
+## TaskConfigLogger
+[[autodoc]] logging.info_loggers.TaskConfigLogger
docs/source/package_reference/metrics.mdx

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# Metrics
+
+## Metrics
+[//]: # (TODO: aenum.Enum raises error when generating docs: not supported by inspect.signature. See: https://github.com/ethanfurman/aenum/issues/44)
+[//]: # (### Metrics)
+[//]: # ([[autodoc]] metrics.metrics.Metrics)
+### Metric
+[[autodoc]] metrics.utils.metric_utils.Metric
+### CorpusLevelMetric
+[[autodoc]] metrics.utils.metric_utils.CorpusLevelMetric
+### SampleLevelMetric
+[[autodoc]] metrics.utils.metric_utils.SampleLevelMetric
+### MetricGrouping
+[[autodoc]] metrics.utils.metric_utils.MetricGrouping
+### CorpusLevelMetricGrouping
+[[autodoc]] metrics.utils.metric_utils.CorpusLevelMetricGrouping
+### SampleLevelMetricGrouping
+[[autodoc]] metrics.utils.metric_utils.SampleLevelMetricGrouping
+
+## Corpus Metrics
+### CorpusLevelF1Score
+[[autodoc]] metrics.metrics_corpus.CorpusLevelF1Score
+### CorpusLevelPerplexityMetric
+[[autodoc]] metrics.metrics_corpus.CorpusLevelPerplexityMetric
+### CorpusLevelTranslationMetric
+[[autodoc]] metrics.metrics_corpus.CorpusLevelTranslationMetric
+### matthews_corrcoef
+[[autodoc]] metrics.metrics_corpus.matthews_corrcoef
+
+## Sample Metrics
+### ExactMatches
+[[autodoc]] metrics.metrics_sample.ExactMatches
+### F1_score
+[[autodoc]] metrics.metrics_sample.F1_score
+### LoglikelihoodAcc
+[[autodoc]] metrics.metrics_sample.LoglikelihoodAcc
+### NormalizedMultiChoiceProbability
+[[autodoc]] metrics.metrics_sample.NormalizedMultiChoiceProbability
+### Probability
+[[autodoc]] metrics.metrics_sample.Probability
+### Recall
+[[autodoc]] metrics.metrics_sample.Recall
+### MRR
+[[autodoc]] metrics.metrics_sample.MRR
+### ROUGE
+[[autodoc]] metrics.metrics_sample.ROUGE
+### BertScore
+[[autodoc]] metrics.metrics_sample.BertScore
+### Extractiveness
+[[autodoc]] metrics.metrics_sample.Extractiveness
+### Faithfulness
+[[autodoc]] metrics.metrics_sample.Faithfulness
+### BLEURT
+[[autodoc]] metrics.metrics_sample.BLEURT
+### BLEU
+[[autodoc]] metrics.metrics_sample.BLEU
+### StringDistance
+[[autodoc]] metrics.metrics_sample.StringDistance
+### JudgeLLM
+[[autodoc]] metrics.metrics_sample.JudgeLLM
+### JudgeLLMMTBench
+[[autodoc]] metrics.metrics_sample.JudgeLLMMTBench
+### JudgeLLMMixEval
+[[autodoc]] metrics.metrics_sample.JudgeLLMMixEval
+### MajAtK
+[[autodoc]] metrics.metrics_sample.MajAtK
+
+## LLM-as-a-Judge
+### JudgeLM
+[[autodoc]] metrics.llm_as_judge.JudgeLM
docs/source/package_reference/model_config.mdx

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# ModelConfig
+
+[[autodoc]] models.model_config.BaseModelConfig
+
+[[autodoc]] models.model_config.AdapterModelConfig
+[[autodoc]] models.model_config.DeltaModelConfig
+[[autodoc]] models.model_config.InferenceEndpointModelConfig
+[[autodoc]] models.model_config.InferenceModelConfig
+[[autodoc]] models.model_config.TGIModelConfig
+[[autodoc]] models.model_config.VLLMModelConfig
+
+[[autodoc]] models.model_config.create_model_config
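
The commit message notes that either a model or a model config can be passed to Pipeline. A rough sketch of the config path is below; the parameter names, the task string, and `PipelineParameters` usage are assumptions to be checked against the Pipeline and ModelConfig autodocs:

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.model_config import BaseModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Assumed parameter names throughout; placeholder model and task string.
model_config = BaseModelConfig(pretrained="gpt2")

pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",
    pipeline_parameters=PipelineParameters(launcher_type=ParallelismManager.ACCELERATE),
    evaluation_tracker=EvaluationTracker(output_dir="./results"),
    model_config=model_config,  # alternatively, pass an instantiated model via `model=`
)
pipeline.evaluate()
pipeline.show_results()
```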
docs/source/package_reference/models.mdx

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+# Models
+
+## Model
+### LightevalModel
+[[autodoc]] models.abstract_model.LightevalModel
+
+## Accelerate and Transformers Models
+### BaseModel
+[[autodoc]] models.base_model.BaseModel
+[//]: # (TODO: Fix import error)
+[//]: # (### AdapterModel)
+[//]: # ([[autodoc]] models.adapter_model.AdapterModel)
+### DeltaModel
+[[autodoc]] models.delta_model.DeltaModel
+
+## Inference Endpoints and TGI Models
+### InferenceEndpointModel
+[[autodoc]] models.endpoint_model.InferenceEndpointModel
+### ModelClient
+[[autodoc]] models.tgi_model.ModelClient
+
+[//]: # (TODO: Fix import error)
+[//]: # (## Nanotron Model)
+[//]: # (### NanotronLightevalModel)
+[//]: # ([[autodoc]] models.nanotron_model.NanotronLightevalModel)
+
+[//]: # (TODO: Fix import error)
+[//]: # (## VLLM Model)
+[//]: # (### VLLMModel)
+[//]: # ([[autodoc]] models.vllm_model.VLLMModel)
