
Releases: huggingface/evaluate

v0.4.2

30 Apr 09:45
a4bdc10

What's Changed

New Contributors

Full Changelog: v0.4.1...v0.4.2

v0.4.1

13 Oct 15:57
87f7b37

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.4.1

v0.4.0

13 Dec 13:35

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.4.0

v0.3.0

13 Oct 13:04

What's Changed

New Contributors

Full Changelog: v0.2.2...v0.3.0

v0.2.2

29 Jul 14:58

What's Changed

Full Changelog: v0.2.1...v0.2.2

v0.2.1

28 Jul 13:13

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0

25 Jul 14:34

What's New

evaluator

The evaluator has been extended to three new tasks:

  • "image-classification"
  • "token-classification"
  • "question-answering"

combine

With combine one can bundle several metrics into a single object that can be evaluated in one call and also used in combination with the evaluator.
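
For example, a classification metric bundle might look like this sketch (the metric names and toy data are illustrative):

```python
import evaluate

# Bundle several metrics; a single compute() call returns all of their scores.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 0, 1], references=[0, 1, 1, 1]))
```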

What's Changed

New Contributors

Full Changelog: v0.1.2...v0.2.0

v0.1.2

16 Jun 10:01

What's Changed

New Contributors

Full Changelog: v0.1.1...v0.1.2

v0.1.1

08 Jun 12:38

What's Changed

New Contributors

Full Changelog: v0.1.0...v0.1.1

Initial release of `evaluate`

31 May 13:57

Release notes


These are the release notes of the initial release of the Evaluate library.

Goals


Goals of the Evaluate library:

  • reproducibility: reporting and reproducing results is easy
  • ease-of-use: access to a wide range of evaluation tools with a unified interface
  • diversity: provide a wide range of evaluation tools: metrics, comparisons, and measurements
  • multimodal: models and datasets of many modalities can be evaluated
  • community-driven: anybody can add custom evaluations by hosting them on the Hugging Face Hub

Release overview:

  • evaluate.load(): The load() function is the main entry point into evaluate. It loads evaluation modules from a local folder, the evaluate repository, or the Hugging Face Hub, downloading and caching them as needed, and returns an evaluate.EvaluationModule (a sketch combining load(), compute(), and save() follows this list).
  • evaluate.save(): With save() a user can store evaluation results in a JSON file. In addition to the results from an evaluate.EvaluationModule it can save arbitrary extra parameters, and it automatically records the timestamp, git commit hash, library versions, and Python path. One can either provide a directory for the results, in which case a file name is generated automatically, or an explicit file name (the sketch after this list shows save() together with load()).
  • evaluate.push_to_hub(): The push_to_hub() function pushes the results of a model evaluation to the model card on the Hugging Face Hub. The model, dataset, and metric are specified so that they can be linked on the Hub.
  • evaluate.EvaluationModule: The EvaluationModule class is the base class for all evaluation modules. There are three module types: metrics (to evaluate models), comparisons (to compare models), and measurements (to analyze datasets). Inputs can either be added incrementally with add (single input) and add_batch (batch of inputs), followed by a final compute call, or passed to compute directly (an incremental-usage sketch follows this list). Under the hood, Apache Arrow is used to store and load the input data.
  • evaluate.EvaluationModuleInfo: The EvaluationModuleInfo class stores the attributes of an evaluation module:
    • description: A short description of the evaluation module.
    • citation: A BibTeX string for citation when available.
    • features: A Features object defining the input format. The inputs provided to add, add_batch, and compute are tested against these types and an error is thrown in case of a mismatch.
    • inputs_description: This is equivalent to the module's docstring.
    • homepage: The homepage of the module.
    • license: The license of the module.
    • codebase_urls: Link to the code behind the module.
    • reference_urls: Additional reference URLs.
  • evaluate.evaluator: The evaluator provides automated evaluation and only requires a model, a dataset, and a metric, in contrast to the metrics in EvaluationModule, which require model predictions. It has three main components -- a model wrapped in a pipeline, a dataset, and a metric -- and it returns the computed evaluation scores. Besides these three components, it may also need two mappings to align the dataset's columns with the pipeline's expected inputs and the pipeline's labels with the dataset's labels (a text-classification sketch follows this list). This is an experimental feature; currently, only text classification is supported.
  • evaluate-cli: The community can add custom metrics by adding the necessary module script to a Space on the Hugging Face Hub. The evaluate-cli is a tool that simplifies this process by creating the Space, populating a template, and pushing it to the Hub. It also provides instructions to customize the template and integrate custom logic.
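
The following is a minimal sketch of load(), compute(), and save() used together; the accuracy metric, the toy data, and the ./results/ directory are illustrative choices, not mandated by the release notes.

```python
import evaluate

# Load a metric module from the Hub (or a local folder) and compute a score.
accuracy = evaluate.load("accuracy")
result = accuracy.compute(references=[0, 1, 1, 1], predictions=[0, 1, 0, 1])

# Passing a directory lets save() generate the file name automatically; the
# timestamp, git commit hash, library versions, and Python path are recorded
# alongside the passed values.
evaluate.save("./results/", **result)
```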
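
Incremental usage of an EvaluationModule, for example inside an evaluation loop, might look like this sketch (the batches are toy data):

```python
import evaluate

metric = evaluate.load("accuracy")

# Feed inputs batch by batch; they are buffered via Apache Arrow until compute().
for preds, refs in [([0, 1], [0, 1]), ([1, 0], [1, 1])]:
    metric.add_batch(predictions=preds, references=refs)

print(metric.compute())
```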
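
A sketch of the experimental text-classification evaluator, including the two mappings mentioned above; the model, dataset, and column/label names are assumptions for illustration only.

```python
from datasets import load_dataset
from evaluate import evaluator

task_evaluator = evaluator("text-classification")
data = load_dataset("imdb", split="test[:100]")

results = task_evaluator.compute(
    model_or_pipeline="distilbert-base-uncased-finetuned-sst-2-english",
    data=data,
    metric="accuracy",
    input_column="text",    # dataset column holding the inputs
    label_column="label",   # dataset column holding the references
    label_mapping={"NEGATIVE": 0, "POSITIVE": 1},  # pipeline -> dataset labels
)
print(results)
```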

Main contributors:


@lvwerra , @sashavor , @NimaBoscarino , @ola13 , @osanseviero , @lhoestq , @lewtun , @douwekiela