Big Documentation Update + align naming of configspace for yaml usage
danrgll committed Feb 10, 2024
1 parent 8a0ebd8 commit 01fe73f
Showing 23 changed files with 525 additions and 116 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -50,9 +50,10 @@ pip install neural-pipeline-search

Using `neps` always follows the same pattern:

1. Define a `run_pipeline` function capable of evaluating different architectural and/or hyperparameter configurations
   for your problem.
2. Define a search space named `pipeline_space` for those parameters, e.g., via a dictionary.
3. Call `neps.run` to optimize `run_pipeline` over `pipeline_space`.

In code, the usage pattern can look like this:

@@ -69,20 +70,20 @@ def run_pipeline(
model = MyModel(architecture_parameter)

# Train and evaluate the model with your training pipeline
validation_error, training_error = train_and_eval(
model, hyperparameter_a, hyperparameter_b
)

return { # dict or float(validation error)
"loss": validation_error,
"info_dict": {
"test_error": test_error
"training_error": training_error
# + Other metrics
},
}


# 2. Define a search space of parameters; use the same names for the parameters as in run_pipeline
pipeline_space = dict(
hyperparameter_b=neps.IntegerParameter(
lower=1, upper=42, is_fidelity=True
@@ -111,20 +112,19 @@ if __name__ == "__main__":
## Examples

Discover how NePS works through these practical examples:
* **[Pipeline Space via YAML](neps_examples/basic_usage/defining_search_space)**: Explore how to define the `pipeline_space` using a YAML file instead of a dictionary (a minimal sketch of the call follows this list).

* **[Hyperparameter Optimization (HPO)](neps_examples/basic_usage/hyperparameters.py)**: Learn the essentials of hyperparameter optimization with NePS.

* **[Architecture Search with Primitives](neps_examples/basic_usage/architecture.py)**: Dive into architecture search using primitives in NePS.

* **[Multi-Fidelity Optimization](neps_examples/efficiency/multi_fidelity.py)**: Understand how to leverage multi-fidelity optimization for efficient model tuning.

* **[Utilizing Expert Priors for Hyperparameters](neps_examples/efficiency/expert_priors_for_hyperparameters.py)**: Learn how to incorporate expert priors for more efficient hyperparameter selection.

* **[Additional NePS Examples](neps_examples/)**: Explore more examples, including various use cases and advanced configurations in NePS.
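
For the YAML route, here is a minimal sketch of the call site, assuming `run_pipeline` is defined as in the snippet above. The file name `pipeline_space.yaml` is a hypothetical placeholder; the exact YAML schema is shown in the linked example.

```python
import neps

# Hypothetical file name; see the linked example for the exact YAML schema.
neps.run(
    run_pipeline=run_pipeline,             # defined as in the snippet above
    pipeline_space="pipeline_space.yaml",  # path to a YAML file instead of a dict
    root_directory="path/to/save/results",
    max_evaluations_total=100,
)
```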


## Documentation

For more details and features, please have a look at our [documentation](https://automl.github.io/neps/latest/).
41 changes: 28 additions & 13 deletions docs/README.md
@@ -1,18 +1,33 @@
# Neural Pipeline Search (NePS)

[![PyPI version](https://img.shields.io/pypi/v/neural-pipeline-search?color=informational)](https://pypi.org/project/neural-pipeline-search/)
[![Python versions](https://img.shields.io/pypi/pyversions/neural-pipeline-search)](https://pypi.org/project/neural-pipeline-search/)
[![License](https://img.shields.io/pypi/l/neural-pipeline-search?color=informational)](LICENSE)
[![Tests](https://github.com/automl/neps/actions/workflows/tests.yaml/badge.svg)](https://github.com/automl/neps/actions)

Welcome to NePS, a powerful and flexible Python library for hyperparameter optimization (HPO) and neural architecture search (NAS). Its primary goal: enable HPO adoption in practice for deep learners!

NePS houses recently published and well-established algorithms that can all be run massively parallel on any distributed setup, with tools to analyze runs, restart runs, etc.

## Key Features

In addition to the common features offered by traditional HPO and NAS libraries, NePS stands out with the following key features:

1. [**Hyperparameter Optimization (HPO) With Prior Knowledge:**](neps_examples/template/priorband_template.py)
    - NePS excels in efficiently tuning hyperparameters using algorithms that enable users to make use of their prior knowledge within the search space. It leverages the insights presented in:
- [PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning](https://arxiv.org/abs/2306.12370)
- [πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization](https://arxiv.org/abs/2204.11051)

2. [**Neural Architecture Search (NAS) With Context-free Grammar Search Spaces:**](neps_examples/basic_usage/architecture.py)
    - NePS is equipped to handle context-free grammar search spaces, providing advanced capabilities for designing and optimizing architectures. It leverages the insights presented in:
- [Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars](https://arxiv.org/abs/2211.01842)

3. [**Easy Parallelization:**](docs/parallelization.md)
    - NePS simplifies the parallelization of optimization tasks, whether experiments run on a single machine or in a distributed computing environment.

4. [**Resume Runs After Termination:**](docs/parallelization.md)
- NePS allows users to easily resume optimization runs after termination, providing a convenient and efficient workflow for long-running experiments.

5. [**Seamless User Code Integration:**](neps_examples/template/)
- NePS's modular design ensures flexibility and extensibility. Integrate NePS effortlessly into existing machine learning workflows.
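
As a rough illustration of the first feature, here is a minimal sketch of encoding prior knowledge over a hyperparameter, assuming the `default`/`default_confidence` arguments used in the PriorBand template (the parameter name is hypothetical):

```python
import neps

# A sketch only: `default` encodes your best guess for a good value, and
# `default_confidence` states how strongly the optimizer should trust that guess.
pipeline_space = dict(
    learning_rate=neps.FloatParameter(
        lower=1e-5,
        upper=1e-1,
        log=True,
        default=1e-3,                 # assumed name for the prior's location
        default_confidence="medium",  # assumed name for the prior's strength
    ),
)
```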
101 changes: 101 additions & 0 deletions docs/getting_started.md
@@ -0,0 +1,101 @@
# Getting Started

Getting started with NePS involves a straightforward yet powerful process, centered on its three main components.
This approach ensures flexibility and efficiency in evaluating different architecture and hyperparameter configurations
for your problem.

## The 3 Main Components
1. **Define a [`run_pipeline`](https://automl.github.io/neps/latest/run_pipeline) Function**: This function is essential
for evaluating different configurations. You'll implement the specific logic for your problem within this function.
For detailed instructions on initializing and effectively using `run_pipeline`, refer to the guide.

2. **Establish a [`pipeline_space`](https://automl.github.io/neps/latest/pipeline_space)**: Your search space for
defining parameters. You can structure this in various formats, including dictionaries, YAML, or ConfigSpace.
The guide offers insights into defining and configuring your search space.

3. **Execute with [`neps.run`](https://automl.github.io/neps/latest/neps_run)**: Optimize your `run_pipeline` over
the `pipeline_space` using this function. For a thorough overview of the arguments and their explanations,
check out the detailed documentation.

By following these steps and utilizing the extensive resources provided in the guides, you can tailor NePS to meet
your specific requirements, ensuring a streamlined and effective optimization process.

## Basic Usage
In code, the usage pattern can look like this:

```python
import neps
import logging


# 1. Define a function that accepts hyperparameters and computes the validation error
def run_pipeline(
    hyperparameter_a: float, hyperparameter_b: int, architecture_parameter: str
) -> dict:
    # insert here your own model
    model = MyModel(architecture_parameter)

    # insert here your training/evaluation pipeline
    validation_error, training_error = train_and_eval(
        model, hyperparameter_a, hyperparameter_b
    )

    return {  # dict or float (validation error)
        "loss": validation_error,
        "info_dict": {
            "training_error": training_error
            # + Other metrics
        },
    }


# 2. Define a search space of the parameters of interest; ensure that the names
# are consistent with those defined in the run_pipeline function
pipeline_space = dict(
    hyperparameter_b=neps.IntegerParameter(
        lower=1, upper=42, is_fidelity=True
    ),  # Mark 'is_fidelity' as True for a multi-fidelity approach.
    hyperparameter_a=neps.FloatParameter(
        lower=0.001, upper=0.1, log=True
    ),  # If True, the search space is sampled in log space.
    architecture_parameter=neps.CategoricalParameter(
        ["option_a", "option_b", "option_c"]
    ),
)

if __name__ == "__main__":
    # 3. Run the NePS optimization
    logging.basicConfig(level=logging.INFO)
    neps.run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory="path/to/save/results",  # Replace with the actual path.
        max_evaluations_total=100,
        searcher="hyperband",  # Optional: specifies the search strategy;
        # otherwise NePS decides based on your search space.
    )
```

## Examples

Discover the features of NePS through these practical examples:

* **[Hyperparameter Optimization (HPO)](https://github.com/automl/neps/blob/master/neps_examples/template/basic_template.py)**: Learn the essentials of hyperparameter optimization with NePS.

* **[Architecture Search with Primitives](https://github.com/automl/neps/tree/master/neps_examples/basic_usage/architecture.py)**: Dive into architecture search using primitives in NePS.

* **[Multi-Fidelity Optimization](https://github.com/automl/neps/tree/master/neps_examples/efficiency/multi_fidelity.py)**: Understand how to leverage multi-fidelity optimization for efficient model tuning.

* **[Utilizing Expert Priors for Hyperparameters](https://github.com/automl/neps/blob/master/neps_examples/template/priorband_template.py)**: Learn how to incorporate expert priors for more efficient hyperparameter selection.

* **[Additional NePS Examples](https://github.com/automl/neps/tree/master/neps_examples/)**: Explore more examples, including various use cases and advanced configurations in NePS.
24 changes: 24 additions & 0 deletions docs/installation.md
@@ -0,0 +1,24 @@
# Installation

## Prerequisites

Ensure you have Python version 3.8, 3.9, 3.10, or 3.11 installed. NePS installation will automatically handle
any additional dependencies via pip.

## Install from pip

```bash
pip install neural-pipeline-search
```
> Note: As indicated with the `v0.x.x` version number, NePS is early-stage code and APIs might change in the future.

## Install from source

!!! note
    We use [poetry](https://python-poetry.org/docs/) to manage dependencies.

```bash
git clone https://github.com/automl/neps.git
cd neps
poetry install --no-dev
```
100 changes: 100 additions & 0 deletions docs/neps_run.md
@@ -0,0 +1,100 @@
# Configuring and Running Optimizations

The `neps.run` function is the core of the NePS optimization process, where the search for the best hyperparameters
and architectures takes place. This document outlines the arguments and options available within this function,
providing a detailed guide to customize the optimization process to your specific needs.

## Search Strategy
By default, NePS intelligently selects the most appropriate search strategy based on the configurations you
define in `pipeline_space`. The characteristics of your search space play a crucial role in determining which
optimizer NePS chooses, ensuring that the strategy aligns with the specific requirements and nuances of your
hyperparameter and/or architecture optimization. You can also manually select a specific or custom optimizer
that better matches your needs. For more information, refer [here](https://automl.github.io/neps/latest/optimizers).

## Arguments

### Mandatory Arguments
- **`run_pipeline`** (function): The objective function, which NePS minimizes by evaluating various
  configurations. It receives a configuration as input and should return either a dictionary or a sole loss
  value as output. For correct setup instructions, refer to [here](https://automl.github.io/neps/latest/run_pipeline).
- **`pipeline_space`** (dict | yaml | configspace): This defines the search space from which the optimizer
  samples configurations. It accepts either a dictionary with the configuration names as keys, a path to a YAML
  configuration file, or a `ConfigSpace.ConfigurationSpace` object (see the sketch after the budget options
  below). For comprehensive information and examples, please refer to the detailed guide available
  [here](https://automl.github.io/neps/latest/pipeline_space).

- **`root_directory`** (str): The directory path where the information about the optimization and its progress gets
stored. This is also used to synchronize multiple calls to run(.) for parallelization.

- **Budget**:
To define a budget, provide either or both of the following parameters:

- **`max_evaluations_total`** (int, default: None): Specifies the total number of evaluations to conduct before
halting the optimization process.
    - **`max_cost_total`** (int, default: None): Prevents the initiation of new evaluations once this cost
    threshold is surpassed. This requires adding a cost value to the output of the `run_pipeline` function,
    for example, `return {'loss': loss, 'cost': cost}`. For more details, please refer
    [here](https://automl.github.io/neps/latest/run_pipeline).
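
Here is a sketch tying these mandatory arguments together, with `pipeline_space` given as a
`ConfigSpace.ConfigurationSpace` object and a cost budget. The objective and all paths are stand-ins,
not the definitive API surface:

```python
import time

import neps
from ConfigSpace import ConfigurationSpace

# One of the three accepted pipeline_space formats; a dict or a YAML path works too.
pipeline_space = ConfigurationSpace({"learning_rate": (1e-4, 1e-1)})


def run_pipeline(learning_rate: float) -> dict:
    start = time.time()
    loss = (learning_rate - 0.01) ** 2  # stand-in objective; replace with training
    # Returning a cost alongside the loss is what enables max_cost_total.
    return {"loss": loss, "cost": time.time() - start}


neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/cost_budget_run",  # placeholder path
    max_cost_total=60,  # stop starting evaluations once accumulated cost exceeds 60
)
```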

### Optional Arguments
##### Further Monitoring Options
- **`overwrite_working_directory`** (bool, default: False): When set to True, the working directory specified
  by `root_directory` will be cleared at the beginning of the run. This is useful, for example, when debugging
  a `run_pipeline` function.
- **`post_run_summary`** (bool, default: False): When enabled, this option generates a summary CSV file upon
  completion of the optimization process. The summary includes details of the optimization procedure, such as
  the best configuration, the number of errors that occurred, and the final performance metrics.
- **`development_stage_id`** (int | float | str, default: None): An optional identifier used when working with
multiple development stages. Instead of creating new root directories, use this identifier to save the results
of an optimization run in a separate dev_id folder within the root_directory.
- **`task_id`** (int | float | str, default: None): An optional identifier used when the optimization process
involves multiple tasks. This functions similarly to `development_stage_id`, but it creates a folder named
after the task_id instead of dev_id, providing an organized way to separate results for different tasks within
the `root_directory`.
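
A sketch of these monitoring options in use, reusing `run_pipeline` and `pipeline_space` from the sketch
above (the paths and stage identifier are hypothetical):

```python
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/my_experiment",  # placeholder path
    max_evaluations_total=20,
    overwrite_working_directory=True,  # clear the directory first, e.g. while debugging
    post_run_summary=True,             # write a summary CSV once the run finishes
    development_stage_id="stage_1",    # results land in a separate dev_id subfolder
)
```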
##### Parallelization Setup
- **`max_evaluations_per_run`** (int, default: None): Limits the number of evaluations for this specific call of
`neps.run`.
- **`continue_until_max_evaluation_completed`** (bool, default: False): In parallel setups, pending evaluations
  normally count towards `max_evaluations_total`, halting new ones when this limit is reached. Setting this to
  True enables continuous sampling of new evaluations until the total of completed ones meets
  `max_evaluations_total`, optimizing resource use in time-sensitive scenarios.

For an overview and further resources on how NePS supports parallelization in distributed systems, refer to
the [Parallelization Overview](#parallelization).
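
A sketch of these two options for one worker in a parallel setup, reusing the earlier definitions:

```python
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/parallel_run",  # placeholder; shared across workers
    max_evaluations_total=100,                     # global target across all workers
    max_evaluations_per_run=10,                    # this worker stops after 10 evaluations
    continue_until_max_evaluation_completed=True,  # only completed evaluations count
)
```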
##### Handling Errors
- **`loss_value_on_error`** (float, default: None): When set, any error encountered in an evaluated configuration
will not halt the process; instead, the specified loss value will be used for that configuration.
- **`cost_value_on_error`** (float, default: None): Similar to `loss_value_on_error`, but for the cost value.
- **`ignore_errors`** (bool, default: False): If True, errors encountered during the evaluation of configurations
  will be ignored, and the optimization will continue. Note: these error configurations still count towards
  `max_evaluations_total`.
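
A sketch of the error-handling options, assuming `run_pipeline` may raise (e.g., an out-of-memory error) for
some configurations:

```python
neps.run(
    run_pipeline=run_pipeline,  # as defined above; may raise for some configs
    pipeline_space=pipeline_space,
    root_directory="results/robust_run",  # placeholder path
    max_evaluations_total=100,
    loss_value_on_error=1.0,  # imputed loss for configurations that raise
    cost_value_on_error=0.0,  # imputed cost when a cost budget is in use
)
```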
##### Search Strategy Customization
- **`searcher`** (Literal["bayesian_optimization", "hyperband", ...] | BaseOptimizer, default: "default"):
  Manually specifies which optimization strategy to use. Provide a string identifying one of the built-in
  search strategies or an instance of a custom `BaseOptimizer`.
- **`searcher_path`** (Path | str, default: None): A path to a custom searcher implementation.
- **`**searcher_kwargs`**: Additional keyword arguments to be passed to the searcher.

For more information about the available searchers and how to customize your own, refer
[here](https://automl.github.io/neps/latest/optimizers).
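
For instance, a minimal sketch of pinning the strategy to one of the built-in searchers named above, reusing
the earlier definitions:

```python
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/bo_run",  # placeholder path
    max_evaluations_total=50,
    searcher="bayesian_optimization",  # one of the built-in strategies
)
```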
##### Others
- **`pre_load_hooks`** (Iterable, default: None): A list of hook functions to be called before loading results.

## Parallelization

`neps.run` can be called multiple times, from multiple processes or machines, to parallelize the optimization
process. Ensure that `root_directory` points to a shared location across all instances to synchronize the
optimization efforts. For more information, [look here](https://automl.github.io/neps/latest/parallelization).
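
Below is a minimal, self-contained sketch of such a worker script: launching this same script several times
(as separate processes or on separate machines) parallelizes the run, as long as every instance points at the
same shared `root_directory`. Paths and the toy objective are placeholders:

```python
# optimize.py -- start this script once per worker; NePS coordinates the workers
# through the state it keeps in the shared root_directory.
import neps


def run_pipeline(learning_rate: float) -> float:
    # Placeholder objective; replace with your training/evaluation code.
    return (learning_rate - 0.01) ** 2


pipeline_space = dict(
    learning_rate=neps.FloatParameter(lower=1e-4, upper=1e-1, log=True),
)

if __name__ == "__main__":
    neps.run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory="/shared/fs/neps_results",  # identical for every worker
        max_evaluations_total=50,                  # global budget across workers
    )
```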

## Customization

The `neps.run` function allows for extensive customization through its arguments, enabling you to adapt the
optimization process to the complexities of your specific problems.

For a deeper understanding of how to use `neps.run` in a practical scenario, take a look at our
[examples and templates](https://github.com/automl/neps/tree/master/neps_examples).
