Merge pull request #34 from automl/readme_changes_danrgll
README Update: Minor Format Changes and new Code Usage Part
Neeratyoy authored Jan 8, 2024
2 parents 5f8ea6a + 8f62f02 commit d6ce79a
Showing 12 changed files with 313 additions and 47 deletions.
117 changes: 83 additions & 34 deletions README.md
@@ -5,43 +5,48 @@
[![License](https://img.shields.io/pypi/l/neural-pipeline-search?color=informational)](LICENSE)
[![Tests](https://github.com/automl/neps/actions/workflows/tests.yaml/badge.svg)](https://github.com/automl/neps/actions)

NePS helps deep learning experts to optimize the hyperparameters and/or architecture of their deep learning pipeline with:
Welcome to NePS, a powerful and flexible Python library for hyperparameter optimization (HPO) and neural architecture search (NAS), whose primary goal is to enable the adoption of HPO in practice for deep learning practitioners!

- Hyperparameter Optimization (HPO) ([example](neps_examples/basic_usage/hyperparameters.py))
- Neural Architecture Search (NAS) ([example](neps_examples/basic_usage/architecture.py), [paper](https://openreview.net/forum?id=Ok58hMNXIQ))
- Joint Architecture and Hyperparameter Search (JAHS) ([example](neps_examples/basic_usage/architecture_and_hyperparameters.py), [paper](https://openreview.net/forum?id=_HLcjaVlqJ))
NePS houses recently published as well as well-established algorithms, all of which can be run massively parallel on any distributed setup, together with tools to analyze runs, restart runs, and more.

For efficiency and convenience NePS allows you to
Take a look at our [documentation](https://automl.github.io/neps/latest/) and read on through this README for instructions on how to use NePS!

- Add your intuition as priors for the search ([example HPO](neps_examples/efficiency/expert_priors_for_hyperparameters.py), [example JAHS](neps_examples/experimental/expert_priors_for_architecture_and_hyperparameters.py), [paper](https://openreview.net/forum?id=MMAeCXIa89))
- Utilize low fidelity (e.g., low epoch) evaluations to focus on promising configurations ([example](neps_examples/efficiency/multi_fidelity.py), [paper](https://openreview.net/forum?id=ds21dwfBBH))
- Trivially parallelize across machines ([example](neps_examples/efficiency/parallelization.md), [documentation](https://automl.github.io/neps/latest/parallelization/))

Or [all of the above](neps_examples/efficiency/multi_fidelity_and_expert_priors.py) for maximum efficiency!
## Key Features

## Recent publications
In addition to the common features offered by traditional HPO and NAS libraries, NePS stands out with the following key features:

* [PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning](https://arxiv.org/abs/2306.12370), NeurIPS 2023
* [Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars](https://arxiv.org/abs/2211.01842), NeurIPS 2023
* [πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization](https://arxiv.org/abs/2204.11051), ICLR 2022
1. [**Hyperparameter Optimization (HPO) With Prior Knowledge:**](neps_examples/template/priorband_template.py)
    - NePS excels at efficiently tuning hyperparameters using algorithms that let users incorporate their prior knowledge into the search space. This builds on the insights presented in:
        - [PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning](https://arxiv.org/abs/2306.12370)
        - [πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization](https://arxiv.org/abs/2204.11051)

## Documentation
2. [**Neural Architecture Search (NAS) With Context-free Grammar Search Spaces:**](neps_examples/basic_usage/architecture.py)
    - NePS is equipped to handle context-free grammar search spaces, providing advanced capabilities for designing and optimizing architectures. This builds on the insights presented in:
        - [Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars](https://arxiv.org/abs/2211.01842)

3. [**Easy Parallelization:**](docs/parallelization.md)
    - NePS simplifies the parallelization of optimization tasks, whether experiments run on a single machine or in a distributed computing environment.

Please have a look at our [documentation](https://automl.github.io/neps/latest/) and [examples](neps_examples).
4. [**Resume Runs After Termination:**](docs/parallelization.md)
    - NePS allows users to easily resume optimization runs after termination, providing a convenient and efficient workflow for long-running experiments.

## Note
5. [**Seamless User Code Integration:**](neps_examples/template/)
    - NePS's modular design ensures flexibility and extensibility. Integrate NePS effortlessly into existing machine learning workflows.

As indicated with the `v0.x.x` version number, NePS is early-stage code and APIs might change in the future.
## Getting Started

## Installation
### 1. Installation

Using pip
Using pip:

```bash
pip install neural-pipeline-search
```

## Usage
> Note: As indicated with the `v0.x.x` version number, NePS is early-stage code and APIs might change in the future.
### 2. Basic Usage

Using `neps` always follows the same pattern:

@@ -57,28 +62,72 @@ import logging


# 1. Define a function that accepts hyperparameters and computes the validation error
def run_pipeline(hyperparameter_a: float, hyperparameter_b: int):
    validation_error = -hyperparameter_a * hyperparameter_b
    return validation_error
def run_pipeline(
    hyperparameter_a: float, hyperparameter_b: int, architecture_parameter: str
) -> dict:
    # Create your model
    model = MyModel(architecture_parameter)

    # Train and evaluate the model with your training pipeline
    validation_error, test_error = train_and_eval(
        model, hyperparameter_a, hyperparameter_b
    )

    return {  # dict or float (validation error)
        "loss": validation_error,
        "info_dict": {
            "test_error": test_error
            # + Other metrics
        },
    }


# 2. Define a search space of hyperparameters; use the same names as in run_pipeline
pipeline_space = dict(
    hyperparameter_a=neps.FloatParameter(lower=0, upper=1),
    hyperparameter_b=neps.IntegerParameter(lower=1, upper=100),
    hyperparameter_b=neps.IntegerParameter(
        lower=1, upper=42, is_fidelity=True
    ),  # Mark 'is_fidelity' as True for a multi-fidelity approach.
    hyperparameter_a=neps.FloatParameter(
        lower=0.001, upper=0.1, log=True
    ),  # If True, the search space is sampled in log space.
    architecture_parameter=neps.CategoricalParameter(
        ["option_a", "option_b", "option_c"]
    ),
)

# 3. Call neps.run to optimize run_pipeline over pipeline_space
logging.basicConfig(level=logging.INFO)
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="usage_example",
    max_evaluations_total=5,
)
if __name__ == "__main__":
    # 3. Run the NePS optimization
    logging.basicConfig(level=logging.INFO)
    neps.run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory="path/to/save/results",  # Replace with the actual path.
        max_evaluations_total=100,
        searcher="hyperband",  # Optional; specifies the search strategy,
        # otherwise NePS decides based on your data.
    )
```
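
While the optimization is running (or after it has finished), you can check its progress from the command line by pointing the status module at the same directory you passed as `root_directory` (a minimal sketch; see the documentation on analysing runs for details):

```bash
python -m neps.status path/to/save/results
```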

For more details and features please have a look at our [documentation](https://automl.github.io/neps/latest/) and [examples](neps_examples).
## Examples

Discover how NePS works through these practical examples:

* **Hyperparameter Optimization (HPO)**: Learn the essentials of hyperparameter optimization with NePS. [View Example](neps_examples/basic_usage/hyperparameters.py)

* **Defining Search Space with YAML**: Explore how to define the search space for your neural network models using a YAML file. [View Example](neps_examples/basic_usage/defining_search_space)

* **Architecture Search with Primitives**: Dive into architecture search using primitives in NePS. [View Example](neps_examples/basic_usage/architecture.py)

* **Multi-Fidelity Optimization**: Understand how to leverage multi-fidelity optimization for efficient model tuning. [View Example](neps_examples/efficiency/multi_fidelity.py)

* **Utilizing Expert Priors for Hyperparameters**: Learn how to incorporate expert priors for more efficient hyperparameter selection. [View Example](neps_examples/efficiency/expert_priors_for_hyperparameters.py)

* **[Additional NePS Examples](neps_examples/)**: Explore more examples, including various use cases and advanced configurations in NePS.


## Documentation

For more details and features, please have a look at our [documentation](https://automl.github.io/neps/latest/).

## Analysing runs

17 changes: 13 additions & 4 deletions docs/README.md
@@ -1,9 +1,18 @@
# Introduction and Installation
# Installation

## Installation

Using pip
## Install from pip

```bash
pip install neural-pipeline-search
```

## Install from source

!!! note
    We use [poetry](https://python-poetry.org/docs/) to manage dependencies.

```bash
git clone https://github.com/automl/neps.git
cd neps
poetry install --no-dev
```
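
As a quick, optional sanity check after either installation route, you can verify that the package imports correctly (shown here for the poetry-based source install; adapt to your environment as needed):

```bash
poetry run python -c "import neps"
```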
128 changes: 127 additions & 1 deletion docs/analyse.md
@@ -18,6 +18,132 @@ ROOT_DIRECTORY
└── best_loss_with_config_trajectory.txt
```

## TensorBoard Integration

### Introduction

[TensorBoard](https://www.tensorflow.org/tensorboard) serves as a valuable tool for visualizing machine learning experiments, offering the ability to observe losses and metrics throughout the model training process. In NePS, we use this powerful tool to show the metrics of configurations during training, as well as comparisons between the hyperparameters used in the search, for better diagnosis of the model.

### The Logging Function

The `tblogger.log` function is invoked within the model's training loop to facilitate logging of key metrics.

!!! tip

    The logger function is primarily designed to be used within the `run_pipeline` function during the training of the neural network.

- **Signature:**
    ```python
    tblogger.log(
        loss: float,
        current_epoch: int,
        write_summary_incumbent: bool = False,
        write_config_scalar: bool = False,
        write_config_hparam: bool = True,
        extra_data: dict | None = None
    )
    ```

- **Parameters:**
    - `loss` (float): The loss value to be logged.
    - `current_epoch` (int): The current epoch or iteration number.
    - `write_summary_incumbent` (bool, optional): Set to `True` for a live incumbent trajectory.
    - `write_config_scalar` (bool, optional): Set to `True` for a live loss trajectory for each configuration.
    - `write_config_hparam` (bool, optional): Set to `True` for live parallel coordinate, scatter plot matrix, and table view.
    - `extra_data` (dict, optional): Additional data to be logged, provided as a dictionary.
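
To make the signature concrete, here is a minimal, hedged sketch of how the call could sit inside a training loop. `train_one_epoch` and `evaluate` are placeholder helpers, and `tblogger` is assumed to be imported from NePS as shown in the tutorial linked in the Logging Example section below.

```python
# Sketch only: `tblogger` is assumed to be imported from NePS as in the linked
# tutorial; `train_one_epoch` and `evaluate` are user-defined placeholders.
def train_one_config(model, optimizer, train_loader, val_loader, max_epochs):
    for epoch in range(max_epochs):
        train_one_epoch(model, optimizer, train_loader)
        val_loss = evaluate(model, val_loader)

        # Log the validation loss of this configuration at the current epoch.
        tblogger.log(
            loss=val_loss,
            current_epoch=epoch,
            write_config_scalar=True,  # live loss trajectory for this configuration
            write_config_hparam=True,  # parallel coordinate, scatter matrix, and table views
        )
    return val_loss
```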

### Extra Custom Logging

NePS provides dedicated functions for customized logging using the `extra_data` argument.

!!! note "Custom Logging Instructions"

    Use the names of the values you want to log as the dictionary keys, and pass one of the following functions as the corresponding values for a successful logging process.

#### 1- Extra Scalar Logging

Logs new scalar data during training. Uses `current_epoch` from the log function as its `global_step`.

- **Signature:**
    ```python
    tblogger.scalar_logging(value: float)
    ```
- **Parameters:**
    - `value` (float): Any scalar value to be logged at the current epoch of the `tblogger.log` function.
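
As a small, hedged example, an additional metric such as a (hypothetical) training accuracy can be attached through `extra_data`, reusing the training-loop sketch above:

```python
# Sketch: log an extra scalar (a hypothetical training accuracy) alongside the loss.
tblogger.log(
    loss=val_loss,
    current_epoch=epoch,
    extra_data={"train_accuracy": tblogger.scalar_logging(value=train_accuracy)},
)
```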

#### 2- Extra Image Logging

Logs images during training. Images can be resized, randomly selected, and a specified number can be logged at specified intervals. Uses `current_epoch` from the log function as its `global_step`.

- **Signature:**
    ```python
    tblogger.image_logging(
        image: torch.Tensor,
        counter: int = 1,
        resize_images: list[None | int] | None = None,
        random_images: bool = True,
        num_images: int = 20,
        seed: int | np.random.RandomState | None = None,
    )
    ```

- **Parameters:**
    - `image` (torch.Tensor): Image tensor to be logged.
    - `counter` (int): Log images every `counter` epochs (i.e., when `current_epoch % counter` equals 0).
    - `resize_images` (list of int, optional): List of integers for image sizes after resizing (default: [32, 32]).
    - `random_images` (bool, optional): Images are randomly selected if True (default: True).
    - `num_images` (int, optional): Number of images to log (default: 20).
    - `seed` (int or np.random.RandomState or None, optional): Seed value or RandomState instance to control randomness and reproducibility (default: None).
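
Continuing the sketch above, misclassified inputs collected during evaluation could be logged every other epoch via `extra_data`; `wrong_predictions` is a hypothetical tensor of such images:

```python
# Sketch: `wrong_predictions` is a hypothetical tensor of misclassified inputs
# gathered during evaluation (shape: [N, C, H, W]).
tblogger.log(
    loss=val_loss,
    current_epoch=epoch,
    extra_data={
        "misclassified": tblogger.image_logging(
            image=wrong_predictions,
            counter=2,    # log every 2 epochs
            num_images=10,
            seed=42,      # reproducible random selection of images
        )
    },
)
```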

### Logging Example

For illustration purposes, we use a straightforward example: tuning the hyperparameters of a model used to classify the MNIST dataset provided by [torchvision](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html).

You can find this example [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py).

!!! info "Important"
    We have optimized the example for computational efficiency. If you wish to replicate the exact results showcased in the following section, we recommend the following modifications:

    1- Increase maximum epochs [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L260) from 2 to 10

    2- Set the `write_summary_incumbent` argument [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L300) to `True`

    3- Change the searcher [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L357) from `random_search` to `bayesian_optimization`

    4- Increase the maximum evaluations [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L362) from 2 to 14

    5- Increase the maximum evaluations [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L391) from 3 to 15

### Visualization Results

The following command starts a local TensorBoard server, allowing you to view the visualizations either in real time or after the run is complete.

```bash
tensorboard --logdir path/to/root_directory
```

This image shows visualizations related to scalar values logged during training. Scalars typically include metrics such as loss, incumbent trajectory, a summary of losses for all configurations, and any additional data provided via the `extra_data` argument in the `tblogger.log` function.

![scalar_loggings](doc_images/tensorboard/tblogger_scalar.jpg)

This image represents visualizations related to logged images during training. It could include snapshots of input data, model predictions, or any other image-related information. In our case, we use images to depict instances of incorrect predictions made by the model.

![image_loggings](doc_images/tensorboard/tblogger_image.jpg)

The following images showcase visualizations related to hyperparameter logging in TensorBoard. These plots include three different views, providing insights into the relationship between different hyperparameters and their impact on the model.

In the table view, you can explore hyperparameter configurations across five different trials. The table displays various hyperparameter values alongside corresponding evaluation metrics.

![hparam_loggings1](doc_images/tensorboard/tblogger_hparam1.jpg)

The parallel coordinate plot offers a holistic perspective on hyperparameter configurations. By presenting multiple hyperparameters simultaneously, this view allows you to observe the interactions between variables, providing insights into their combined influence on the model.

![hparam_loggings2](doc_images/tensorboard/tblogger_hparam2.jpg)

The scatter plot matrix view provides an in-depth analysis of pairwise relationships between different hyperparameters. By visualizing correlations and patterns, this view aids in identifying key interactions that may influence the model's performance.

![hparam_loggings3](doc_images/tensorboard/tblogger_hparam3.jpg)

## Status

To show status information about a neural pipeline search run, use
@@ -38,7 +164,7 @@ To show the status repeatedly, on unix systems you can use
watch --interval 30 python -m neps.status ROOT_DIRECTORY
```

## Visualizations
## CLI commands

To generate plots in the root directory, run

Binary file added docs/doc_images/tensorboard/tblogger_hparam1.jpg
Binary file added docs/doc_images/tensorboard/tblogger_hparam2.jpg
Binary file added docs/doc_images/tensorboard/tblogger_hparam3.jpg
Binary file added docs/doc_images/tensorboard/tblogger_image.jpg
Binary file added docs/doc_images/tensorboard/tblogger_scalar.jpg
45 changes: 42 additions & 3 deletions docs/parallelization.md
@@ -1,4 +1,43 @@
# Parallelization
# Parallelization and Resuming Runs

In order to run a neural pipeline search with multiple processes or multiple machines, simply call `neps.run` multiple times.
All calls to `neps.run` need to use the same `root_directory` on the same filesystem, otherwise there is no synchronization between the `neps.run`'s.
NePS utilizes files as a means of communication for implementing parallelization and resuming runs. As a result, when `neps.run` is called multiple times with the same `root_directory` in the file system, NePS will automatically load the optimizer state, allowing seamless parallelization of the run across different processes or machines. This concept also applies to resuming runs even after termination.

Example:

!!! note
    The following example assumes that all necessary imports are included and that the [pipeline_space](https://automl.github.io/neps/latest/pipeline_space/) and [run_pipeline](https://automl.github.io/neps/latest/run_pipeline/) functions have already been defined. The same idea can be applied to [this](https://github.com/automl/neps/blob/master/neps_examples/basic_usage/hyperparameters.py) example.

```python
logging.basicConfig(level=logging.INFO)

# Initial run
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/my_example",
    max_evaluations_total=5,
)
```

After the initial run, NePS will log the following message:

```bash
INFO:neps:Maximum total evaluations is reached, shutting down
```

If you wish to extend the search with more evaluations, simply update the `max_evaluations_total` parameter:

```python
logging.basicConfig(level=logging.INFO)


# Resuming run with increased evaluations
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/my_example",
    max_evaluations_total=10,
)
```

Now, NePS will continue the search, loading the latest state of the searcher. For parallelization, as mentioned above, you can also run this code multiple times in different processes or on different machines. The file-system communication will link them, as long as they all point to the same `root_directory` location.
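
For example, a simple way to start several workers on one machine is to launch the same script multiple times (a sketch; `my_neps_run.py` is a hypothetical script containing the `neps.run` call above):

```bash
# Start three workers that share the same root_directory (hypothetical script name).
python my_neps_run.py &
python my_neps_run.py &
python my_neps_run.py &
wait
```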
28 changes: 28 additions & 0 deletions docs/run_pipeline.md
@@ -0,0 +1,28 @@
# The run_pipeline Function

The `run_pipeline` function is crucial for NePS. It encapsulates the objective function to be minimized, which could be as simple as an analytical expression or as complex as training a neural network.

This function receives a configuration drawn from the parameters defined in the search space. It then executes the same set of instructions or equations with the provided configuration and returns the objective value to be minimized.

The `run_pipeline` function will look similar to the following:

```python
def run_pipeline(
    pipeline_directory,  # The directory where the config is saved
    previous_pipeline_directory,  # The directory of the immediate lower fidelity config
    **config,  # The hyperparameters to be used in the pipeline
):
    element_1 = config["element_1"]
    element_2 = config["element_2"]
    element_3 = config["element_3"]

    loss = element_1 - element_2 + element_3

    return loss
```

The `run_pipeline` function should be replaced with the user's specific objective function. The user never calls it directly; instead, `neps.run` invokes it and supplies its arguments automatically. Additionally, NePS passes the pipeline directory and the previous pipeline directory for convenience (mainly useful for searches that involve fidelities).
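
To illustrate the directory arguments, here is a hedged sketch of how a multi-fidelity `run_pipeline` might warm-start training from a lower-fidelity checkpoint. `build_model`, `train`, and `evaluate` are hypothetical helpers, the checkpoint file name is an arbitrary choice, and the directories are assumed to behave like `pathlib.Path` objects.

```python
import torch


def run_pipeline(pipeline_directory, previous_pipeline_directory, **config):
    model = build_model(config)  # hypothetical helper

    # If a lower-fidelity version of this config was already trained,
    # continue from its checkpoint instead of starting from scratch.
    if previous_pipeline_directory is not None:
        checkpoint = previous_pipeline_directory / "checkpoint.pt"  # arbitrary file name
        if checkpoint.exists():
            model.load_state_dict(torch.load(checkpoint))

    train(model, config)   # hypothetical helper
    loss = evaluate(model)  # hypothetical helper

    # Save a checkpoint so that higher-fidelity continuations can resume from it.
    torch.save(model.state_dict(), pipeline_directory / "checkpoint.pt")

    return loss
```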

Have a look at our examples and templates [here](https://github.com/automl/neps/tree/master/neps_examples) to see how we use this function in different scenarios.

