Merge pull request #34 from automl/readme_changes_danrgll
README Update: Minor Format Changes and new Code Usage Part
Neeratyoy authored Jan 8, 2024
2 parents 5f8ea6a + 8f62f02 commit d6ce79a
Showing 12 changed files with 313 additions and 47 deletions.
117 changes: 83 additions & 34 deletions README.md
@@ -5,43 +5,48 @@
[![License](https://img.shields.io/pypi/l/neural-pipeline-search?color=informational)](LICENSE)
[![Tests](https://github.com/automl/neps/actions/workflows/tests.yaml/badge.svg)](https://github.com/automl/neps/actions)

NePS helps deep learning experts to optimize the hyperparameters and/or architecture of their deep learning pipeline with:
Welcome to NePS, a powerful and flexible Python library for hyperparameter optimization (HPO) and neural architecture search (NAS), whose primary goal is to enable the adoption of HPO in practice for deep learning practitioners!

- Hyperparameter Optimization (HPO) ([example](neps_examples/basic_usage/hyperparameters.py))
- Neural Architecture Search (NAS) ([example](neps_examples/basic_usage/architecture.py), [paper](https://openreview.net/forum?id=Ok58hMNXIQ))
- Joint Architecture and Hyperparameter Search (JAHS) ([example](neps_examples/basic_usage/architecture_and_hyperparameters.py), [paper](https://openreview.net/forum?id=_HLcjaVlqJ))
NePS houses recently published as well as well-established algorithms, all of which can be run massively parallel on any distributed setup, together with tools to analyze runs, restart runs, and more.

For efficiency and convenience NePS allows you to
Take a look at our [documentation](https://automl.github.io/neps/latest/) and read on through this README for instructions on how to use NePS!

- Add your intuition as priors for the search ([example HPO](neps_examples/efficiency/expert_priors_for_hyperparameters.py), [example JAHS](neps_examples/experimental/expert_priors_for_architecture_and_hyperparameters.py), [paper](https://openreview.net/forum?id=MMAeCXIa89))
- Utilize low fidelity (e.g., low epoch) evaluations to focus on promising configurations ([example](neps_examples/efficiency/multi_fidelity.py), [paper](https://openreview.net/forum?id=ds21dwfBBH))
- Trivially parallelize across machines ([example](neps_examples/efficiency/parallelization.md), [documentation](https://automl.github.io/neps/latest/parallelization/))

Or [all of the above](neps_examples/efficiency/multi_fidelity_and_expert_priors.py) for maximum efficiency!
## Key Features

## Recent publications
In addition to the common features offered by traditional HPO and NAS libraries, NePS stands out with the following key features:

* [PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning](https://arxiv.org/abs/2306.12370), NeurIPS 2023
* [Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars](https://arxiv.org/abs/2211.01842), NeurIPS 2023
* [πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization](https://arxiv.org/abs/2204.11051), ICLR 2022
1. [**Hyperparameter Optimization (HPO) With Prior Knowledge:**](neps_examples/template/priorband_template.py)
    - NePS excels at efficiently tuning hyperparameters using algorithms that let users incorporate their prior knowledge into the search space. This builds on the insights presented in:
        - [PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning](https://arxiv.org/abs/2306.12370)
        - [πBO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization](https://arxiv.org/abs/2204.11051)

## Documentation
2. [**Neural Architecture Search (NAS) With Context-free Grammar Search Spaces:**](neps_examples/basic_usage/architecture.py)
    - NePS is equipped to handle context-free grammar search spaces, providing advanced capabilities for designing and optimizing architectures. This builds on the insights presented in:
        - [Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars](https://arxiv.org/abs/2211.01842)

3. [**Easy Parallelization:**](docs/parallelization.md)
    - NePS simplifies the parallelization of optimization tasks, whether experiments run on a single machine or in a distributed computing environment.

Please have a look at our [documentation](https://automl.github.io/neps/latest/) and [examples](neps_examples).
4. [**Resume Runs After Termination:**](docs/parallelization.md)
    - NePS allows users to easily resume optimization runs after termination, providing a convenient and efficient workflow for long-running experiments.

## Note
5. [**Seamless User Code Integration:**](neps_examples/template/)
    - NePS's modular design ensures flexibility and extensibility. Integrate NePS effortlessly into existing machine learning workflows.

As indicated with the `v0.x.x` version number, NePS is early-stage code and APIs might change in the future.
## Getting Started

## Installation
### 1. Installation

Using pip
Using pip:

```bash
pip install neural-pipeline-search
```

## Usage
> Note: As indicated with the `v0.x.x` version number, NePS is early-stage code and APIs might change in the future.
### 2. Basic Usage

Using `neps` always follows the same pattern:

@@ -57,28 +62,72 @@ import logging


# 1. Define a function that accepts hyperparameters and computes the validation error
def run_pipeline(hyperparameter_a: float, hyperparameter_b: int):
    validation_error = -hyperparameter_a * hyperparameter_b
    return validation_error
def run_pipeline(
    hyperparameter_a: float, hyperparameter_b: int, architecture_parameter: str
) -> dict:
    # Create your model
    model = MyModel(architecture_parameter)

    # Train and evaluate the model with your training pipeline
    validation_error, test_error = train_and_eval(
        model, hyperparameter_a, hyperparameter_b
    )

    return {  # dict or float (validation error)
        "loss": validation_error,
        "info_dict": {
            "test_error": test_error
            # + Other metrics
        },
    }


# 2. Define a search space of hyperparameters; use the same names as in run_pipeline
pipeline_space = dict(
    hyperparameter_a=neps.FloatParameter(lower=0, upper=1),
    hyperparameter_b=neps.IntegerParameter(lower=1, upper=100),
    hyperparameter_b=neps.IntegerParameter(
        lower=1, upper=42, is_fidelity=True
    ),  # Mark 'is_fidelity' as True for a multi-fidelity approach.
    hyperparameter_a=neps.FloatParameter(
        lower=0.001, upper=0.1, log=True
    ),  # If True, the search space is sampled in log space.
    architecture_parameter=neps.CategoricalParameter(
        ["option_a", "option_b", "option_c"]
    ),
)

# 3. Call neps.run to optimize run_pipeline over pipeline_space
logging.basicConfig(level=logging.INFO)
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="usage_example",
    max_evaluations_total=5,
)
if __name__ == "__main__":
    # 3. Run the NePS optimization
    logging.basicConfig(level=logging.INFO)
    neps.run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory="path/to/save/results",  # Replace with the actual path.
        max_evaluations_total=100,
        searcher="hyperband",  # Optional; specifies the search strategy,
        # otherwise NePS decides based on your data.
    )
```
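
While the optimization is running (or after it has finished), you can check its progress from the command line by pointing the status module at the same directory you passed as `root_directory` (a minimal sketch; see the documentation on analysing runs for details):

```bash
python -m neps.status path/to/save/results
```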

For more details and features please have a look at our [documentation](https://automl.github.io/neps/latest/) and [examples](neps_examples).
## Examples

Discover how NePS works through these practical examples:

* **Hyperparameter Optimization (HPO)**: Learn the essentials of hyperparameter optimization with NePS. [View Example](neps_examples/basic_usage/hyperparameters.py)

* **Defining Search Space with YAML**: Explore how to define the search space for your neural network models using a YAML file. [View Example](neps_examples/basic_usage/defining_search_space)

* **Architecture Search with Primitives**: Dive into architecture search using primitives in NePS. [View Example](neps_examples/basic_usage/architecture.py)

* **Multi-Fidelity Optimization**: Understand how to leverage multi-fidelity optimization for efficient model tuning. [View Example](neps_examples/efficiency/multi_fidelity.py)

* **Utilizing Expert Priors for Hyperparameters**: Learn how to incorporate expert priors for more efficient hyperparameter selection. [View Example](neps_examples/efficiency/expert_priors_for_hyperparameters.py)

* **[Additional NePS Examples](neps_examples/)**: Explore more examples, including various use cases and advanced configurations in NePS.


## Documentation

For more details and features, please have a look at our [documentation](https://automl.github.io/neps/latest/).

## Analysing runs

17 changes: 13 additions & 4 deletions docs/README.md
@@ -1,9 +1,18 @@
# Introduction and Installation
# Installation

## Installation

Using pip
## Install from pip

```bash
pip install neural-pipeline-search
```

## Install from source

!!! note
    We use [poetry](https://python-poetry.org/docs/) to manage dependencies.

```bash
git clone https://github.com/automl/neps.git
cd neps
poetry install --no-dev
```
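
As a quick, optional sanity check after either installation route, you can verify that the package imports correctly (shown here for the poetry-based source install; adapt to your environment as needed):

```bash
poetry run python -c "import neps"
```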
128 changes: 127 additions & 1 deletion docs/analyse.md
@@ -18,6 +18,132 @@ ROOT_DIRECTORY
└── best_loss_with_config_trajectory.txt
```

## TensorBoard Integration

### Introduction

[TensorBoard](https://www.tensorflow.org/tensorboard) serves as a valuable tool for visualizing machine learning experiments, offering the ability to observe losses and metrics throughout the model training process. In NePS, we use this powerful tool to show the metrics of configurations during training, as well as comparisons between the hyperparameters used in the search, for better diagnosis of the model.

### The Logging Function

The `tblogger.log` function is invoked within the model's training loop to facilitate logging of key metrics.

!!! tip

    The logger function is primarily designed to be used within the `run_pipeline` function during the training of the neural network.

- **Signature:**
    ```python
    tblogger.log(
        loss: float,
        current_epoch: int,
        write_summary_incumbent: bool = False,
        write_config_scalar: bool = False,
        write_config_hparam: bool = True,
        extra_data: dict | None = None
    )
    ```

- **Parameters:**
    - `loss` (float): The loss value to be logged.
    - `current_epoch` (int): The current epoch or iteration number.
    - `write_summary_incumbent` (bool, optional): Set to `True` for a live incumbent trajectory.
    - `write_config_scalar` (bool, optional): Set to `True` for a live loss trajectory for each configuration.
    - `write_config_hparam` (bool, optional): Set to `True` for live parallel coordinate, scatter plot matrix, and table view.
    - `extra_data` (dict, optional): Additional data to be logged, provided as a dictionary.
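
To make the signature concrete, here is a minimal, hedged sketch of how the call could sit inside a training loop. `train_one_epoch` and `evaluate` are placeholder helpers, and `tblogger` is assumed to be imported from NePS as shown in the tutorial linked in the Logging Example section below.

```python
# Sketch only: `tblogger` is assumed to be imported from NePS as in the linked
# tutorial; `train_one_epoch` and `evaluate` are user-defined placeholders.
def train_one_config(model, optimizer, train_loader, val_loader, max_epochs):
    for epoch in range(max_epochs):
        train_one_epoch(model, optimizer, train_loader)
        val_loss = evaluate(model, val_loader)

        # Log the validation loss of this configuration at the current epoch.
        tblogger.log(
            loss=val_loss,
            current_epoch=epoch,
            write_config_scalar=True,  # live loss trajectory for this configuration
            write_config_hparam=True,  # parallel coordinate, scatter matrix, and table views
        )
    return val_loss
```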

### Extra Custom Logging

NePS provides dedicated functions for customized logging using the `extra_data` argument.

!!! note "Custom Logging Instructions"

    Use the names of the values you want to log as the dictionary keys, and pass one of the following functions as the corresponding values for a successful logging process.

#### 1- Extra Scalar Logging

Logs new scalar data during training. Uses `current_epoch` from the log function as its `global_step`.

- **Signature:**
    ```python
    tblogger.scalar_logging(value: float)
    ```
- **Parameters:**
    - `value` (float): Any scalar value to be logged at the current epoch of the `tblogger.log` function.
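
As a small, hedged example, an additional metric such as a (hypothetical) training accuracy can be attached through `extra_data`, reusing the training-loop sketch above:

```python
# Sketch: log an extra scalar (a hypothetical training accuracy) alongside the loss.
tblogger.log(
    loss=val_loss,
    current_epoch=epoch,
    extra_data={"train_accuracy": tblogger.scalar_logging(value=train_accuracy)},
)
```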

#### 2- Extra Image Logging

Logs images during training. Images can be resized, randomly selected, and a specified number can be logged at specified intervals. Uses `current_epoch` from the log function as its `global_step`.

- **Signature:**
    ```python
    tblogger.image_logging(
        image: torch.Tensor,
        counter: int = 1,
        resize_images: list[None | int] | None = None,
        random_images: bool = True,
        num_images: int = 20,
        seed: int | np.random.RandomState | None = None,
    )
    ```

- **Parameters:**
    - `image` (torch.Tensor): Image tensor to be logged.
    - `counter` (int): Log images every `counter` epochs (i.e., when `current_epoch % counter` equals 0).
    - `resize_images` (list of int, optional): List of integers for image sizes after resizing (default: [32, 32]).
    - `random_images` (bool, optional): Images are randomly selected if True (default: True).
    - `num_images` (int, optional): Number of images to log (default: 20).
    - `seed` (int or np.random.RandomState or None, optional): Seed value or RandomState instance to control randomness and reproducibility (default: None).
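
Continuing the sketch above, misclassified inputs collected during evaluation could be logged every other epoch via `extra_data`; `wrong_predictions` is a hypothetical tensor of such images:

```python
# Sketch: `wrong_predictions` is a hypothetical tensor of misclassified inputs
# gathered during evaluation (shape: [N, C, H, W]).
tblogger.log(
    loss=val_loss,
    current_epoch=epoch,
    extra_data={
        "misclassified": tblogger.image_logging(
            image=wrong_predictions,
            counter=2,    # log every 2 epochs
            num_images=10,
            seed=42,      # reproducible random selection of images
        )
    },
)
```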

### Logging Example

For illustration purposes, we use a straightforward example: tuning the hyperparameters of a model used to classify the MNIST dataset provided by [torchvision](https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html).

You can find this example [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py).

!!! info "Important"
    We have optimized the example for computational efficiency. If you wish to replicate the exact results showcased in the following section, we recommend the following modifications:

    1- Increase maximum epochs [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L260) from 2 to 10

    2- Set the `write_summary_incumbent` argument [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L300) to `True`

    3- Change the searcher [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L357) from `random_search` to `bayesian_optimization`

    4- Increase the maximum evaluations [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L362) from 2 to 14

    5- Increase the maximum evaluations [here](https://github.com/automl/neps/blob/master/neps_examples/convenience/neps_tblogger_tutorial.py#L391) from 3 to 15

### Visualization Results

The following command starts a local TensorBoard server, allowing you to view the visualizations either in real time or after the run is complete.

```bash
tensorboard --logdir path/to/root_directory
```

This image shows visualizations related to scalar values logged during training. Scalars typically include metrics such as loss, incumbent trajectory, a summary of losses for all configurations, and any additional data provided via the `extra_data` argument in the `tblogger.log` function.

![scalar_loggings](doc_images/tensorboard/tblogger_scalar.jpg)

This image represents visualizations related to logged images during training. It could include snapshots of input data, model predictions, or any other image-related information. In our case, we use images to depict instances of incorrect predictions made by the model.

![image_loggings](doc_images/tensorboard/tblogger_image.jpg)

The following images showcase visualizations related to hyperparameter logging in TensorBoard. These plots include three different views, providing insights into the relationship between different hyperparameters and their impact on the model.

In the table view, you can explore hyperparameter configurations across five different trials. The table displays various hyperparameter values alongside corresponding evaluation metrics.

![hparam_loggings1](doc_images/tensorboard/tblogger_hparam1.jpg)

The parallel coordinate plot offers a holistic perspective on hyperparameter configurations. By presenting multiple hyperparameters simultaneously, this view allows you to observe the interactions between variables, providing insights into their combined influence on the model.

![hparam_loggings2](doc_images/tensorboard/tblogger_hparam2.jpg)

The scatter plot matrix view provides an in-depth analysis of pairwise relationships between different hyperparameters. By visualizing correlations and patterns, this view aids in identifying key interactions that may influence the model's performance.

![hparam_loggings3](doc_images/tensorboard/tblogger_hparam3.jpg)

## Status

To show status information about a neural pipeline search run, use
@@ -38,7 +164,7 @@ To show the status repeatedly, on unix systems you can use
watch --interval 30 python -m neps.status ROOT_DIRECTORY
```

## Visualizations
## CLI commands

To generate plots in the root directory, run

Binary file added docs/doc_images/tensorboard/tblogger_hparam1.jpg
Binary file added docs/doc_images/tensorboard/tblogger_hparam2.jpg
Binary file added docs/doc_images/tensorboard/tblogger_hparam3.jpg
Binary file added docs/doc_images/tensorboard/tblogger_image.jpg
Binary file added docs/doc_images/tensorboard/tblogger_scalar.jpg
45 changes: 42 additions & 3 deletions docs/parallelization.md
@@ -1,4 +1,43 @@
# Parallelization
# Parallelization and Resuming Runs

In order to run a neural pipeline search with multiple processes or multiple machines, simply call `neps.run` multiple times.
All calls to `neps.run` need to use the same `root_directory` on the same filesystem, otherwise there is no synchronization between the `neps.run`'s.
NePS utilizes files as a means of communication for implementing parallelization and resuming runs. As a result, when `neps.run` is called multiple times with the same `root_directory` in the file system, NePS will automatically load the optimizer state, allowing seamless parallelization of the run across different processes or machines. This concept also applies to resuming runs even after termination.

Example:

!!! note
    The following example assumes that all necessary imports are included and that the [pipeline_space](https://automl.github.io/neps/latest/pipeline_space/) and [run_pipeline](https://automl.github.io/neps/latest/run_pipeline/) functions have already been defined. The same idea can be applied to [this](https://github.com/automl/neps/blob/master/neps_examples/basic_usage/hyperparameters.py) example.

```python
logging.basicConfig(level=logging.INFO)

# Initial run
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/my_example",
    max_evaluations_total=5,
)
```

After the initial run, NePS will log the following message:

```bash
INFO:neps:Maximum total evaluations is reached, shutting down
```

If you wish to extend the search with more evaluations, simply update the `max_evaluations_total` parameter:

```python
logging.basicConfig(level=logging.INFO)


# Resuming run with increased evaluations
neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="results/my_example",
    max_evaluations_total=10,
)
```

Now, NePS will continue the search, loading the latest state of the searcher. For parallelization, as mentioned above, you can also run this code multiple times in different processes or on different machines. The file-system communication will link them, as long as they all point to the same `root_directory` location.
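
For example, a simple way to start several workers on one machine is to launch the same script multiple times (a sketch; `my_neps_run.py` is a hypothetical script containing the `neps.run` call above):

```bash
# Start three workers that share the same root_directory (hypothetical script name).
python my_neps_run.py &
python my_neps_run.py &
python my_neps_run.py &
wait
```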
28 changes: 28 additions & 0 deletions docs/run_pipeline.md
@@ -0,0 +1,28 @@
# The run_pipeline Function

The `run_pipeline` function is crucial for NePS. It encapsulates the objective function to be minimized, which could be as simple as an analytical expression or as complex as training a neural network.

This function receives a configuration drawn from the parameters defined in the search space. It then executes the same set of instructions or equations with the provided configuration and returns the objective value to be minimized.

The `run_pipeline` function will look similar to the following:

```python
def run_pipeline(
    pipeline_directory,  # The directory where the config is saved
    previous_pipeline_directory,  # The directory of the immediate lower fidelity config
    **config,  # The hyperparameters to be used in the pipeline
):
    element_1 = config["element_1"]
    element_2 = config["element_2"]
    element_3 = config["element_3"]

    loss = element_1 - element_2 + element_3

    return loss
```

The `run_pipeline` function should be replaced with the user's specific objective function. The user never calls it directly; instead, `neps.run` invokes it and supplies its arguments automatically. Additionally, NePS passes the pipeline directory and the previous pipeline directory for convenience (mainly useful for searches that involve fidelities).
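
To illustrate the directory arguments, here is a hedged sketch of how a multi-fidelity `run_pipeline` might warm-start training from a lower-fidelity checkpoint. `build_model`, `train`, and `evaluate` are hypothetical helpers, the checkpoint file name is an arbitrary choice, and the directories are assumed to behave like `pathlib.Path` objects.

```python
import torch


def run_pipeline(pipeline_directory, previous_pipeline_directory, **config):
    model = build_model(config)  # hypothetical helper

    # If a lower-fidelity version of this config was already trained,
    # continue from its checkpoint instead of starting from scratch.
    if previous_pipeline_directory is not None:
        checkpoint = previous_pipeline_directory / "checkpoint.pt"  # arbitrary file name
        if checkpoint.exists():
            model.load_state_dict(torch.load(checkpoint))

    train(model, config)   # hypothetical helper
    loss = evaluate(model)  # hypothetical helper

    # Save a checkpoint so that higher-fidelity continuations can resume from it.
    torch.save(model.state_dict(), pipeline_directory / "checkpoint.pt")

    return loss
```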

Have a look at our examples and templates [here](https://github.com/automl/neps/tree/master/neps_examples) to see how we use this function in different scenarios.

