Commit e4863fe
Final changes for v0.1.0 (#341)
* [enhance] Increase the coverage (#336)
  * [feat] Support statistics print by adding results manager object (#334)
  * [refactor] Make SearchResults extract run_history at __init__. Since the search results should not be kept eternally, I made this class take run_history in __init__ so that we can implicitly call extraction inside. From this change, calling extraction from outside is not recommended. However, you can still call it from outside, and to prevent mix-ups of the environment, self.clear() will be called.
  * [fix] Separate those changes into PR#336
  * [fix] Fix so that test_loss includes all the metrics
  * [enhance] Strengthen the tests for sprint and SearchResults
  * [fix] Fix an issue in documentation
  * [refactor] Separate the test for results_manager to organize the structure
  * [test] Add the test for get_incumbent_Result
  * [test] Remove the previous test_get_incumbent and see the coverage
  * [fix] [test] Fix reversion of metric and strengthen the test cases
  * [fix] Fix flake8 issues and increase coverage
  * [fix] Address Ravin's comments
* Update for release (#335)
  * Create release workflow and CITATION.cff and update README, setup.py
  * fix bug in pypi token
  * fix documentation formatting
  * TODO for docker image
  * accept suggestions from shuhei
  * add further options for disable_file_output documentation
  * remove from release.yml
* [feat] Add templates for issue and PR with Ravin's suggestions (#136)
* [doc] Add the workflow of Auto-PyTorch (#285)
  * [doc] Address Ravin's comment
* [FIX] Silence catboost (#338)
  * set verbose=False in catboost
  * fix flake
* change worst possible result of r2 (#340)
* Update README.md with link for master branch
* [FIX] formatting in docs (#342)
  * fix formatting in docs
* Update examples/40_advanced/example_resampling_strategy.py
* Update README.md, remove cat requirements.txt

Co-authored-by: nabenabe0928 <[email protected]>
1 parent a1512d5 commit e4863fe

28 files changed: +3018 −259 lines changed

.github/ISSUE_TEMPLATE.md

+48
@@ -0,0 +1,48 @@
+NOTE: ISSUES ARE NOT FOR CODE HELP - Ask for help at https://stackoverflow.com
+
+Your issue may already be reported!
+Also, please search on the [issue tracker](../) before creating one.
+
+* **I'm submitting a ...**
+  - [ ] bug report
+  - [ ] feature request
+  - [ ] support request => Please do not submit support requests here; see the note at the top of this template.
+
+# Issue Description
+* When the issue happens
+* Steps to reproduce
+1.
+1.
+1.
+
+## Expected Behavior
+<!--- If you're describing a bug, tell us what should happen -->
+<!--- If you're suggesting a change/improvement, tell us how it should work -->
+
+## Current Behavior
+<!--- If describing a bug, tell us what happens instead of the expected behavior -->
+<!--- If suggesting a change/improvement, explain the difference from current behavior -->
+
+## Possible Solution
+<!--- Not obligatory, but suggest a fix/reason for the bug, -->
+<!--- or ideas for how to implement the addition or change -->
+
+## Your Code
+
+```
+If relevant, paste all of your relevant code here
+```
+
+## Error message
+
+```
+If relevant, paste all of your error messages here
+```
+
+## Your Local Environment
+* Operating system and version
+* Python version
+* Output of `pip freeze` or `conda list`
+
+Make sure to add **all the information needed to understand the bug** so that someone can help.
+If the info is missing, we'll add the 'Needs more information' label and close the issue until there is enough information.

.github/PULL_REQUEST_TEMPLATE.md

+38
@@ -0,0 +1,38 @@
+<!--- Provide a general summary of your changes in the Title above -->
+
+## Types of changes
+<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
+- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
+- [ ] Bug fix (non-breaking change which fixes an issue)
+- [ ] New feature (non-breaking change which adds functionality)
+
+Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
+Please separate these changes and send us individual PRs for each.
+For more information on how to create a good pull request, please refer to [The anatomy of a perfect pull request](https://medium.com/@hugooodias/the-anatomy-of-a-perfect-pull-request-567382bb6067).
+
+## Checklist:
+<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
+<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
+- [ ] My code follows the code style of this project.
+- [ ] My change requires a change to the documentation.
+- [ ] I have updated the documentation accordingly.
+- [ ] Have you checked to ensure there aren't other open [Pull Requests](../../../pulls) for the same update/change?
+- [ ] Have you added an explanation of what your changes do and why you'd like us to include them?
+- [ ] Have you written new tests for your core changes, as applicable?
+- [ ] Have you successfully run tests with your changes locally?
+<!--
+- [ ] Have you followed the guidelines in our Contributing document?
+-->
+
+## Description
+<!--- Describe your changes in detail -->
+
+## Motivation and Context
+<!--- Why is this change required? What problem does it solve? -->
+<!--- If it fixes an open issue, please link to the issue here. -->
+
+## How has this been tested?
+<!--- Please describe in detail how you tested your changes. -->
+<!--- Include details of your testing environment, tests run to see how -->
+<!--- your change affects other areas of the code, etc. -->

.github/workflows/release.yml

+33
@@ -0,0 +1,33 @@
+name: Push to PyPi
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  test:
+    runs-on: "ubuntu-latest"
+
+    steps:
+    - name: Checkout source
+      uses: actions/checkout@v2
+
+    - name: Set up Python 3.8
+      uses: actions/setup-python@v1
+      with:
+        python-version: 3.8
+
+    - name: Install build dependencies
+      run: python -m pip install build wheel
+
+    - name: Build distributions
+      shell: bash -l {0}
+      run: python setup.py sdist bdist_wheel
+
+    - name: Publish package to PyPI
+      if: github.repository == 'automl/Auto-PyTorch' && github.event_name == 'push' && startsWith(github.ref, 'refs/tags')
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        user: __token__
+        password: ${{ secrets.pypi_token }}

CITATION.cff

+19
@@ -0,0 +1,19 @@
+preferred-citation:
+  type: article
+  authors:
+    - family-names: "Zimmer"
+      given-names: "Lucas"
+      affiliation: "University of Freiburg, Germany"
+    - family-names: "Lindauer"
+      given-names: "Marius"
+      affiliation: "University of Freiburg, Germany"
+    - family-names: "Hutter"
+      given-names: "Frank"
+      affiliation: "University of Freiburg, Germany"
+  doi: "10.1109/TPAMI.2021.3067763"
+  journal-title: "IEEE Transactions on Pattern Analysis and Machine Intelligence"
+  title: "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"
+  year: 2021
+  note: "also available under https://arxiv.org/abs/2006.13799"
+  start: 3079
+  end: 3090

README.md

+81-10
@@ -1,14 +1,42 @@
 # Auto-PyTorch

-Copyright (C) 2019 [AutoML Group Freiburg](http://www.automl.org/)
+Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/)

-This is an alpha version of Auto-PyTorch with an improved API.
-So far, Auto-PyTorch supports tabular data (classification, regression).
-We plan to enable image data and time-series data.
+While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

+Auto-PyTorch is mainly developed to support tabular data (classification, regression).
+The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for the bibtex reference).
+Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).

-Find the documentation [here](https://automl.github.io/Auto-PyTorch/development)
+***From v0.1.0, Auto-PyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package as well as changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 will break compatibility.
+In case you would like to use the old API, you can find it at [`master_old`](https://github.com/automl/Auto-PyTorch/tree/master-old).***

+## Workflow
+
+A rough description of the workflow of Auto-PyTorch is shown in the following figure.
+
+<img src="figs/apt_workflow.png" width="500">
+
+In the figure, **Data** is provided by the user and
+**Portfolio** is a set of neural network configurations that work well on diverse datasets.
+The current version only supports the *greedy portfolio* described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
+This portfolio is used to warm-start the SMAC optimization;
+in other words, the portfolio members are evaluated on the provided data as initial configurations.
+Then the API runs the following procedure:
+1. **Validate input data**: Process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
+2. **Create dataset**: Create a dataset that can be handled by this API, with a choice of cross-validation or holdout splits.
+3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from `sklearn.dummy` that represents the worst possible performance.
+4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
+  a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
+  b. Sample a pipeline hyperparameter configuration *2 by SMAC\
+  c. Update the observations with the obtained results\
+  d. Repeat a.--c. until the budget runs out
+5. Build the best ensemble for the provided dataset from the observations via [ensemble selection](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).
+
+*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, that solve either the regression or classification task on the provided dataset.
+
+*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, and their corresponding hyperparameters.
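Step 5 above uses greedy ensemble selection in the style of the linked Caruana et al. paper. The following is a minimal, self-contained sketch of that idea, not Auto-PyTorch's actual implementation; the function name, the squared-error loss, and the toy data are illustrative assumptions:

```python
import numpy as np

def greedy_ensemble_selection(predictions, y_true, ensemble_size=5):
    """Toy sketch of greedy ensemble selection (with replacement):
    repeatedly add the model whose inclusion minimizes the loss of
    the averaged ensemble prediction."""
    chosen = []  # indices of selected models; repeats are allowed
    current = np.zeros_like(y_true, dtype=float)  # running ensemble mean
    for _ in range(ensemble_size):
        best_idx, best_loss = None, np.inf
        for i, pred in enumerate(predictions):
            # ensemble average if model i were added
            trial = (current * len(chosen) + pred) / (len(chosen) + 1)
            loss = np.mean((trial - y_true) ** 2)  # squared error as a stand-in metric
            if loss < best_loss:
                best_idx, best_loss = i, loss
        chosen.append(best_idx)
        current = (current * (len(chosen) - 1) + predictions[best_idx]) / len(chosen)
    return chosen, current

# Two toy "models": one perfect, one maximally wrong
y = np.array([0.0, 1.0, 1.0, 0.0])
chosen, ensemble = greedy_ensemble_selection([y.copy(), 1.0 - y], y)
print(chosen)  # the perfect model (index 0) is picked every round
```

Selecting with replacement lets strong models carry more weight in the final average, which is the main reason this simple greedy procedure works well in practice.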

 ## Installation

@@ -25,14 +53,57 @@ We recommend using Anaconda for developing as follows:
 git submodule update --init --recursive

 # Create the environment
-conda create -n autopytorch python=3.8
-conda activate autopytorch
+conda create -n auto-pytorch python=3.8
+conda activate auto-pytorch
 conda install swig
-cat requirements.txt | xargs -n 1 -L 1 pip install
 python setup.py install

 ```

+## Examples
+
+In a nutshell:
+
+```py
+from autoPyTorch.api.tabular_classification import TabularClassificationTask
+
+# data and metric imports
+import sklearn.model_selection
+import sklearn.datasets
+import sklearn.metrics
+X, y = sklearn.datasets.load_digits(return_X_y=True)
+X_train, X_test, y_train, y_test = \
+    sklearn.model_selection.train_test_split(X, y, random_state=1)
+
+# initialise Auto-PyTorch api
+api = TabularClassificationTask()
+
+# Search for an ensemble of machine learning algorithms
+api.search(
+    X_train=X_train,
+    y_train=y_train,
+    X_test=X_test,
+    y_test=y_test,
+    optimize_metric='accuracy',
+    total_walltime_limit=300,
+    func_eval_time_limit_secs=50
+)
+
+# Calculate test accuracy
+y_pred = api.predict(X_test)
+score = api.score(y_pred, y_test)
+print("Accuracy score", score)
+```
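As a point of reference for the score above, the dummy baseline that step 3 of the workflow derives from `sklearn.dummy` can be computed with scikit-learn alone. This is an illustrative sketch on the same digits data, not Auto-PyTorch code:

```python
import sklearn.datasets
import sklearn.model_selection
from sklearn.dummy import DummyClassifier

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# Predict the most frequent class everywhere: the "worst possible
# performance" reference that any searched pipeline should beat
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline = dummy.score(X_test, y_test)
print("Dummy baseline accuracy", baseline)  # around 0.1 for the 10-class digits data
```

If a search run cannot beat this number, that points to a setup problem rather than a modeling one.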
+
+For more examples, including customising the search space, parallelising the code, etc., check out the `examples` folder:
+
+```sh
+$ cd examples/
+```
+
+Code for the [paper](https://arxiv.org/abs/2006.13799) is available under `examples/ensemble` in the [TPAMI.2021.3067763](https://github.com/automl/Auto-PyTorch/tree/TPAMI.2021.3067763) branch.
+
 ## Contributing

 If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch
@@ -63,8 +134,8 @@ Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyT
   title = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
   journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
   year = {2021},
-  note = {IEEE early access; also available under https://arxiv.org/abs/2006.13799},
-  pages = {1-12}
+  note = {also available under https://arxiv.org/abs/2006.13799},
+  pages = {3079 - 3090}
 }
 ```

0 commit comments