* [enhance] Increase the coverage (#336)
* [feat] Support statistics print by adding results manager object (#334)
* [feat] Support statistics print by adding results manager object
* [refactor] Make SearchResults extract run_history at __init__
Since the search results should not be kept eternally,
this class now takes run_history in __init__ so that
extraction is called implicitly inside.
With this change, calling extraction from outside is no longer recommended.
It is still possible, however; to prevent a mix-up of
the environment, self.clear() will be called.
* [fix] Separate those changes into PR#336
* [fix] Fix so that test_loss includes all the metrics
* [enhance] Strengthen the test for sprint and SearchResults
* [fix] Fix an issue in documentation
* [enhance] Increase the coverage
* [refactor] Separate the test for results_manager to organize the structure
* [test] Add the test for get_incumbent_Result
* [test] Remove the previous test_get_incumbent and see the coverage
* [fix] [test] Fix reversion of metric and strengthen the test cases
* [fix] Fix flake8 issues and increase coverage
* [fix] Address Ravin's comments
* [enhance] Increase the coverage
* [fix] Fix a flake8 issue
* Update for release (#335)
* Create release workflow and CITATION.cff and update README, setup.py
* fix bug in PyPI token
* fix documentation formatting
* TODO for docker image
* accept suggestions from shuhei
* add further options for disable_file_output documentation
* remove from release.yml
* [feat] Add templates for issue and PR with Ravin's suggestions (#136)
* [doc] Add the workflow of the Auto-Pytorch (#285)
* [doc] Add workflow of the AutoPytorch
* [doc] Address Ravin's comment
* [FIX] Silence catboost (#338)
* set verbose=False in catboost
* fix flake
* change worst possible result of r2 (#340)
* Update README.md with link for master branch
* [FIX] Formatting in docs (#342)
* fix formatting in docs
* Update examples/40_advanced/example_resampling_strategy.py
* Update README.md, remove cat requirements.txt
Co-authored-by: nabenabe0928 <[email protected]>
<!--- Provide a general summary of your changes in the Title above -->

## Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)

Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to [The anatomy of a perfect pull request](https://medium.com/@hugooodias/the-anatomy-of-a-perfect-pull-request-567382bb6067).

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [ ] Have you checked to ensure there aren't other open [Pull Requests](../../../pulls) for the same update/change?
- [ ] Have you added an explanation of what your changes do and why you'd like us to include them?
- [ ] Have you written new tests for your core changes, as applicable?
- [ ] Have you successfully run tests with your changes locally?
<!--
- [ ] Have you followed the guidelines in our Contributing document?
-->

## Description
<!--- Describe your changes in detail -->

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

## How has this been tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, tests ran to see how -->
<!--- your change affects other areas of the code, etc. -->
`README.md` (+81 −10):
# Auto-PyTorch

Copyright (C) 2021 [AutoML Groups Freiburg and Hannover](http://www.automl.org/)

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed **Auto-PyTorch**, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression).
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for bibtex ref).
Also, find the documentation [here](https://automl.github.io/Auto-PyTorch/master).

***From v0.1.0, AutoPyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package as well as changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 will break compatibility. In case you would like to use the old API, you can find it at [`master_old`](https://github.com/automl/Auto-PyTorch/tree/master-old).***

## Workflow

The rough workflow of Auto-PyTorch is shown in the following figure.

<img src="figs/apt_workflow.png" width="500">

In the figure, **Data** is provided by the user, and **Portfolio** is a set of configurations of neural networks that work well on diverse datasets. The current version only supports the *greedy portfolio* as described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*. This portfolio is used to warm-start the optimization of SMAC; in other words, we evaluate the portfolio on the provided data as initial configurations. Then the API starts the following procedures:
1. **Validate input data**: Process each data type, e.g. encoding categorical data, so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled in this API with a choice of cross validation or holdout splits.
3. **Evaluate baselines** \*1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, as well as a dummy model from `sklearn.dummy` that represents the worst possible performance.
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
   a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf)\
   b. Sample a pipeline hyperparameter configuration \*2 by SMAC\
   c. Update the observations by obtained results\
   d. Repeat a. -- c. until the budget runs out
5. **Build the best ensemble** for the provided dataset from the observations using [ensemble selection](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf).

\*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machine, that solve either a regression or a classification task on the provided dataset.

\*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm and the shape of the neural network, and their corresponding hyperparameters.
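The budget and cut-off logic in step 4a can be illustrated with a toy successive-halving round, the building block of Hyperband. This is only a sketch with made-up configurations and a synthetic noisy score function, not Auto-PyTorch's actual implementation:

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Toy successive halving: score all configs on a small budget,
    keep the best 1/eta fraction, and repeat with eta-times larger budgets."""
    budget = min_budget
    while len(configs) > 1:
        # score every surviving configuration under the current budget
        scored = sorted(configs, key=lambda c: evaluate(c, budget), reverse=True)
        # keep the top 1/eta fraction (at least one survivor)
        configs = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return configs[0]

# synthetic objective: learning rates near 0.1 are best;
# a larger budget means a less noisy estimate of quality
random.seed(0)
def evaluate(config, budget):
    noise = random.gauss(0, 1.0 / budget)
    return -abs(config["lr"] - 0.1) + noise

candidates = [{"lr": lr} for lr in (0.001, 0.01, 0.05, 0.1, 0.5, 1.0, 2.0, 3.0, 5.0)]
best = successive_halving(candidates, evaluate)
print(best)
```

Low-fidelity evaluations weed out most configurations cheaply, and only promising ones receive the full budget, which is what makes multi-fidelity search efficient.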
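Step 5 follows the greedy ensemble selection of Caruana et al.: starting from an empty ensemble, repeatedly add (with replacement) the model whose inclusion most improves validation performance. A minimal sketch on toy validation predictions, with illustrative names rather than Auto-PyTorch's API:

```python
import numpy as np

def ensemble_selection(preds, y_true, n_iter=10):
    """Greedy ensemble selection: at each step, add the model (with
    replacement) that minimises the ensemble's validation MSE."""
    chosen = []
    ensemble_sum = np.zeros_like(y_true, dtype=float)
    for _ in range(n_iter):
        errors = []
        for p in preds:
            avg = (ensemble_sum + p) / (len(chosen) + 1)
            errors.append(np.mean((avg - y_true) ** 2))
        best = int(np.argmin(errors))
        chosen.append(best)
        ensemble_sum += preds[best]
    # the ensemble prediction is the average of all chosen members
    return chosen, ensemble_sum / len(chosen)

# toy validation predictions from three "models"
y_true = np.array([0.0, 1.0, 1.0, 0.0])
preds = [
    np.array([0.1, 0.9, 0.8, 0.2]),  # good model
    np.array([0.9, 0.1, 0.2, 0.8]),  # bad model
    np.array([0.0, 0.6, 0.9, 0.1]),  # decent model
]
chosen, ens_pred = ensemble_selection(preds, y_true)
print("model counts:", np.bincount(chosen, minlength=3))
```

Because models are added with replacement, the selection counts act as ensemble weights; poorly performing models are simply never picked.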
## Installation
We recommend using Anaconda for developing as follows:

[...]
```py
from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
```

For more examples, including customising the search space and parallelising the code, check out the `examples` folder:

```sh
$ cd examples/
```

Code for the [paper](https://arxiv.org/abs/2006.13799) is available under `examples/ensemble` in the [TPAMI.2021.3067763](https://github.com/automl/Auto-PyTorch/tree/TPAMI.2021.3067763) branch.
## Contributing
If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch.
Please refer to the branch `TPAMI.2021.3067763` to reproduce the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.

[...]
  title = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = {2021},
  note = {also available under https://arxiv.org/abs/2006.13799},