Distil Auto ML

Distil Auto ML is an AutoML system that integrates with D3M.

More specifically, it is the TA2 system from Uncharted and Qntfy.

The main repository is https://github.com/uncharted-distil/distil-auto-ml

Quickstart using Docker

The TA2 system can be built and started via docker-compose; however, several files must be downloaded beforehand.

Datasets to train on. These may be user-created, or many examples can be downloaded from https://datasets.datadrivendiscovery.org/d3m/datasets

To train using only the TA2, user-generated datasets must be formatted in the same way as the public datasets.

Static files. These may be pretrained weights of a neural network model, or a simple dictionary mapping tokens to the necessary ids: essentially anything extra needed to run an ML model within the pipelines.

To bulk download all static files within the D3M universe (WARNING: this may be quite large):

docker-compose run distil bash
# then, inside the container:
cd /static && python3 -m d3m index download

Alternatively, download the static files for individual primitives only:

python3 -m d3m primitive download -p d3m.primitives.path.of.Primitive -o /static

For more info on how static files integrate within D3M: https://datadrivendiscovery.org/v2020.11.3/tutorial.html#advanced-primitive-with-static-files
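When only a handful of primitives are needed, the per-primitive download command above can be generated in a loop. The following is a minimal sketch, not part of distil-auto-ml: the helper name `build_download_commands` is hypothetical, and the primitive path is the placeholder from the text rather than a real primitive. It only builds and prints the commands; nothing is downloaded.

```python
import shlex


def build_download_commands(primitive_paths, output_dir="/static"):
    """Return one argv list per primitive, mirroring the CLI call shown above.

    Hypothetical helper for illustration; it does not run anything.
    """
    commands = []
    for path in primitive_paths:
        commands.append(
            ["python3", "-m", "d3m", "primitive", "download",
             "-p", path, "-o", output_dir]
        )
    return commands


if __name__ == "__main__":
    # Placeholder path copied from the text above; replace with real primitive paths.
    wanted = ["d3m.primitives.path.of.Primitive"]
    for argv in build_download_commands(wanted):
        # Quote each argument so the printed line can be pasted into a shell.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each returned list can be passed to `subprocess.run(argv, check=True)` inside the container if the commands should be executed rather than printed.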

Once the static files and the dataset(s) you want to run on are downloaded:

# symlink your datasets directory
ln -s ../datasets/seed_datasets_current seed_datasets_current

# choose the dataset you want to run 
export DATASET=185_baseball

# run it
docker-compose up distil
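Before starting the container, it can help to confirm that the chosen dataset is laid out as expected. The sketch below is an assumption-laden check, not part of distil-auto-ml: the helper `missing_dataset_files` is hypothetical, and the expected paths are taken from the D3M CLI example later in this README, so other datasets may be organised differently.

```python
import os

# Relative paths taken from the D3M CLI example in this README (assumption:
# the dataset you picked follows the same layout).
EXPECTED_FILES = [
    "{name}_problem/problemDoc.json",
    "TRAIN/dataset_TRAIN/datasetDoc.json",
    "TEST/dataset_TEST/datasetDoc.json",
    "SCORE/dataset_SCORE/datasetDoc.json",
]


def missing_dataset_files(datasets_root, name):
    """Return the expected files that do not exist for dataset `name`."""
    base = os.path.join(datasets_root, name)
    missing = []
    for template in EXPECTED_FILES:
        candidate = os.path.join(base, template.format(name=name))
        if not os.path.isfile(candidate):
            missing.append(candidate)
    return missing


if __name__ == "__main__":
    # Uses the same DATASET variable exported in the steps above.
    name = os.environ.get("DATASET", "185_baseball")
    for path in missing_dataset_files("seed_datasets_current", name):
        print("missing:", path)
```

An empty result means every expected file was found under the symlinked seed_datasets_current directory.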

There are two testing TA3 systems also available via docker-compose:

# run the dummy-ta3 test suite
docker-compose up distil dummy-ta3

# run the simple-ta3 system, which will then be available in the browser at localhost:80
# this requires a directory named 'output' to exist, in addition to the seed_datasets_current directory
docker-compose up distil envoy simple-ta3

Development

Running From Source

Requirements:

  1. Python 3.6
  2. Pip (Python 3.6 should come with it)
  3. virtualenv

Instructions on setting up to run from source:

  • Clone distil-auto-ml
git clone https://github.com/uncharted-distil/distil-auto-ml
  • Install libraries on Linux
sudo apt-get install snappy-dev build-essential libopenblas-dev libcap-dev ffmpeg
  • Install libraries on MacOS
brew install snappy cmake openblas libpcap ffmpeg
  • Clone common-primitives
 git clone https://gitlab.com/datadrivendiscovery/common-primitives.git
  • Clone d3m-primitives
 git clone https://github.com/cdbethune/d3m-primitives
  • Clone d3m
 git clone https://gitlab.com/datadrivendiscovery/d3m
  • Clone distil-primitives
 git clone https://github.com/uncharted-distil/distil-primitives
  • Clone distil-primitives-contrib
 git clone https://github.com/uncharted-distil/distil-primitives-contrib
  • Change into the distil-auto-ml directory
 cd distil-auto-ml
  • To avoid package collisions, it is recommended to create a virtual environment
  • If virtualenv is not installed, install it now
 python3 -m pip install virtualenv
  • Create the environment
 python3 -m virtualenv env
  • Activate the environment
 source env/bin/activate
  • Install the requirements from server-requirements.txt (Linux)
pip install -r server-requirements.txt
  • Install the requirements from server-requirements.txt (MacOS)
CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install -r server-requirements.txt
  • Install all the other repository dependencies. IMPORTANT: if running on the CPU, replace [gpu] with [cpu]
 cd ..
 cd d3m
 pip install -e .\[gpu\]
 cd ..
 cd common-primitives
 pip install -e .\[gpu\]
 cd ..
 cd distil-primitives
 pip install -e .\[gpu\]
 cd ..
 cd d3m-primitives
 pip install -e .\[gpu\]
 cd ..
 cd distil-primitives-contrib
 pip install -e .\[gpu\]
 pip install python-lzo hyppo==0.1.3 mxnet
 pip install -e git+https://github.com/NewKnowledge/simon-d3m-wrapper.git#egg=SimonD3MWrapper
 pip install -e git+https://gitlab.com/datadrivendiscovery/sklearn-wrap.git@dist#egg=sklearn_wrap
 pip install -e git+https://github.com/usc-isi-i2/dsbox-primitives#egg=dsbox-primitives
 pip install -e git+https://github.com/neurodata/primitives-interfaces#egg=jhu-primitives
 # If an error mentions enum or IntFlag, try: pip uninstall -y enum34
  • MongoDB

Distil AutoML uses MongoDB as a backend store for its internal hyperparameter tuning. The official MongoDB docs have good installation instructions for each OS: https://docs.mongodb.com/manual/installation/

  • Distil-auto-ml is ready for use
 ./run.sh
  • Generate pipelines
 mkdir pipelines
 python3 export_pipelines.sh
  • Use D3M CLI to interface with distil-auto-ml
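The five editable installs in the steps above differ only in the directory and in the [gpu] or [cpu] extra. As a rough sketch (the helper `editable_install_commands` is hypothetical and not part of the repository), they can be expressed as data so that switching between GPU and CPU is a single argument. The repository names are the ones cloned above; the sketch only prints the commands.

```python
import os

# Sibling repositories cloned in the steps above, in install order.
REPOS = [
    "d3m",
    "common-primitives",
    "distil-primitives",
    "d3m-primitives",
    "distil-primitives-contrib",
]


def editable_install_commands(parent_dir, repos=REPOS, use_gpu=True):
    """Return (working_directory, argv) pairs for the editable installs."""
    extra = "gpu" if use_gpu else "cpu"
    return [
        (os.path.join(parent_dir, repo),
         ["pip", "install", "-e", ".[{}]".format(extra)])
        for repo in repos
    ]


if __name__ == "__main__":
    # Assumes the repositories sit side by side in the parent directory.
    for cwd, argv in editable_install_commands("..", use_gpu=False):
        print("(cd {} && {})".format(cwd, " ".join(argv)))
```

To run the installs instead of printing them, each pair could be passed to `subprocess.run(argv, cwd=cwd, check=True)` with the virtual environment activated.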

Running D3M CLI Example

This section assumes the source has been successfully installed and the datasets have been downloaded. Launch d3m with the following arguments, replacing each {placeholder} with the matching path on your machine.

python3 -m d3m runtime -v {location/to/static_resources} -d {location/to/datasets/seed_datasets_current} fit-score \
-r {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA_problem/problemDoc.json} \
-i {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TRAIN/dataset_TRAIN/datasetDoc.json} \
-t {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/TEST/dataset_TEST/datasetDoc.json} \
-a {..seed_datasets_current/LL1_PHEM_Monthly_Malnutrition_MIN_METADATA/SCORE/dataset_SCORE/datasetDoc.json} \
-p {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573.json} \
-O {..distil-auto-ml/pipelines/timeseries_rnn__a9cc5349-e328-401d-abb7-ada6b101e573_run.yaml}
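Because every dataset path in the command above follows the same pattern, the argument list can be derived from a dataset name. The sketch below is illustrative only: `build_fit_score_command` is a hypothetical helper, it assumes d3m is invoked as a module (`python3 -m d3m`), and it assumes the run output is written next to the pipeline file with a `_run.yaml` suffix, as in the example.

```python
import os


def build_fit_score_command(static_dir, datasets_dir, dataset, pipeline_json):
    """Assemble the `d3m runtime ... fit-score` argv for one dataset."""
    base = os.path.join(datasets_dir, dataset)
    # Mirror the example: <pipeline>.json -> <pipeline>_run.yaml
    run_yaml = os.path.splitext(pipeline_json)[0] + "_run.yaml"
    return [
        "python3", "-m", "d3m", "runtime",
        "-v", static_dir,
        "-d", datasets_dir,
        "fit-score",
        "-r", os.path.join(base, dataset + "_problem", "problemDoc.json"),
        "-i", os.path.join(base, "TRAIN", "dataset_TRAIN", "datasetDoc.json"),
        "-t", os.path.join(base, "TEST", "dataset_TEST", "datasetDoc.json"),
        "-a", os.path.join(base, "SCORE", "dataset_SCORE", "datasetDoc.json"),
        "-p", pipeline_json,
        "-O", run_yaml,
    ]


if __name__ == "__main__":
    # All paths here are placeholders; substitute your own locations.
    argv = build_fit_score_command(
        "static_resources",
        "seed_datasets_current",
        "LL1_PHEM_Monthly_Malnutrition_MIN_METADATA",
        "pipelines/example_pipeline.json",
    )
    print(" ".join(argv))
```

Building the command as a list avoids shell quoting problems if any path contains spaces, and the list can be handed directly to `subprocess.run`.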

Building the Docker Container

CPU:

Building a docker image with CPU support is accomplished by invoking the docker_build.sh script:

MacOS/Linux

sudo ./docker_build.sh

Windows

Run command prompt as administrator.

./docker_build.sh

GPU:

Building a docker image with GPU support is accomplished by adding the -g flag to the docker_build.sh call:

MacOS/Linux

sudo ./docker_build.sh -g

Windows

Run command prompt as administrator.

./docker_build.sh -g

Troubleshooting Docker Image Failing to Build:

If building the docker image fails even though all of the above criteria have been met, invoke the docker_build.sh script again with the -f flag. The -f flag forces the download and reinstallation of all dependencies, regardless of whether they already meet the criteria. Note: when building for GPU support, remember to add the -g flag as well.

MacOS/Linux

sudo ./docker_build.sh -f

Windows

Run command prompt as administrator.

./docker_build.sh -f
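The build variants above differ only in the -g and -f flags and in whether sudo is used. A minimal sketch of that flag logic follows; the helper `docker_build_command` is hypothetical, it assumes the two flags can be combined as the note about GPU builds suggests, and it only returns the command rather than running it.

```python
def docker_build_command(gpu=False, force=False, use_sudo=True):
    """Return the argv for docker_build.sh with the flags described above."""
    argv = ["sudo"] if use_sudo else []
    argv.append("./docker_build.sh")
    if gpu:
        argv.append("-g")  # build with GPU support
    if force:
        argv.append("-f")  # force re-download and reinstall of dependencies
    return argv


if __name__ == "__main__":
    # Forced GPU rebuild on MacOS/Linux.
    print(" ".join(docker_build_command(gpu=True, force=True)))
    # CPU build on Windows (administrator prompt, no sudo).
    print(" ".join(docker_build_command(use_sudo=False)))
```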