Hitting the Target: Stopping Active Learning at the Cost-Based Optimum

This repository contains the code for the paper 'Hitting the Target: Stopping Active Learning at the Cost-Based Optimum'.

A preprint is available on arXiv.

Supplementary information is included in this repository.

Installation

Necessary dependencies can be installed with:

$ pip install poetry
$ poetry install

Reproducing

Reproducing the results involves four steps:

Configuration
Perform the active learning runs
Evaluate the stopping criteria
Produce the figures and other summary results

Configuration

Create a .env file with two keys: DATASET_DIR, the directory in which to store the datasets (~1.4GB), and OUT_DIR, the directory in which to record the results (~1TB compressed).

For example:

DATASET_DIR=/home/user/datasets
OUT_DIR=/home/user/out
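Before starting a long run it can be worth checking that both keys are present. The following is a minimal sketch using only the standard library. It is not how the repository itself loads its configuration; the helper names parse_env and missing_keys are hypothetical and exist only for this illustration.

```python
from pathlib import Path

# The two keys described above.
REQUIRED_KEYS = ("DATASET_DIR", "OUT_DIR")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    # Treat a missing file the same as an empty one.
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    print("missing keys:", missing or "none")
```

Run from the repository root, this reports which of the two keys still need to be set.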

Active Learning Runs

Running the active learning process is time-consuming and computationally expensive. For the paper, a dedicated 72-core machine was used for the SVM results, while the random forest and neural network results were computed on NeSI.

nesi_base2.py runs the active learning experiments. Its first parameter is the index of the first experiment to run, the second is the number of experiments to run, and the last is the range of seeds (different splits) to run. To run all of the results found in the paper, run:

$ poetry run nesi_base2.py 0 26 0-30

Note that this will likely take upwards of a week even on a powerful machine.
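To get a feel for how much work the three arguments describe, the sketch below expands them into (experiment, seed) pairs. It illustrates the argument layout only and is not the script's own parsing code. In particular, whether the upper bound of the seed range is inclusive is not stated here, so the sketch exposes that as an explicit option and assumes a half-open range by default. The function names are hypothetical.

```python
def parse_seed_range(spec, inclusive=False):
    """Expand a 'first-last' string into a list of seed values.

    ASSUMPTION: the real script's handling of the upper bound is unknown;
    `inclusive` lets the reader try both interpretations.
    """
    start, _, stop = spec.partition("-")
    first, last = int(start), int(stop)
    return list(range(first, last + 1 if inclusive else last))


def planned_runs(exp_start, exp_count, seed_spec, inclusive=False):
    """List every (experiment index, seed) pair the arguments describe."""
    seeds = parse_seed_range(seed_spec, inclusive)
    experiments = range(exp_start, exp_start + exp_count)
    return [(exp, seed) for exp in experiments for seed in seeds]


if __name__ == "__main__":
    runs = planned_runs(0, 26, "0-30")
    print(f"{len(runs)} runs, first {runs[0]}, last {runs[-1]}")
```

Under the half-open assumption the command above describes 26 x 30 = 780 runs; under the inclusive one it describes 26 x 31 = 806. Either way, passing a smaller start, count, or seed range is a natural way to split the work across machines.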

Evaluating Stopping Criteria

Evaluating the stopping criteria on the above results is significantly faster. To evaluate the stopping criteria for all of the runs computed in the previous step, run:

$ poetry run stop_eval.py 0 26 0-30 --jobs=<N_CPUS>

Unlike the previous command, this one does not autodetect the number of CPUs and defaults to 20 jobs, so specify an appropriate value for your machine. On a 72-core machine this took approximately three days.
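Since the job count has to be supplied by hand, one option is to derive it from the machine. The sketch below builds the command shown above with --jobs filled in from os.cpu_count(). It only constructs and prints the command and does not run it; the helper name stop_eval_command is hypothetical, and using every available CPU is an assumption that may not suit a shared machine.

```python
import os
import shlex


def stop_eval_command(exp_start, exp_count, seed_spec, jobs=None):
    """Build the stop_eval.py invocation as an argument list."""
    if jobs is None:
        # os.cpu_count() can return None, so fall back to a single job.
        jobs = os.cpu_count() or 1
    return [
        "poetry", "run", "stop_eval.py",
        str(exp_start), str(exp_count), seed_spec,
        f"--jobs={jobs}",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run by hand.
    print(shlex.join(stop_eval_command(0, 26, "0-30")))
```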

Produce Summary Figures

To produce the figures and other summary results used in the paper, first register the kernel, then start a notebook server:

$ jupyter lab

From here run plots_svm.ipynb, plots_random_forest.ipynb, and plots_neural_network.ipynb to produce the summary results.

License

All datasets are the property of their respective owners and are not redistributed with this repository.

Unless otherwise specified, all code, including notebooks, is licensed under GPL v2. The text of this license can be found in LICENSE.

