ActiveDelta is an adaptive active learning approach that uses paired molecular representations to predict how much each candidate molecule improves on the best compound currently in the training set, and prioritizes molecules for data acquisition accordingly.
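The selection step described above can be illustrated with a minimal, dependency-free sketch. It is not the code from this repository: the function names (`make_pairs`, `knn_predict`, `select_next`), the toy feature vectors, and the k-nearest-neighbor regressor standing in for the actual base models are all assumptions made for illustration.

```python
from itertools import product


def make_pairs(features, potencies):
    """Cross-merge training compounds into all ordered pairs (i, j).

    Each pair is represented by the concatenated feature vectors, and its
    target is the potency difference potency[j] - potency[i].
    """
    X, y = [], []
    for i, j in product(range(len(features)), repeat=2):
        X.append(features[i] + features[j])
        y.append(potencies[j] - potencies[i])
    return X, y


def knn_predict(X_train, y_train, x, k=3):
    """Toy stand-in model (assumption): mean target of the k nearest pairs."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), idx)
        for idx, row in enumerate(X_train)
    )
    nearest = [y_train[idx] for _, idx in dists[:k]]
    return sum(nearest) / len(nearest)


def select_next(train_feats, train_pot, pool_feats, k=3):
    """Return the index of the pool molecule with the largest predicted
    improvement over the most potent training compound."""
    X, y = make_pairs(train_feats, train_pot)
    best = max(range(len(train_pot)), key=train_pot.__getitem__)
    preds = [
        knn_predict(X, y, train_feats[best] + cand, k) for cand in pool_feats
    ]
    return max(range(len(pool_feats)), key=preds.__getitem__)


# Hypothetical one-dimensional toy data where potency equals the feature value.
train_feats = [[0.0], [1.0], [2.0]]
train_pot = [0.0, 1.0, 2.0]
pool_feats = [[0.5], [3.0], [1.5]]
chosen = select_next(train_feats, train_pot, pool_feats)
print(chosen)
```

On this toy data the sketch picks pool index 1, the only candidate not predicted to be worse than the current best compound. In an active learning loop, the chosen molecule would be measured, moved into the training set, and the step repeated.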
For more information, please refer to the associated publication.
If you use this data or code, please kindly cite: Fralish, Z. & Reker, D. (2024). Finding the most potent compounds using active learning on molecular pairs. Beilstein J. Org. Chem. 20, 2152-2162.
We would like to thank the Chemprop, XGBoost, and scikit-learn developers for making their machine learning algorithms publicly available.
Base Machine Learning Models
Given the larger size of delta datasets, we recommend using a GPU for significantly faster training.
To use Chemprop with GPUs, you will need:
- CUDA >= 8.0
- cuDNN
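As a quick sanity check of these requirements, the sketch below reports whether PyTorch (which Chemprop is built on) can see CUDA and cuDNN in the current environment. The helper name `gpu_status` is hypothetical, and the check assumes the PyTorch installation is the one Chemprop will use.

```python
import importlib.util


def gpu_status():
    """Report CUDA and cuDNN visibility through PyTorch, if it is installed."""
    status = {
        "torch_installed": False,
        "cuda_available": False,
        "cudnn_available": False,
        "cuda_version": None,
    }
    if importlib.util.find_spec("torch") is None:
        return status  # PyTorch is missing, so nothing else can be checked

    import torch

    status["torch_installed"] = True
    status["cuda_available"] = torch.cuda.is_available()
    status["cudnn_available"] = torch.backends.cudnn.is_available()
    status["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return status


print(gpu_status())
```

If `cuda_available` is reported as false, training would fall back to the CPU, which may be slow for the larger delta datasets.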
Python code for evaluating ActiveDelta and traditional approaches based on their ability to identify the most potent leads during exploitative active learning.
99 curated benchmarking training and test sets from the SIMPD publication and 3 random splits of the training data used for our exploitative active learning evaluations.
Results from exploitative active learning.
The copyrights of the software are owned by Duke University. As such, two licenses for this software are offered:
- An open-source license under the GPLv2 license for non-commercial academic use.
- A custom license with Duke University, for commercial use or uses without the GPLv2 license restrictions.