Official implementation of ChemSpacE, an efficient and effective approach to exploring chemical space by aligning the latent space of molecular generative models, by Yuanqi Du, Xian Liu, Nilay Mahesh Shah, Shengchao Liu, Jieyu Zhang, and Bolei Zhou.
Anaconda is recommended for managing the environment for this project. Create the environment with:
conda env create -f environment.yml
Note: Examples are provided for the ZINC dataset. Files and parameters can be substituted to work with the QM9 dataset.
Generate molecular graphs from SMILES strings:
cd data
python data_preprocess.py --data_name zinc250k
Models are available at https://bit.ly/chemspace-models.
Trained MoFlow models were obtained from https://github.com/calvin-zcx/moflow.
python chemspace.py --data_name zinc250k --random
python train_boundary_zinc.py
python chemspace.py --data_name zinc250k --traverse
python chemspace.py --data_name zinc250k --multi_property --traverse
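To give a feel for what a latent traversal might look like, here is a minimal sketch. It assumes the traversal moves a latent vector `z` along a unit direction `n` (for example, the normal of a linear property boundary) as `z + alpha * n`. The function names, the `steps` parameter, and the reading of `mani_range` as the maximum step size are illustrative assumptions; this is not the code in `chemspace.py`.

```python
import math


def unit(v):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def traverse(z, direction, mani_range=1.0, steps=5):
    """Return `steps` points z + alpha * n, with alpha evenly spaced
    in [-mani_range, mani_range] and n the normalized direction.

    Assumption: `mani_range` bounds the step size; the repository
    may define it differently.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    n = unit(direction)
    alphas = [-mani_range + 2 * mani_range * i / (steps - 1) for i in range(steps)]
    return [[zi + a * ni for zi, ni in zip(z, n)] for a in alphas]


# Usage: three points along the direction (3, 4) through the origin.
path = traverse([0.0, 0.0], [3.0, 4.0], mani_range=1.0, steps=3)
```

In a real pipeline each point on the path would be decoded back into a molecule by the generative model; that step is omitted here.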
python calculate_statistics_single_prop.py --mani_range 1
python calculate_statistics_multi_prop.py --mani_range 1
python optimize_property_chemspace.py --data_name zinc250k --property_name gsk3b --save_path gsk3b_0.6_range30 --topscore
python optimize_property_chemspace.py --data_name zinc250k --property_name qed_plogp --save_path qed_plogp_0.6_range30 --multi_property --topscore
python optimize_property_chemspace.py --data_name zinc250k --path_range 30 --sim_cutoff 0.6 --property_name gsk3b --save_path gsk3b_0.6_range30 --consopt
python optimize_property_chemspace.py --data_name zinc250k --path_range 30 --sim_cutoff 0.6 --property_name qed_plogp --save_path qed_plogp_0.6_range30 --multi_property --consopt
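The `--sim_cutoff 0.6` flag suggests that constrained optimization keeps only candidates that stay sufficiently similar to the starting molecule. The sketch below illustrates that idea under stated assumptions: similarity is Tanimoto similarity over fingerprints represented as sets of on-bit indices, and the best candidate is the highest-scoring one at or above the cutoff. The function names and data layout are hypothetical, and the similarity measure actually used by `optimize_property_chemspace.py` is not stated in this README.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(union)


def best_within_cutoff(source_fp, candidates, sim_cutoff=0.6):
    """Pick the highest-scoring candidate whose similarity to the source
    is at least `sim_cutoff`.

    `candidates` is an iterable of (name, fingerprint, score) tuples
    (an assumed layout for illustration). Returns None if no candidate
    passes the cutoff.
    """
    kept = [c for c in candidates if tanimoto(source_fp, c[1]) >= sim_cutoff]
    return max(kept, key=lambda c: c[2], default=None)


# Usage with toy fingerprints: "b" scores highest but is too dissimilar.
source = {1, 2, 3, 4, 5}
candidates = [
    ("a", {1, 2, 3, 4, 5, 6}, 0.7),
    ("b", {1, 2, 9}, 0.95),
    ("c", {1, 2, 3, 4}, 0.5),
]
best = best_within_cutoff(source, candidates, sim_cutoff=0.6)
```

Filtering before ranking keeps the constraint hard: a high-scoring but dissimilar candidate can never be selected.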
@article{
du2023chemspace,
title={ChemSpacE: Interpretable and Interactive Chemical Space Exploration},
author={Yuanqi Du and Xian Liu and Nilay Mahesh Shah and Shengchao Liu and Jieyu Zhang and Bolei Zhou},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=C1Xl8dYCBn},
note={}
}