Ersilia's LazyQSAR

A library to build QSAR models quickly.

Installation

Install LazyQSAR from source:

git clone https://github.com/ersilia-os/lazy-qsar.git
cd lazy-qsar
python -m pip install -e .

Usage

Choose one of the available small-molecule descriptors.
Fit a model using AutoML. LazyQSAR will search over several hyperparameters.
Evaluate the model on held-out data.

Example for binary classification

Get the data

You can find example data in the fantastic Therapeutic Data Commons portal.

from tdc.single_pred import Tox
data = Tox(name="hERG")
split = data.get_split()

Here we select the hERG blockade toxicity dataset. Let's convert the splits into plain lists for convenience.

# Convert the fetched splits into plain Python lists
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])
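
Before training, it can be worth checking how balanced the binary labels are, because a strongly skewed dataset makes some metrics harder to interpret. The sketch below is not part of LazyQSAR: `class_balance` is a hypothetical helper, and the toy labels merely stand in for `y_train`.

```python
from collections import Counter


def class_balance(labels):
    """Return a {label: fraction} mapping for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels is empty")
    return {label: count / total for label, count in counts.items()}


# Toy labels standing in for y_train (illustrative values, not real data).
toy_labels = [1, 0, 1, 1, 0, 1, 1, 1]
balance = class_balance(toy_labels)
print(balance)
```

With the real data, the same call would be `class_balance(y_train)`.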

Build a model

Now we can train a model based on Morgan fingerprints.

import lazyqsar as lq

model = lq.LazyBinaryQSAR(descriptor_type="morgan", model_type="xgboost") 
model.fit(smiles_list=smiles_train, y=y_train)
model.save_model(model_dir="my_model")

Validate its performance

from sklearn.metrics import roc_curve, auc
y_hat = model.predict_proba(smiles_valid)[:,1]
fpr, tpr, _ = roc_curve(y_valid, y_hat)
print("AUROC", auc(fpr, tpr))
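
To make the metric concrete: AUROC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The following is a minimal pure-Python sketch of that definition. `pairwise_auroc` is a hypothetical helper written for illustration, not a LazyQSAR or scikit-learn function, and the toy inputs are made up.

```python
def pairwise_auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The loop is quadratic in the number
    of samples, so this is only meant for small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores (illustrative values only).
toy_y = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = pairwise_auroc(toy_y, toy_scores)
print("AUROC", toy_auroc)
```

Here three of the four positive/negative pairs are ordered correctly, so the toy AUROC is 0.75. For real datasets the scikit-learn call above is the practical choice.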

Example for regression

Note: regression is not yet implemented in the current version of LazyQSAR, so the snippets below may not run as written.

Get the data

You can find example data in the fantastic Therapeutic Data Commons portal.

from tdc.single_pred import Tox
data = Tox(name="LD50_Zhu")
split = data.get_split()

Here we select the Acute Toxicity (LD50) dataset. Let's convert the splits into plain lists for convenience.

# Convert the fetched splits into plain Python lists
smiles_train = list(split["train"]["Drug"])
y_train = list(split["train"]["Y"])
smiles_valid = list(split["valid"]["Drug"])
y_valid = list(split["valid"]["Y"])

Build a model

Now we can train a model based on Morgan fingerprints.

import lazyqsar as lq

# time_budget (in seconds) and estimator_list can be passed as parameters
# of the regressor. They default to 20 s and all estimators available in FLAML.
model = lq.MorganRegressor()
model.fit(smiles_train, y_train)

Validate its performance

from sklearn.metrics import mean_absolute_error, r2_score
y_hat = model.predict(smiles_valid)
mae = mean_absolute_error(y_valid, y_hat)
r2 = r2_score(y_valid, y_hat)
print("MAE", mae, "R2", r2)
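
For reference, the two metrics can be written out by hand: MAE is the mean of the absolute errors, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below uses hypothetical helper names (`mean_abs_error`, `r_squared`) and made-up toy values. It is an illustration of the formulas, not a replacement for the scikit-learn functions.

```python
def mean_abs_error(y_true, y_pred):
    """Mean of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all true values are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets and predictions (illustrative values only).
toy_true = [3.0, -0.5, 2.0, 7.0]
toy_pred = [2.5, 0.0, 2.0, 8.0]
toy_mae = mean_abs_error(toy_true, toy_pred)
toy_r2 = r_squared(toy_true, toy_pred)
print("MAE", toy_mae, "R2", round(toy_r2, 4))
```

An R2 close to 1 means the predictions explain most of the variance in the targets, while a value at or below 0 means they do no better than predicting the mean.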

Benchmark

The pipeline has been validated on the Therapeutic Data Commons ADMET datasets. Detailed results are available in the /benchmark folder.

Disclaimer

This library is intended only for quick-and-dirty QSAR modeling. For more complete automated QSAR modeling, please refer to Zaira Chem.

About us

Learn about the Ersilia Open Source Initiative!
