Project examining sparse deep learning architectures for ligand classification.

jkarolczak/ligand-classification

Deep Learning Methods for Ligand Identification in Density Maps

Jacek Karolczak, Anna Przybyłowska, Konrad Szewczyk, Witold Taisner, John M. Heumann, Michael H.B. Stowell, Michał Nowicki, Dariusz Brzezinski

Accurately identifying ligands plays a crucial role in structure-guided drug design. Based on density maps from X-ray diffraction or cryogenic-sample electron microscopy (cryoEM), scientists verify whether small-molecule ligands bind to active sites. However, the interpretation of density maps is challenging, and cognitive bias can sometimes mislead investigators into modeling fictitious compounds. Ligand identification can be aided by automatic methods, but existing approaches are available only for X-ray diffraction. Here, we propose to identify ligands using a deep learning approach that treats density maps as 3D point clouds. We show that the proposed model is on par with existing methods for X-ray crystallography while also being applicable to cryoEM density maps. Our study demonstrates that electron density map fragments can be used to train models that can be applied to cryoEM structures, but also highlights challenges associated with the standardization of electron microscopy maps and the quality assessment of cryoEM ligands.

In the repository, we provide the code for the experiments conducted in the paper, including model implementations and transformations for generating datasets. To reproduce the results, use scripts from the scripts directory. Configuration files for the experiments are available in the cfg directory.

The weights of the best-performing model from the paper are published as model.pt (link).


Presented below are schematics of deep learning architectures used to predict ligands:

  1. The RIConv++ architecture with five enhanced rotation-invariant convolution (RIConv++) layers.
  2. The MinkLoc3Dv2 architecture utilizing information from a pyramid of three feature maps with different receptive fields.
  3. The TransLoc3D architecture built from four modules: 3D Sparse Convolution, Adaptive Receptive Field, External Transformer, and NetVLAD.

All architectures were modified to take the same input, a sample of 2000 voxels (or fewer when a ligand is described by fewer voxels), and to output probability scores for all 219 studied ligand groups.
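
The output side of this interface can be illustrated with a minimal sketch. It assumes the network emits one raw score (logit) per ligand group and that a softmax turns those scores into probabilities; the function name `top_k_ligands` and the use of integer class indices are hypothetical and are not taken from the repository.

```python
import numpy as np

NUM_CLASSES = 219  # number of ligand groups stated in the text


def top_k_ligands(logits, k=5):
    """Turn raw class scores into probabilities and return the k most likely classes.

    Assumption: `logits` holds one raw score per ligand group.
    Returns a list of (class_index, probability) pairs, most likely first.
    """
    logits = np.asarray(logits, dtype=np.float64)
    if logits.shape != (NUM_CLASSES,):
        raise ValueError(f"expected {NUM_CLASSES} scores, got shape {logits.shape}")
    # Subtract the maximum before exponentiating for numerical stability.
    shifted = np.exp(logits - logits.max())
    probs = shifted / shifted.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in order]


# Usage with synthetic scores: class 7 gets a clearly higher score.
example_logits = np.zeros(NUM_CLASSES)
example_logits[7] = 10.0
ranking = top_k_ligands(example_logits, k=3)
```

Mapping the class indices back to ligand group names would require the label list used during training, which is not shown here.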

Figure: Deep Learning Architectures Schematics


Here are some snapshots of ligand identifications made by the proposed MinkLoc3Dv2 model.

  • (A–D) Examples of correctly predicted X-ray ligands.
  • (E) Uridine-5’-diphosphate (UDP) misclassified as uridine (URI, black dashed frame).
  • (F–I) Examples of correctly predicted cryoEM ligands.
  • (J) Heme A (HEM) misclassified as a rare ligand due to incorrect density thresholding.

Figure: Blobs Identified by MinkLoc3Dv2

Each ligand is labeled by its Chemical Component Dictionary ID, structure resolution, and (in parentheses) the PDB ID, chain, and residue number. X-ray diffraction ligands are shown as green mesh based on Fo-Fc maps contoured at 2.8σ, calculated after removal of solvent and other small molecules (including the ligand) from the model. CryoEM ligands are depicted as pink mesh based on difference maps contoured according to the proposed automatic density thresholding method (13.642, 3.385, 17.997, 7.850, and 5.613 V for panels F–J, respectively). The white mesh in panel J shows a manually selected contour threshold of 11.000 V. Atomic coordinates were taken from the PDB deposits.
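
As a rough illustration of what contouring at 2.8σ means, the sketch below masks the voxels of a 3D map that lie at least 2.8 standard deviations above the map mean. It assumes σ is the standard deviation over the whole grid; the function `sigma_mask` is hypothetical, and this is neither the repository's code nor the automatic thresholding method proposed for cryoEM maps.

```python
import numpy as np


def sigma_mask(density, n_sigma=2.8):
    """Return a boolean mask of voxels at or above mean + n_sigma * std.

    Assumption: sigma is computed over the full grid passed in.
    Also returns the absolute threshold that was applied.
    """
    density = np.asarray(density, dtype=np.float64)
    threshold = density.mean() + n_sigma * density.std()
    return density >= threshold, threshold


# Usage on a synthetic map: a flat background with one strong voxel.
toy_map = np.zeros((10, 10, 10))
toy_map[5, 5, 5] = 100.0
mask, level = sigma_mask(toy_map)
```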


Environment setup

Docker

To simplify the setup and ensure consistency, we provide a Docker configuration that includes all necessary dependencies.

Prerequisites

Ensure you have the following installed:

Steps to Start

  1. Clone this repository.
  2. Set the necessary permissions: sudo chmod 744 ./start.sh ./stop.sh
  3. Configure the environment by editing the docker/.env file:
    • Adjust the PYTORCH, CUDA, and CUDNN settings if needed (for GPU use).
    • Set DATA_PATH to point to your data directory. The default is ../../data/.
  4. Start the container:
    • For GPU use: ./start.sh
    • For CPU use: ./start.sh cpu
  5. To stop the container:
    • For GPU use: ./stop.sh
    • For CPU use: ./stop.sh cpu
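
Before starting the container, it can help to confirm that the settings from step 3 are actually present. The following is a minimal sketch, assuming docker/.env uses plain KEY=VALUE lines; the helpers `read_env` and `missing_keys` are hypothetical, and the sample contents are made up except for the default DATA_PATH quoted above.

```python
def read_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings, required=("DATA_PATH",)):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


# Hypothetical file contents; a real check would read docker/.env instead.
sample = "# example settings\nDATA_PATH=../../data/\n"
settings = read_env(sample)
gpu_gaps = missing_keys(settings, required=("PYTORCH", "CUDA", "CUDNN", "DATA_PATH"))
```

For GPU use, passing the PYTORCH, CUDA, and CUDNN keys as `required` (as in the last line) flags any that have not been set.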

Demo

The best model from the paper can be tried without installing anything: it is deployed as a Streamlit app at ligands.cs.put.poznan.pl.

Data

All data needed to reproduce the results are available on Zenodo.

The code for extracting ligands from cryoEM difference maps is included as a submodule of this repository and can also be found here.

Additionally, the preprocessed data used to train the final model (uniformly sampled and max pooled to 2000 points per ligand) are available here.
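
The two reductions named above can be sketched as follows. This is one plausible reading only, assuming that "uniformly sampled" means picking voxels at random without replacement and that "max pooled" means repeated 2×2×2 max pooling of the voxel grid until at most 2000 non-zero voxels remain. All function names are hypothetical, and the repository's transformations may differ.

```python
import numpy as np

TARGET_POINTS = 2000  # points per ligand stated in the text


def blob_to_points(blob):
    """Return an (N, 4) array of x, y, z, density for the non-zero voxels."""
    blob = np.asarray(blob, dtype=np.float32)
    coords = np.argwhere(blob > 0)
    values = blob[blob > 0]
    return np.column_stack([coords, values]).astype(np.float32)


def uniform_sample(points, n=TARGET_POINTS, seed=0):
    """Keep at most n points, chosen uniformly at random without replacement."""
    if len(points) <= n:
        return points
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(points), size=n, replace=False)
    return points[np.sort(keep)]


def max_pool_2x(blob):
    """Apply 2x2x2 max pooling, zero-padding odd dimensions first."""
    blob = np.asarray(blob)
    blob = np.pad(blob, [(0, size % 2) for size in blob.shape])
    x, y, z = (size // 2 for size in blob.shape)
    return blob.reshape(x, 2, y, 2, z, 2).max(axis=(1, 3, 5))


def max_pool_until(blob, n=TARGET_POINTS):
    """Pool repeatedly until at most n non-zero voxels remain."""
    blob = np.asarray(blob)
    while np.count_nonzero(blob > 0) > n:
        blob = max_pool_2x(blob)
    return blob


# Usage on a synthetic 20x20x20 blob with 8000 non-zero voxels.
toy_blob = np.ones((20, 20, 20), dtype=np.float32)
sampled = uniform_sample(blob_to_points(toy_blob))
pooled = blob_to_points(max_pool_until(toy_blob))
```

Random sampling keeps the original voxel coordinates but drops detail unevenly, whereas pooling coarsens the grid as a whole; which trade-off suits a given ligand size is a design choice this sketch does not settle.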

Citation

@article {Karolczak2024.08.27.610022,
	author = {Karolczak, Jacek and Przyby{\l}owska, Anna and Szewczyk, Konrad and Taisner, Witold and Heumann, John M. and Stowell, Michael H.B. and Nowicki, Micha{\l} and Brzezinski, Dariusz},
	title = {Ligand Identification using Deep Learning},
	elocation-id = {2024.08.27.610022},
	year = {2024},
	doi = {10.1101/2024.08.27.610022},
	publisher = {Cold Spring Harbor Laboratory},
	abstract = {Motivation Accurately identifying ligands plays a crucial role in the process of structure-guided drug design. Based on density maps from X-ray diffraction or cryogenic-sample electron microscopy (cryoEM), scientists verify whether small-molecule ligands bind to active sites of interest. However, the interpretation of density maps is challenging, and cognitive bias can sometimes mislead investigators into modeling fictitious compounds. Ligand identification can be aided by automatic methods, but existing approaches are available only for X-ray diffraction and are based on iterative fitting or feature-engineered machine learning rather than end-to-end deep learning. Results Here, we propose to identify ligands using a deep learning approach that treats density maps as 3D point clouds. We show that the proposed model is on par with existing machine learning methods for X-ray crystallography while also being applicable to cryoEM density maps. Our study demonstrates that electron density map fragments can be used to train models that can be applied to cryoEM structures, but also highlights challenges associated with the standardization of electron microscopy maps and the quality assessment of cryoEM ligands. Availability Code and model weights are available on GitHub at https://github.com/jkarolczak/ligands-classification. Datasets used for training and testing are hosted at Zenodo: 10.5281/zenodo.10908325. Contact dariusz.brzezinski{at}cs.put.poznan.pl Competing Interest Statement The authors have declared no competing interest.},
	URL = {https://www.biorxiv.org/content/early/2024/08/28/2024.08.27.610022},
	eprint = {https://www.biorxiv.org/content/early/2024/08/28/2024.08.27.610022.full.pdf},
	journal = {bioRxiv}
}