Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction

Top-DTI

We propose Top-DTI, a framework for predicting drug-target interactions (DTIs) that integrates Topological Data Analysis (TDA) and Large Language Models (LLMs). Top-DTI uses Persistent Homology (PH) to extract topological features from protein contact maps and drug molecular images. In parallel, protein and drug LLMs generate embeddings that capture sequential and contextual information from protein sequences and drug SMILES strings. The TDA and LLM embeddings are combined through a learnable fusion mechanism that dynamically balances the contributions of topological and sequence-based features. The fused representations are then fed into a heterogeneous Graph Neural Network (GNN) to learn relational information from the DTI network. Finally, the embeddings learned by the GNN are used to train a multilayer perceptron (MLP) classifier that predicts DTIs.
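
To make the fusion idea concrete, here is a minimal sketch of one possible gating scheme. It assumes both embeddings have already been projected to the same dimension and that a single scalar gate is learned; the function name `gated_fusion` and the parameter `gate_logit` are hypothetical, and the fusion actually used in Top-DTI may differ.

```python
import math

def gated_fusion(tda_vec, llm_vec, gate_logit):
    """Blend two equal-length feature vectors with a sigmoid gate.

    gate_logit stands in for a learnable parameter (hypothetical name);
    during training it would be updated by backpropagation.
    """
    if len(tda_vec) != len(llm_vec):
        raise ValueError("both embeddings must have the same dimension")
    # alpha is the weight on the topological features, (1 - alpha) on the LLM features
    alpha = 1.0 / (1.0 + math.exp(-gate_logit))
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(tda_vec, llm_vec)]

# A gate logit of 0 weights both sources equally.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_logit=0.0)
```

A large positive `gate_logit` pushes the result toward the topological features, and a large negative one toward the LLM features.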

How to Run Top-DTI

Step 1: Generate 2D Representations

Generate two-dimensional representations of drug molecular structures and protein contact maps to capture structural features:
- Drug Images: Generated from SMILES strings using the RDKit library.
- Protein Contact Maps: Created using a transformer-based contact prediction model.

Run Notebook: 01_generate_images.ipynb
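
As an illustration of the drug-image part of this step, the sketch below renders a SMILES string to a 2D image with RDKit. The image size of 224x224 and the helper name `smiles_to_image` are assumptions made for this example, not settings taken from the notebook.

```python
from rdkit import Chem
from rdkit.Chem import Draw

def smiles_to_image(smiles, size=(224, 224)):
    """Render a SMILES string as a PIL image (size is an assumed default)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the SMILES string cannot be parsed
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Draw.MolToImage(mol, size=size)

image = smiles_to_image("CC(=O)OC1=CC=CC=C1C(=O)O")  # aspirin
```

The returned object is a PIL image, so it can be saved with `image.save(...)` or converted to an array for downstream processing.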

Step 2: Extract Topological Features

Extract topological features from drug molecular images and protein contact maps using Persistent Homology.

Run Notebook: 02_topological_features.ipynb
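
For readers new to Persistent Homology, the following sketch computes only the 0-dimensional persistence pairs (connected components) of the sublevel-set filtration of a small grayscale grid, using a union-find structure. It is a teaching example under those assumptions: the notebook may rely on a dedicated TDA library, a different filtration, and higher-dimensional features such as loops, none of which is specified here. The function name `h0_persistence` is hypothetical.

```python
def h0_persistence(image):
    """Return (birth, death) pairs of connected components in the
    sublevel-set filtration of a 2D grid, using 4-connectivity."""
    rows, cols = len(image), len(image[0])
    # Pixels enter the filtration in order of increasing intensity.
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}  # union-find forest over pixels added so far
    birth = {}   # root pixel -> birth value of its component

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # neighbour is outside the grid or not added yet
            a, b = find(p), find(q)
            if a == b:
                continue
            # Elder rule: the component born later dies at this value.
            if birth[a] < birth[b]:
                a, b = b, a
            if birth[a] < value:  # skip zero-persistence pairs
                pairs.append((birth[a], value))
            parent[a] = b
    # The component containing the global minimum never dies.
    pairs.append((order[0][0], float("inf")))
    return pairs

# Four low-intensity corners separated by a high-intensity cross.
grid = [[0, 9, 1],
        [9, 9, 9],
        [2, 9, 3]]
pairs = h0_persistence(grid)
```

Each corner is born as its own component; three of them merge into the oldest one when the intensity-9 pixels are added, and the oldest component persists indefinitely.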

Step 3: Generate Sequence-Based Embeddings

Capture sequence-based features using LLMs:
- ProtT5: For protein sequences.
- MoLFormer: For drug SMILES strings.

Run Notebook: 03_LLM_embeddigns.ipynb
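
One practical detail when embedding proteins with ProtT5 is input formatting: the published ProtT5 usage instructions map the rare amino acids U, Z, O, and B to X and separate residues with spaces before tokenization. The sketch below shows that preprocessing only; whether the notebook applies it in exactly this way is an assumption, and `prepare_prott5_input` is a hypothetical helper name.

```python
import re

def prepare_prott5_input(sequence):
    """Format a protein sequence the way ProtT5 tokenizers expect:
    rare residues U/Z/O/B become X, and residues are space-separated."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.upper())
    return " ".join(cleaned)

prepared = prepare_prott5_input("MKUTB")
```

The resulting string can then be passed to the model's tokenizer.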

Top-DTI Evaluation and Results

The embeddings generated in Step 2 (Topological Features) and Step 3 (Sequence-Based Embeddings) are used to evaluate the performance of Top-DTI on the benchmark datasets described below.

Datasets

The public benchmark datasets are available in the datasets folder for direct access.

The BioSNAP and Human datasets were obtained from the DrugLAMP repository.
The BioSNAP Unseen Drug and BioSNAP Unseen Target datasets were sourced from the ConPLex_dev repository.

Installation Guide

Both environment.yml and requirements.txt are provided in the repository root.

To create and activate the required Conda environment from environment.yml, run:

conda env create -f environment.yml
conda activate top_dti
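
After activating the environment, a quick import check can confirm that key dependencies are available. This is a small sketch: `rdkit` is named above as the library used for drug images, while `torch` is an assumption about what environment.yml installs, so adjust the list to match the file.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# "torch" is an assumed dependency; edit this list to match environment.yml.
missing = missing_packages(["rdkit", "torch"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages are importable.")
```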

Repository: muhammedtalo/Top_DTI
