muhammedtalo/Top_DTI

 
 


Top-DTI: Integrating Topological Deep Learning and Large Language Models for Drug Target Interaction Prediction

[Figure: Top-DTI overview]


Top-DTI

We propose Top-DTI, a framework for predicting drug-target interactions (DTIs) that integrates Topological Data Analysis (TDA) with Large Language Models (LLMs). Top-DTI uses Persistent Homology (PH) to extract topological features from protein contact maps and drug molecular images. In parallel, protein and drug LLMs generate embeddings that capture sequential and contextual information from protein sequences and drug SMILES strings. The TDA and LLM embeddings are combined through a learnable fusion mechanism that dynamically balances the contributions of topological and sequence-based features. The fused representations are then fed into a heterogeneous Graph Neural Network (GNN) that learns relational information from the DTI network. Finally, the embeddings learned by the GNN are used to train a multilayer perceptron (MLP) classifier that predicts DTIs.
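To make the idea of a fusion that "balances the contributions" of two feature sources concrete, here is a minimal sketch of one possible form: a single scalar gate that blends two vectors of equal length. This is an illustration only, not the fusion layer used in Top-DTI. The function name `fuse`, the scalar `gate_logit`, and the assumption that both embeddings were already projected to the same dimension are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(tda_vec, llm_vec, gate_logit):
    """Blend two same-length feature vectors with one scalar gate.

    Hypothetical sketch: in a trained model gate_logit would be a learned
    parameter; here it is passed in by hand.
    """
    if len(tda_vec) != len(llm_vec):
        raise ValueError("both vectors must have the same length")
    alpha = sigmoid(gate_logit)  # weight given to the topological features
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(tda_vec, llm_vec)]


tda = [1.0, 0.0, 2.0]  # toy topological feature vector
llm = [0.0, 4.0, 2.0]  # toy sequence-based embedding
balanced = fuse(tda, llm, 0.0)  # gate of 0.5 weights both sources equally
print(balanced)
```

A large positive `gate_logit` pushes the result toward the topological vector, and a large negative one pushes it toward the sequence-based vector.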

How to Run Top-DTI

  1. Generate 2D Representations
    Generate two-dimensional representations of drug molecular structures and protein contact maps to capture structural features:

    • Drug Images: Generated from SMILES using the RDKit library.
    • Protein Contact Maps: Created using a transformer-based contact prediction model.

    Run Notebook: 01_generate_images.ipynb

  2. Extract Topological Features
    Extract topological features from drug molecular images and protein contact maps using Persistent Homology.

    Run Notebook: 02_topological_features.ipynb

  3. Generate Sequence-Based Embeddings
    Capture sequence-based features from protein sequences and drug SMILES strings using protein and drug LLMs.

    Run Notebook: 03_LLM_embeddigns.ipynb

  4. Top-DTI Evaluation and Results

    The embeddings generated in Step 2 (topological features) and Step 3 (sequence-based embeddings) are used to evaluate the performance of Top-DTI on benchmark datasets.
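For readers new to Persistent Homology, the feature extraction in Step 2 can be pictured with a small pure-Python sketch. It computes only the 0-dimensional persistence pairs (connected components) of the sublevel-set filtration of a tiny grayscale grid, using union-find and 4-connectivity. This is not the code in 02_topological_features.ipynb, which may use a dedicated library, other homology dimensions, and a different filtration. The function name `sublevel_persistence_0d` and the toy image are hypothetical.

```python
import math


def sublevel_persistence_0d(image):
    """Return sorted (birth, death) pairs of connected components.

    Pixels enter in order of increasing intensity. When two components
    meet, the one born later dies (the elder rule).
    """
    rows, cols = len(image), len(image[0])
    parent = {}  # pixel -> parent pixel in the union-find forest
    birth = {}   # root pixel -> intensity at which its component appeared

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pixels = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    pairs = []
    for value, r, c in pixels:
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:  # outside the grid or not yet in the filtration
                continue
            a, b = find(p), find(q)
            if a == b:
                continue
            if birth[a] < birth[b]:
                a, b = b, a  # make a the younger component
            if birth[a] < value:  # skip pairs with zero persistence
                pairs.append((birth[a], value))
            parent[a] = b
    pairs.append((pixels[0][0], math.inf))  # the oldest component never dies
    return sorted(pairs)


# Four local minima in the corners, separated by a ridge of intensity 5.
image = [
    [1, 5, 2],
    [5, 5, 5],
    [3, 5, 4],
]
diagram = sublevel_persistence_0d(image)
print(diagram)
```

Each pair records the intensity at which a component appears and the intensity at which it merges into an older one. Summaries of such pairs are one common way to turn an image into a fixed-length topological feature vector.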

Datasets

The public benchmark datasets are available in the datasets folder for direct access.

  • The BioSNAP and Human datasets were obtained from the DrugLAMP repository.
  • The BioSNAP Unseen Drug and BioSNAP Unseen Target datasets were sourced from the ConPLex_dev repository.
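When working with the Unseen Drug split, a quick sanity check can confirm that no drug in the test set also appears in the training set. The sketch below assumes that "unseen" means exactly this, which is the usual reading of the name but is not stated here. The triple layout `(drug, target, label)` and the identifiers are hypothetical, and the actual files in the datasets folder may be organized differently.

```python
def unseen_drug_overlap(train_pairs, test_pairs):
    """Return the drugs that occur in both splits; empty means a clean split."""
    train_drugs = {drug for drug, _target, _label in train_pairs}
    test_drugs = {drug for drug, _target, _label in test_pairs}
    return train_drugs & test_drugs


# Toy interaction triples with made-up identifiers.
train = [("drug_A", "prot_1", 1), ("drug_B", "prot_2", 0)]
test = [("drug_C", "prot_1", 1), ("drug_D", "prot_3", 0)]

overlap = unseen_drug_overlap(train, test)
print(overlap)
```

The same check applies to the Unseen Target split by comparing the target column instead of the drug column.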

Installation Guide

An environment.yml file and a requirements.txt file are provided in the main repository. To create and activate the required Conda environment from environment.yml, run:

conda env create -f environment.yml
conda activate top_dti
