LIGYSIS-web

LIGYSIS_prot_alph

Details on how to generate these protein alphabet figures here.

This is the repository for our ligand binding site analysis, LIGYSIS, web server. LIGYSIS is a Python Flask Web application.

This is a web resource for the analysis of ligand binding sites defined from biologically relevant protein-ligand interactions deposited on the PDBe. LIGYSIS web allows the user to explore these interactions in a 3DMol.js structure viewer and contextualise the atomic interactions with features such as evolutionary divergence, enrichment in human variation and solvent accessibility.

This is the LIGYSIS WEB logo

Dependencies

Third party dependencies include:

Other dependencies constitute standard Python libraries:

Installation

To install LIGYSIS WEB locally, create the LIGYSIS_WEB environment with the following command:

conda create -n LIGYSIS_WEB python numpy pandas flask

This will install the necessary libraries to locally run the LIGYSIS web app.

Alternatively, one can install the environment from the .yml file:

conda env create -f LIGYSIS_WEB.yml

Run the app

After installation, this is how you can run the LIGYSIS app locally:

# activate LIGYSIS_WEB conda environment
conda activate LIGYSIS_WEB

# run the LIGYSIS app
python app.py

We do not offer a download of the full LIGYSIS dataset, so a local installation of the LIGYSIS web app only makes sense if you have also installed the customised LIGYSIS pipeline and want to explore your own results locally without relying on our public server.

Exploring LIGYSIS results

There are two modes in which LIGYSIS Web can be employed:

  1. Exploring the pre-computed results of the LIGYSIS dataset, comprising 65,000 ligand binding sites across 25,000 different proteins with >100,000 structures deposited on the PDBe. Results can be accessed by UniProt accession or by protein name.

  2. Submitting your own set of structures (.cif, .ent or .pdb formats) for analysis and subsequent results exploration. This can be done through the LIGYSIS or the slivka-bio server. See here for slivka-bio source code.
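Before submitting your own structures, it can help to check that every file uses one of the formats listed above. The following is a minimal sketch using only the Python standard library; the helper name `split_by_support` and the example file names are hypothetical and are not part of LIGYSIS. It only inspects the file extension, not the file contents.

```python
from pathlib import Path

# Extensions named in this README; the helper below is a hypothetical convenience.
SUPPORTED_EXTENSIONS = {".cif", ".ent", ".pdb"}


def split_by_support(paths):
    """Split paths into (supported, unsupported) lists by file extension."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        # Compare case-insensitively so that "model.PDB" is accepted too.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


# Illustrative file names only.
ok, rejected = split_by_support(["1abc.cif", "pdb2xyz.ent", "model.PDB", "notes.txt"])
print("supported:", [p.name for p in ok])
print("unsupported:", [p.name for p in rejected])
```

Compressed files (for example `.cif.gz`) are reported as unsupported by this sketch, since only the final extension is examined.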

This is the LIGYSIS pipeline graphical abstract

Results page

The results page of the application is divided into three panels: Binding Sites Panel (left), Structure Panel (centre) and Binding Site Residues Panel (right). These panels are interactive and connected through hover and click events.

This is the LIGYSIS WEB main results page

General information about the protein and the available structure data can be found at the top of the results page, above the three main results panels. From left to right, we can find the protein's UniProt accession ID, entry, protein names as well as the number of individual protein chains, relevant ligands and defined binding sites for the protein segment of interest.

Binding Sites Panel

This panel consists of an interactive Chart.js graph and a table, which display the binding site features, averaged over the residues forming each site. These features include average Relative Solvent Accessibility (RSA) [1], Normalised Shenkin Divergence Score (DS) [2, 3], Missense Enrichment Score (MES) [4, 5, 6], binding site size, i.e., number of binding site residues (Size), RSA-derived Cluster (Cluster) [7] and RSA-derived Functional Score (FS).
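As an illustration of how site-level features relate to residue-level ones, here is a minimal sketch that averages per-residue values into per-site values and counts the residues (Size). The record layout, the key names and the numbers are hypothetical, and a plain arithmetic mean is assumed; the LIGYSIS pipeline may aggregate some scores differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical residue-level records; keys and values are illustrative only.
residues = [
    {"site": 0, "RSA": 10.0, "DS": 20.0, "MES": 0.5},
    {"site": 0, "RSA": 30.0, "DS": 40.0, "MES": 1.5},
    {"site": 1, "RSA": 60.0, "DS": 80.0, "MES": 1.0},
]


def summarise_sites(rows, features=("RSA", "DS", "MES")):
    """Average each feature over the residues of every site and count them."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["site"]].append(row)

    summary = {}
    for site, members in sorted(grouped.items()):
        # Assumption: a simple arithmetic mean per feature.
        summary[site] = {f: mean(r[f] for r in members) for f in features}
        summary[site]["Size"] = len(members)
    return summary


site_table = summarise_sites(residues)
for site, values in site_table.items():
    print(site, values)
```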

The variable on each axis can be changed to explore the relationships between the different features, and a screenshot of the graph can be saved. Columns of the table can be sorted by value and the data saved to a .csv file. Both the graph and the table react to hover and click events by displaying the corresponding sites on the structure panel. Hovering has a temporary effect, whilst clicking on a data point or row fixes the corresponding binding site on the structure panel until another site is clicked or the same site is clicked again (unclicked). These events are tracked by a highlight on points and rows: yellow when hovered, green when clicked.

Structure Panel

This is the central panel of the LIGYSIS web results page. At the very top, we find the name of the protein of interest and, below it, the structural segment selector, as defined by the PDBe-KB [8], and the structure selector. By default, the ligand superposition view is shown, which has a single protein scaffold and the heteroatoms (non-protein atoms) of all other structures for a given protein segment. A user can click on this drop-up and explore the PISA-defined biological assembly [9] of any of the chains in the superposition. Transformation matrices for the superposition were obtained from the PDBe FTP site.

At the very centre of the panel, we find the 3DMol.js [10, 11] structure viewer with a white protein chain cartoon representation. Buttons are implemented to show/hide surfaces, residue labels, ligand, water molecules and protein-ligand contacts as calculated by pdbe-arpeggio [12]. These contacts can be downloaded as a .csv for the currently visualised assembly (not the Superposition), or for all assemblies at once as a zipped folder of .csv files. Apart from showing the residue labels upon clicking on a site, or displaying protein-ligand contacts, atom labels are shown upon hovering (one must hover on a residue for at least 1 second, otherwise, labels would be shown for every single atom whilst the user navigates through the structure).
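Once the contacts for all assemblies have been downloaded as a zipped folder, the .csv files can be read without unpacking the archive. The sketch below uses only the Python standard library. The member name `assembly_1.csv` and the column names `residue`, `ligand` and `contact` are made up for the demonstration and do not describe the actual layout of the LIGYSIS export.

```python
import csv
import io
import zipfile


def read_contacts_zip(zip_source):
    """Return {member name: list of row dicts} for every .csv in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a contacts table
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables


# Build a tiny in-memory archive so the example is self-contained.
# The file name and columns are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("assembly_1.csv", "residue,ligand,contact\nHIS57,ABC,hbond\n")
buffer.seek(0)

contacts = read_contacts_zip(buffer)
for name, rows in contacts.items():
    print(name, len(rows), "rows")
```

In practice, `zip_source` would be the path of the downloaded archive instead of an in-memory buffer.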

The Superposition with coloured ligands, as well as individual assemblies with their protein-ligand interactions can be downloaded for external visualisation. Currently, ChimeraX (.cxc) [13] and PyMol (.pml) [14] are supported. Additionally, a screenshot of the current view can also be saved to .png format.

Superposition viewers support

Binding Residues Panel

This panel is very similar in structure to the Binding Sites Panel. It is also formed by an interactive chart and table. However, each data point and row now corresponds to an individual amino acid residue forming a binding site; averaging these residue values gives the data displayed on the Binding Sites Panel. The data for this panel includes UniProt Residue Number (UPResNum), the multiple sequence alignment (MSA) column to which this residue aligns (MSACol), residue Divergence Score (DS), Missense Enrichment Score (MES), its associated p-value (p), the amino acid one-letter code (AA), Relative Solvent Accessibility (RSA) and Secondary Structure (SS) element as calculated by DSSP [15]. This panel is linked to the structure viewer through hover, but not click events.
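To give an intuition for the residue Divergence Score, the sketch below computes a Shenkin-style score for individual MSA columns. It assumes the commonly quoted form of the Shenkin score, 6 × 2^S, where S is the Shannon entropy of the column in bits, and a min-max scaling to 0-100 across columns as the normalisation. Both the exact normalisation and the handling of gaps are assumptions here; see references [2, 3] for the definitions actually used. The example columns are invented.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def shenkin_score(column):
    """Shenkin-style divergence, assumed here to be 6 * 2**S."""
    return 6 * 2 ** shannon_entropy(column)


def normalise(scores):
    """Min-max scale scores to 0-100 (assumed normalisation); zeros if all equal."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [0.0 for _ in scores]
    return [100 * (s - lowest) / (highest - lowest) for s in scores]


# Invented columns: fully conserved, two residue types, four residue types.
# Gap characters are not treated specially in this sketch.
columns = ["AAAA", "AAGG", "ACDE"]
raw_scores = [shenkin_score(c) for c in columns]
for column, raw, scaled in zip(columns, raw_scores, normalise(raw_scores)):
    print(column, round(raw, 2), round(scaled, 2))
```

Under these assumptions, a fully conserved column receives the lowest raw score and the most variable column in the alignment is scaled to 100.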

Citation

If you use the LIGYSIS web application, please cite:

Utgés JS, MacGowan SA, Ives CM, Barton GJ. Classification of likely functional class for ligand binding sites identified from fragment screening. Commun Biol. 2024 Mar 13;7(1):320. doi: 10.1038/s42003-024-05970-8. PMID: 38480979; PMCID: PMC10937669.

Utgés JS, Barton GJ. Comparative evaluation of methods for the prediction of protein-ligand binding sites. J Cheminform. 2024 Nov 11;16(1):126. doi: 10.1186/s13321-024-00923-z. PMID: 39529176; PMCID: PMC11552181.

Utgés JS, MacGowan SA, Barton GJ. LIGYSIS-web: a resource for the analysis of protein-ligand binding sites. 2025. Work in progress....

References

  1. Tien MZ, Meyer AG, Sydykova DK, Spielman SJ, Wilke CO. Maximum allowed solvent accessibilites of residues in proteins. PLoS One. 2013 Nov 21;8(11):e80635. doi: 10.1371/journal.pone.0080635. PMID: 24278298; PMCID: PMC3836772.

  2. Shenkin PS, Erman B, Mastrandrea LD. Information-theoretical entropy as a measure of sequence variability. Proteins. 1991;11(4):297-313. doi: 10.1002/prot.340110408. PMID: 1758884.

  3. Utgés JS, Tsenkov MI, Dietrich NJM, MacGowan SA, Barton GJ. Ankyrin repeats in context with human population variation. PLoS Comput Biol. 2021 Aug 24;17(8):e1009335. doi: 10.1371/journal.pcbi.1009335. PMID: 34428215; PMCID: PMC8415598.

  4. MacGowan SA, Madeira F, Britto-Borges T, Schmittner MS, Cole C, Barton GJ. Human missense variation is constrained by domain structure and highlights functional and pathogenic residues. bioRxiv. 2017;127050. doi: 10.1101/127050.

  5. MacGowan SA, Madeira F, Britto-Borges T, Warowny M, Drozdetskiy A, Procter JB, Barton GJ. The Dundee Resource for Sequence Analysis and Structure Prediction. Protein Sci. 2020 Jan;29(1):277-297. doi: 10.1002/pro.3783. Epub 2019 Nov 28. PMID: 31710725; PMCID: PMC6933851.

  6. MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol. 2024 Apr 11;7(1):447. doi: 10.1038/s42003-024-06117-5. PMID: 38605212; PMCID: PMC11009406.

  7. Utgés JS, MacGowan SA, Ives CM, Barton GJ. Classification of likely functional class for ligand binding sites identified from fragment screening. Commun Biol. 2024 Mar 13;7(1):320. doi: 10.1038/s42003-024-05970-8. PMID: 38480979; PMCID: PMC10937669.

  8. Ellaway JIJ, Anyango S, Nair S, Zaki HA, Nadzirin N, Powell HR, Gutmanas A, Varadi M, Velankar S. Identifying protein conformational states in the Protein Data Bank: Toward unlocking the potential of integrative dynamics studies. Struct Dyn. 2024 May 17;11(3):034701. doi: 10.1063/4.0000251. PMID: 38774441; PMCID: PMC11106648.

  9. Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007 Sep 21;372(3):774-97. doi: 10.1016/j.jmb.2007.05.022. Epub 2007 May 13. PMID: 17681537.

  10. Rego N, Koes D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics. 2015 Apr 15;31(8):1322-4. doi: 10.1093/bioinformatics/btu829. Epub 2014 Dec 12. PMID: 25505090; PMCID: PMC4393526.

  11. Seshadri K, Liu P, Koes DR. The 3Dmol.js learning environment: a classroom response system for 3D chemical structures. J Chem Educ. 2020 Oct 13;97(10):3872-3876. doi: 10.1021/acs.jchemed.0c00579. Epub 2020 Aug 25. PMID: 36035779; PMCID: PMC9416521.

  12. Jubb HC, Higueruelo AP, Ochoa-Montaño B, Pitt WR, Ascher DB, Blundell TL. Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. J Mol Biol. 2017 Feb 3;429(3):365-371. doi: 10.1016/j.jmb.2016.12.004. Epub 2016 Dec 10. PMID: 27964945; PMCID: PMC5282402.

  13. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, Ferrin TE. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Sci. 2021 Jan;30(1):70-82. doi: 10.1002/pro.3943. Epub 2020 Oct 22. PMID: 32881101; PMCID: PMC7737788.

  14. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC.

  15. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983 Dec;22(12):2577-637. doi: 10.1002/bip.360221211. PMID: 6667333.