Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
v2.4.3 - bug fixes, unit tests, docs
1 parent 8e9c0dc, commit 9ad2ca6
Showing 10 changed files with 65 additions and 67 deletions.
@@ -18,14 +18,19 @@
<!-- [![DOI](https://zenodo.org/badge/344290370.svg)](https://zenodo.org/badge/latestdoi/344290370) -->
<!-- [![Documentation Status](https://readthedocs.org/projects/ansicolortags/badge/?version=latest)](http://ansicolortags.readthedocs.io/?badge=latest) -->

`pySAR` is a Python library for analysing Sequence Activity Relationships (SARs)/Sequence Function Relationships (SFRs) of protein sequences.

* The published research article is available [here][article].
* A quick Colab notebook demo of `pySAR` is available [here][demo].
* A **Medium** article that dives deeper into SARs and the `pySAR` software itself is available [here][medium].

Table of Contents
=================
* [Introduction](#Introduction)
* [Requirements](#requirements)
* [Installation](#installation)
* [Usage](#usage)
* [Directories](#directories)
* [Tests](#tests)
* [Issues](#Issues)
* [Contact](#contact)
* [License](#license)

@@ -34,7 +39,7 @@ Table of Contents

Research Article
================
The research article that accompanied this software is titled: "Machine Learning Based Predictive Model for the Analysis of Sequence Activity Relationships Using Protein Spectra and Protein Descriptors" and was published in the Journal of Biomedical Informatics and is available [here][article] [[1]](#references). There is also a quick <b>Colab notebook demo</b> of `pySAR` available [here][demo].
The research article that accompanied this software is titled: "Machine Learning Based Predictive Model for the Analysis of Sequence Activity Relationships Using Protein Spectra and Protein Descriptors" and was published in the Journal of Biomedical Informatics and is available [here][article] [[1]](#references).

How to cite
===========

@@ -46,10 +51,12 @@ Introduction

After finding the optimal technique and feature set at which to numerically encode your dataset of sequences, `pySAR` can then be used to build a predictive regression ML model with the training data being that of the encoded protein sequences, and training labels being the in vitro experimentally pre-calculated activity values for each protein sequence. This model maps a set of protein sequences to the sought-after activity value, being able to accurately predict the activity/fitness value of new unseen sequences. The use-case for the software is within the field of Protein Engineering, Directed Evolution and or Drug Discovery, where a user has a set of in vitro experimentally determined activity/fitness values for a library of mutant protein sequences and wants to computationally predict the sought activity value for a selection of mutated unseen sequences, in the aim of finding the best sequence that minimises/maximises their activity value. <br>

In the published [research][article], the sought activity/fitness characterisitc is the thermostability of proteins from a recombination library designed from parental cytochrome P450's. This thermostability is measured using the T50 metric (temperature at which 50% of a protein is irreversibly denatured after 10 mins of incubation, ranging from 39.2 to 64.4 degrees C), which we want to maximise [[1]](#references).
In the published [research][article], the sought activity/fitness characteristic is the thermostability of proteins from a recombination library designed from parental cytochrome P450's. This thermostability is measured using the T50 metric (temperature at which 50% of a protein is irreversibly denatured after 10 mins of incubation, ranging from 39.2 to 64.4 degrees C), which we want to maximise [[1]](#references).

Two additional <strong>custom-built</strong> softwares were created alongside `pySAR` - [`aaindex`][aaindex] and [`protpy`][protpy]. The `aaindex` software package is used for parsing the amino acid index which is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids [[2]](#references). `protpy` is used for calculating a series of protein physiochemical, biochemical and structural protein descriptors. Both of these software packages are integrated into `pySAR` but can also be used individually for their respective purposes.

**A quick Colab notebook demo of `pySAR` is available [here][demo]. There is also a Medium article that dives deeper into SARs and the `pySAR` software itself, available [here][medium].**

Requirements
============
* [Python][python] >= 3.8

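The Introduction paragraph in the hunk above describes the core workflow: numerically encode each protein sequence, then fit a regression model that maps the encodings to the experimentally measured activity values and use it to score unseen mutants. The sketch below illustrates only that idea with NumPy; it is not pySAR's API, and `encode_sequences`, `fit_least_squares` and `predict` are hypothetical names. Assumptions: all sequences are aligned to the same length, each residue is encoded with one amino acid index (the Kyte-Doolittle hydropathy scale stands in for an AAindex record), and ordinary least squares stands in for whichever regression model is actually chosen. pySAR's protein spectra and descriptor features are not reproduced.

```python
import numpy as np

# Kyte-Doolittle hydropathy values, used here only as a stand-in for one AAindex record.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequences(sequences, index=HYDROPATHY):
    """Map each residue to its index value; returns an (n_sequences, length) array.

    Unknown residue letters raise KeyError; unequal lengths raise ValueError.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return np.array([[index[aa] for aa in seq] for seq in sequences], dtype=float)


def _with_intercept(X):
    # Append a column of ones so the model can learn a constant offset
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_least_squares(X, y):
    """Ordinary least squares: returns one weight per position, then the intercept."""
    coef, *_ = np.linalg.lstsq(_with_intercept(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    """Predicted activity for each encoded sequence."""
    return _with_intercept(X) @ coef
```

With an aligned training library `X = encode_sequences(train_seqs)` and its measured activities `y`, `predict(fit_least_squares(X, y), encode_sequences(new_seqs))` scores unseen mutants; a position-wise linear model is chosen here only because it keeps the example short and exactly checkable.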
@@ -521,19 +528,6 @@ Issues
======
Any issues, errors or bugs can be raised via the [Issues](https://github.com/amckenna41/pySAR/issues) tab in the repository.

Tests
=====
To run all tests, from the main `pySAR` repo folder run:
```
python3 -m unittest discover tests
```

To run tests for specific module, from the main `pySAR` repo folder run:
```
python -m unittest tests.MODULE_NAME -v
-v: verbose output flag
```

Contact
=======
If you have any questions or comments, please contact [email protected] or raise an issue on the [Issues][Issues] tab. <br><br>

@@ -579,4 +573,5 @@ DOI: 10.1021/acs.jcim.0c00073 <br><br>
[demo]: https://colab.research.google.com/drive/1hxtnf8i4q13fB1_2TpJFimS5qfZi9RAo?usp=sharing
[Issues]: https://github.com/amckenna41/pySAR/issues
[license]: https://github.com/amckenna41/pySAR/blob/master/LICENSE
[config]: https://github.com/amckenna41/pySAR/blob/master/CONFIG.md
[config]: https://github.com/amckenna41/pySAR/blob/master/CONFIG.md
[medium]: https://ajmckenna69.medium.com/pysar-a3de9f71733f
@@ -1,6 +1,6 @@
""" pySAR software metadata. """ | ||
__name__ = 'pySAR' | ||
__version__ = "2.4.2" | ||
__version__ = "2.4.3" | ||
__description__ = 'A Python package used to analysis Sequence Activity Relationships (SARs) of protein sequences and their mutants using Machine Learning.' | ||
__author__ = 'AJ McKenna: https://github.com/amckenna41' | ||
__authorEmail__ = '[email protected]' | ||
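This hunk bumps `__version__` in the metadata file from 2.4.2 to 2.4.3, a patch-level increment. Packaging scripts often read such a value from the file's source text instead of importing the package; whether pySAR's own setup does this is not shown in the diff. A minimal sketch with hypothetical helpers `read_version` and `is_patch_bump`, assuming plain `MAJOR.MINOR.PATCH` version strings with no pre-release or build suffix:

```python
import re

# Matches an assignment such as: __version__ = "2.4.3"  (single or double quotes)
VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version(source):
    """Return the __version__ string assigned in a metadata file's source text."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def is_patch_bump(old, new):
    """True if new equals old with only the PATCH number incremented by one.

    Assumes plain MAJOR.MINOR.PATCH strings; suffixes such as "rc1" would make int() fail.
    """
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    return old_parts[:2] == new_parts[:2] and new_parts[2] == old_parts[2] + 1
```

Applied to the metadata file's source after this commit, `read_version` returns `"2.4.3"`, and `is_patch_bump("2.4.2", "2.4.3")` is `True`.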