Commit

v2.4.2 - bug fixes, feature updates, unit tests, ci/cd updates
amckenna41 committed Nov 16, 2023
1 parent 2606428 commit 8e9c0dc
Showing 18 changed files with 46 additions and 41 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/deploy_pypi.yml
@@ -17,7 +17,7 @@ jobs:
runs-on: ubuntu-latest #platform: [ubuntu-latest, macos-latest, windows-latest]
strategy:
matrix:
python-version: [3.8] #deploying using one Python version on 1 runner
python-version: [3.9] #deploying using one Python version on 1 runner
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
@@ -38,7 +38,7 @@ jobs:
run: |
python3 setup.py sdist bdist_wheel
twine check dist/*
twine upload dist/*
twine upload dist/* --verbose
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
4 changes: 2 additions & 2 deletions .github/workflows/deploy_test_pypi.yml
@@ -18,7 +18,7 @@ jobs:
runs-on: ubuntu-latest #platform: [ubuntu-latest, macos-latest, windows-latest]
strategy:
matrix:
python-version: [3.8] #deploying using one Python version on 1 runner
python-version: [3.9] #deploying using one Python version on 1 runner
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
@@ -39,7 +39,7 @@ jobs:
run: |
python3 setup.py sdist bdist_wheel
twine check dist/*
twine upload --repository testpypi dist/*
twine upload --repository testpypi dist/* --verbose
env:
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.TEST_PYPI_TOKEN }}
10 changes: 5 additions & 5 deletions CONFIG.md
@@ -52,13 +52,13 @@ These config files offer a more straightforward way of making any changes to the
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede.json"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede.json"
},
"pseudo_amino_acid_composition":
{
@@ -101,7 +101,7 @@ These config files offer a more straightforward way of making any changes to the
## Below is an explanation of each of the parameters within the JSON config files:

**Dataset Parameters:**
* `dataset[dataset]` - name of dataset.
* `dataset[dataset]` - path of dataset.
* `dataset[sequence_col]` - name of sequence column in dataset holding protein sequences, if left blank 'sequence' will be used by default.
* `dataset[activity]` - name of protein activity column in dataset being studied.

@@ -121,11 +121,11 @@ These config files offer a more straightforward way of making any changes to the
* `descriptors[ctd][all]` - if True then all 7 of the available physiochemical descriptors will be used when calculating the CTD descriptors. Each property generates 21 features so using all properties will output 147 features. Only 1 property used by default.

* `descriptors[sequence_order_coupling_number][maxlag]` - maximum lag; length of the protein must be not less than maxlag.
* `descriptors[sequence_order_coupling_number][distance_matrix]` - physiochemical distance matrix for calculating sequence order coupling number.
* `descriptors[sequence_order_coupling_number][distance_matrix]` - physiochemical distance matrix name for calculating sequence order coupling number.

* `descriptors[quasi_sequence_order][maxlag]` - maximum lag; length of the protein must be not less than maxlag.
* `descriptors[quasi_sequence_order][weight]` - weighting factor to use when calculating descriptor.
* `descriptors[quasi_sequence_order][distance_matrix]` - path to physiochemical distance matrix for calculating quasi sequence order.
* `descriptors[quasi_sequence_order][distance_matrix]` - physiochemical distance matrix name for calculating quasi sequence order.

* `descriptors[pseudo_amino_acid_composition][lambda]` - lambda parameter that reflects the rank correlation and should be a non-negative integer and not larger than the length of the protein sequence.
* `descriptors[pseudo_amino_acid_composition][weight]` - weighting factor to use when calculating descriptor.
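
The `lag` constraint and the distance matrix name described above can be checked before any descriptor is calculated. The following is a minimal sketch, not pySAR's own loading code: `load_descriptor_params` and `check_lag` are hypothetical helpers, the top-level `descriptors` key is assumed from the parameter names in this file, and the sequence is made up. Note that the JSON key shown in the config files is `lag`, which is what the sketch reads.

```python
import json

# Fragment mirroring the config values shown in this diff (illustrative only).
CONFIG_TEXT = """
{
  "descriptors": {
    "sequence_order_coupling_number": {
      "lag": 30,
      "distance_matrix": "schneider-wrede"
    },
    "quasi_sequence_order": {
      "lag": 30,
      "weight": 0.1,
      "distance_matrix": "schneider-wrede"
    }
  }
}
"""


def load_descriptor_params(config_text, descriptor):
    """Return the parameter dict for one descriptor (hypothetical helper)."""
    config = json.loads(config_text)
    return config["descriptors"][descriptor]


def check_lag(sequence, lag):
    """Return True if the sequence length is not less than the lag."""
    return len(sequence) >= lag


params = load_descriptor_params(CONFIG_TEXT, "quasi_sequence_order")
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up protein sequence
print(params["distance_matrix"], check_lag(sequence, params["lag"]))
```

Running a check like this up front gives a clear error path for short sequences instead of a failure deep inside the descriptor calculation.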
5 changes: 3 additions & 2 deletions README.md
@@ -53,8 +53,8 @@ Two additional <strong>custom-built</strong> softwares were created alongside `p
Requirements
============
* [Python][python] >= 3.8
* [aaindex][aaindex] >= 1.1.1
* [protpy][protpy] >= 1.1.10
* [aaindex][aaindex] >= 1.1.2
* [protpy][protpy] >= 1.2.0
* [numpy][numpy] >= 1.24.2
* [pandas][pandas] >= 1.5.3
* [scikit-learn][sklearn] >= 1.2.1
@@ -531,6 +531,7 @@ python3 -m unittest discover tests
To run tests for specific module, from the main `pySAR` repo folder run:
```
python -m unittest tests.MODULE_NAME -v
-v: verbose output flag
```
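
The verbose run can also be driven from Python, where `verbosity=2` on the runner corresponds to the `-v` flag. This is a minimal sketch under assumptions: `ExampleTests` is a stand-in test case invented for illustration, and a real run would load a module from `tests` instead.

```python
import io
import unittest


class ExampleTests(unittest.TestCase):
    """Stand-in test case; a real run would load tests.MODULE_NAME instead."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


# Build a suite from the stand-in class and run it with verbose output.
suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
print(stream.getvalue())
```

With verbose output each test method is listed by name together with its outcome, which makes it easier to see which test in a module failed.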

Contact
7 changes: 6 additions & 1 deletion TODO.md
@@ -268,4 +268,9 @@ To Do List:
- [X] Recalculate and reupload descriptors_thermostability.csv.
- [X] Add info about the columns and dimensions of each descriptors in pre-calculated csv file - fix Issue.
- [X] When calculating all descriptors (get_all_descriptors(export=True)), add some sort of print/tracking functionality.
- [X] Double check all links in readme.
- [X] Double check all links in readme.
- [ ] Add dimensions of each dataset to https://github.com/amckenna41/pySAR/tree/master/example_datasets.
- [ ] Go over references in descriptors module - refer to protpy.
- [X] Update distance matrices in configs - test once protpy published.
- [ ] Add link to medium article.
- [X] Update aaindex version on readme.
4 changes: 2 additions & 2 deletions config/absorption.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
4 changes: 2 additions & 2 deletions config/enantioselectivity.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
4 changes: 2 additions & 2 deletions config/localization.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
4 changes: 2 additions & 2 deletions config/thermostability.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
6 changes: 3 additions & 3 deletions pySAR/__init__.py
@@ -1,14 +1,14 @@
""" pySAR software metadata. """
__name__ = 'pySAR'
__version__ = "2.4.1"
__version__ = "2.4.2"
__description__ = 'A Python package used to analysis Sequence Activity Relationships (SARs) of protein sequences and their mutants using Machine Learning.'
__author__ = 'AJ McKenna, https://github.com/amckenna41'
__author__ = 'AJ McKenna: https://github.com/amckenna41'
__authorEmail__ = '[email protected]'
__maintainer__ = "AJ McKenna"
__license__ = 'MIT'
__url__ = 'https://github.com/amckenna41/pySAR'
__download_url__ = "https://github.com/amckenna41/pySAR/archive/refs/heads/main.zip"
__status__ = "Production"
__keywords__ = ["bioinformatics", "protein engineering", "python", "pypi", "machine learning", \
"directed evolution", "drug discovery", "sequence activity relationships", "SAR", "aaindex", "protein descriptors"]
"directed evolution", "drug discovery", "sequence activity relationships", "SAR", "aaindex", "protpy", "protein descriptors"]
__test_suite__ = "tests"
6 changes: 4 additions & 2 deletions pySAR/descriptors.py
@@ -88,15 +88,17 @@ class Descriptors():
Pseudo-Amino Acid Composition. PROTEINS: Structure, Function, and
Genetics, 2001, 43: 246-255.
[10] Kuo-Chen Chou. Using amphiphilic pseudo amino acid composition to predict enzyme
subfamily classes. Bioinformatics, 2005,21,10-19.
subfamily classes. Bioinformatics, 2005,21,10-19.
[11] J. Shen et al., “Predicting protein-protein interactions based only on sequences
information,” Proc. Natl. Acad. Sci. U. S. A., vol. 104, no. 11, pp. 4337–4341, 2007.
[12] Gisbert Schneider and Paul Wrede. The Rational Design of Amino Acid Sequences
by Artificial Neural Networks and Simulated Molecular Evolution: De Novo Design
of an Idealized Leader Cleavage Site. Biophys Journal, 1994, 66, 335-344.
[13] Grantham, R. (1974-09-06). "Amino acid difference formula to help explain protein
[13] Grantham, R. (1974-09-06). "Amino acid difference formula to help explain protein
evolution". Science. 185 (4154): 862–864. Bibcode:1974Sci...185..862G.
doi:10.1126/science.185.4154.862. ISSN 0036-8075. PMID 4843792. S2CID 35388307.
[14] B. Hollas, “An analysis of the autocorrelation descriptor for molecules,” J. Math. Chem.,
vol. 33, no. 2, pp. 91–101, 2003.
"""
def __init__(self, config_file="", protein_seqs=None, **kwargs):

3 changes: 0 additions & 3 deletions pySAR/pyDSP.py
@@ -451,9 +451,6 @@ def consensus_freq(self, freqs):
if (freqs.ndim == 2 and freqs.shape[1] != 2):
raise ValueError("Only one protein sequence should be passed into the function: {}.".format(freqs))

print(self.max_freq(freqs)[0])
print(self.num_seqs)
print((self.max_freq(freqs)[0])/self.num_seqs)
# CF = PP/N ( peak position/length of largest protein in dataset)
CF = (self.max_freq(freqs)[0])/self.num_seqs
return CF
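
The `CF = PP/N` relation in the comment above can be illustrated on its own. This is a minimal sketch, not the `pyDSP` implementation: `consensus_frequency` is a hypothetical helper, it assumes the peak position is the index of the largest value in a 1-D spectrum, and it takes the divisor `n` as a plain argument because the diff does not show what `max_freq` returns or how `num_seqs` is set.

```python
import numpy as np


def consensus_frequency(spectrum, n):
    """Peak position of a 1-D spectrum divided by n (hypothetical helper)."""
    spectrum = np.asarray(spectrum, dtype=float)
    if spectrum.ndim != 1:
        raise ValueError("Expected a 1-D spectrum for a single sequence.")
    peak_position = int(np.argmax(spectrum))  # index of the largest value
    return peak_position / n


# The largest value sits at index 2, so CF = 2 / 8.
cf = consensus_frequency([0.1, 0.5, 2.0, 0.3], n=8)
print(cf)
```

Returning the value without printing intermediate results matches the intent of this hunk, which removes the leftover debugging `print` calls.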
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = PySAR
version = 2.4.1
version = 2.4.2
description = Analysing Sequence Activity Relationships (SARs) of protein sequences and their mutants using Machine Learning.
author = AJ McKenna
author_email = [email protected]
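
Because the version string is bumped in `setup.cfg`, `pySAR/__init__.py` and the metadata test in the same commit, a small consistency check can catch a missed file. The sketch below is an assumption-laden illustration: `SETUP_CFG` is an inline copy of the metadata shown here, and `PACKAGE_VERSION` stands in for `pySAR.__version__` rather than importing the package.

```python
import configparser

# Inline copy of the relevant setup.cfg metadata (illustrative only).
SETUP_CFG = """
[metadata]
name = PySAR
version = 2.4.2
"""

PACKAGE_VERSION = "2.4.2"  # stands in for pySAR.__version__

# Parse the INI-style text and compare the two version strings.
parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)
cfg_version = parser["metadata"]["version"]
versions_match = cfg_version == PACKAGE_VERSION
print(cfg_version, versions_match)
```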
4 changes: 2 additions & 2 deletions tests/test_config/test_absorption.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
4 changes: 2 additions & 2 deletions tests/test_config/test_enantioselectivity.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
4 changes: 2 additions & 2 deletions tests/test_config/test_localization.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
4 changes: 2 additions & 2 deletions tests/test_config/test_thermostability.json
@@ -45,13 +45,13 @@
"sequence_order_coupling_number":
{
"lag": 30,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"quasi_sequence_order":
{
"lag": 30,
"weight": 0.1,
"distance_matrix": "schneider-wrede-physiochemical-distance-matrix.json"
"distance_matrix": "schneider-wrede"
},
"pseudo_amino_acid_composition":
{
8 changes: 4 additions & 4 deletions tests/test_pySAR.py
@@ -60,14 +60,14 @@ def setUp(self):
if not (os.path.isdir(self.test_output_folder)):
os.makedirs(self.test_output_folder)

@unittest.skip("Skipping metadata tests.")
# @unittest.skip("Skipping metadata tests.")
def test_pySAR_metadata(self):
""" Testing correct pySAR version and metadata. """
self.assertEqual(pysar_.__version__, "2.4.1",
self.assertEqual(pysar_.__version__, "2.4.2",
"pySAR version is not correct, got: {}.".format(pysar_.__version__))
self.assertEqual(pysar_.__name__, "pySAR",
"pySAR software name is not correct, got: {}.".format(pysar_.__name__))
self.assertEqual(pysar_.__author__, "AJ McKenna, https://github.com/amckenna41",
self.assertEqual(pysar_.__author__, "AJ McKenna: https://github.com/amckenna41",
"pySAR author is not correct, got: {}.".format(pysar_.__author__))
self.assertEqual(pysar_.__authorEmail__, "[email protected]",
"pySAR author email is not correct, got: {}.".format(pysar_.__authorEmail__))
@@ -83,7 +83,7 @@ def test_pySAR_metadata(self):
"pySAR maintainer is not correct, got: {}.".format(pysar_.__license__))
self.assertEqual(pysar_.__keywords__, ["bioinformatics", "protein engineering", "python", \
"pypi", "machine learning", "directed evolution", "drug discovery", "sequence activity relationships", \
"SAR", "aaindex", "protein descriptors"], "pySAR keywords is not correct, got: {}.".format(pysar_.__keywords__))
"SAR", "aaindex", "protpy", "protein descriptors"], "pySAR keywords is not correct, got: {}.".format(pysar_.__keywords__))

def test_pySAR(self):
""" Testing pySAR intialisation process and associated methods & attributes. """
