v2.4.1 - bug fixes, config updates, unit tests, docs
amckenna41 committed Nov 8, 2023
1 parent 127f365 commit 2606428
Showing 36 changed files with 2,449 additions and 2,423 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_test.yml
@@ -24,7 +24,7 @@ jobs:
runs-on: ubuntu-latest #platform: [ubuntu-latest, macos-latest, windows-latest]
strategy:
matrix:
-python-version: ["3.7", "3.8", "3.9", "3.10"] #testing on multiple python versions
+python-version: ["3.8", "3.9", "3.10"] #testing on multiple python versions
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
18 changes: 9 additions & 9 deletions CONFIG.md
@@ -1,8 +1,8 @@
# Config file parameters <a name="TOP"></a>

-pySAR works via configuration files that contain the plethora of parameters and variables available for the full pySAR pipeline. The config files are in JSON format and broken into 4 different subsections: "dataset", "model", "descriptors", and "pyDSP". "dataset" outlines parameters to do with the dataset, "model" consists of all ML model related parameters, "descriptors" specifies what protein physiochemical/structural descriptors to use and the metaparameters for some protein descriptors and "pyDSP" is all parameters related to any of the DSP functionalities in pySAR. <br>
+`pySAR` works mainly via JSON configuration files. There are many different customisable parameters for the functionalities in `pySAR` including the metaparameters of some of the available protein descriptors, all Digital Signal Processing (DSP) parameters in the `pyDSP` module, the type of regression model to use and parameters specific to the dataset - a description of each parameter is available in the example below.

-Example configuration file for thermostability.json used in research:
+These config files offer a more straightforward way of making any changes to the `pySAR` pipeline. The names of **All** the parameters as listed in the example config files must remain unchanged, only the value of each parameter should be changed, any parameters not being used can be set to <em>null</em>. Additionally, you can pass in the individual parameter names and values to the `pySAR` and `Encoding` classes when numerically encoding the protein sequences via **kwargs**. An example of the config file used in my research project ([thermostability.json](https://github.com/amckenna41/pySAR/blob/master/config/thermostability.json)), with all of the available parameters, can be seen below.

```json
{
@@ -114,10 +114,10 @@ Example configuration file for thermostability.json used in research:
* `descriptors[descriptors_csv]` - path to csv file of pre-calculated descriptor values of a dataset, saves time having to recalculate the features each time.

* `descriptors[moreaubroto_autocorrelation][lag] / descriptors[moran_autocorrelation][lag] / descriptors[geary_autocorrelation][lag]` - The maximum lag value for each of the autocorrelation descriptors. If invalid value input then a default of 30 is used.
-* `descriptors[moreaubroto_autocorrelation][properties] / descriptors[moran_autocorrelation][properties] / descriptors[geary_autocorrelation][properties]` - List of protein physiochemical and structural descriptors used in the calculation of each of the autocorrelation descriptors, properties must be a lit of their AAIndex number/accession number. There must be a least 1 property value input.
+* `descriptors[moreaubroto_autocorrelation][properties] / descriptors[moran_autocorrelation][properties] / descriptors[geary_autocorrelation][properties]` - List of protein physiochemical and structural descriptors used in the calculation of each of the autocorrelation descriptors, properties must be a list of their AAIndex number/accession numbers. There must be a least 1 property value input.
* `descriptors[moreaubroto_autocorrelation][normalize] / descriptors[moran_autocorrelation][normalize] / descriptors[geary_autocorrelation][normalize]` - rescale/normalize Autocorrelation values into range of 0-1.

-* `descriptors[ctd][property]` - list of 1 or more physiochemical properties to use when calculating CTD descriptors. List of available input properties: If no properties input then hydrophobicity used by default.
+* `descriptors[ctd][property]` - list of 1 or more physiochemical properties to use when calculating CTD descriptors. List of available input properties: hydrophobicity, normalized_vdwv, polarity, charge, secondary_struct, solvent_accessibility, polarizability. If no properties input then hydrophobicity used by default.
* `descriptors[ctd][all]` - if True then all 7 of the available physiochemical descriptors will be used when calculating the CTD descriptors. Each proeprty generates 21 features so using all properties will output 147 features. Only 1 property used by default.

* `descriptors[sequence_order_coupling_number][maxlag]` - maximum lag; length of the protein must be not less than maxlag.
@@ -127,17 +127,17 @@ Example configuration file for thermostability.json used in research:
* `descriptors[quasi_sequence_order][weight]` - weighting factor to use when calculating descriptor.
* `descriptors[quasi_sequence_order][distance_matrix]` - path to physiochemical distance matrix for calculating quasi sequence order.

-* `descriptors[pseudo_amino_acid_composition][lambda]` - lamda parameter that reflects the rank correlation and should be a non-negative integer and not larger than the length of the protein sequence.
+* `descriptors[pseudo_amino_acid_composition][lambda]` - lambda parameter that reflects the rank correlation and should be a non-negative integer and not larger than the length of the protein sequence.
* `descriptors[pseudo_amino_acid_composition][weight]` - weighting factor to use when calculating descriptor.
* `descriptors[pseudo_amino_acid_composition][properties]` - 1 or more amino acid index properties from the AAI database used for calculating the sequence-order.

-* `descriptors[amphiphilic_pseudo_amino_acid_composition][lambda]` - lamda parameter that reflects the rank correlation and should be a non-negative integer and not larger than the length of the protein sequence.
+* `descriptors[amphiphilic_pseudo_amino_acid_composition][lambda]` - lambda parameter that reflects the rank correlation and should be a non-negative integer and not larger than the length of the protein sequence.
* `descriptors[amphiphilic_pseudo_amino_acid_composition][weight]` - weighting factor to use when calculating descriptor.
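
The fallback rules described for the descriptor parameters above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not pySAR's own code: the helper names `resolve_lag`, `resolve_ctd_properties` and `ctd_feature_count` are hypothetical, and treating anything other than a positive integer as an "invalid" lag is an assumption. The default lag of 30, the seven CTD property names, the hydrophobicity default and the 21 features per property are taken from the parameter descriptions.

```python
# Hypothetical helpers illustrating the documented defaults; not part of pySAR.
CTD_PROPERTIES = [
    "hydrophobicity", "normalized_vdwv", "polarity", "charge",
    "secondary_struct", "solvent_accessibility", "polarizability",
]
FEATURES_PER_CTD_PROPERTY = 21  # each CTD property generates 21 features
DEFAULT_LAG = 30                # documented fallback for autocorrelation lag


def resolve_lag(value):
    """Return the lag if it is a positive integer, otherwise the default of 30.

    What counts as "invalid" is an assumption made for this sketch.
    """
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return DEFAULT_LAG


def resolve_ctd_properties(properties=None, use_all=False):
    """Pick the CTD properties to use, defaulting to hydrophobicity."""
    if use_all:
        return list(CTD_PROPERTIES)
    if not properties:
        return ["hydrophobicity"]
    if isinstance(properties, str):
        properties = [properties]
    unknown = [p for p in properties if p not in CTD_PROPERTIES]
    if unknown:
        raise ValueError(f"Unknown CTD properties: {unknown}")
    return list(properties)


def ctd_feature_count(properties):
    """Number of CTD features produced for the chosen properties."""
    return FEATURES_PER_CTD_PROPERTY * len(properties)


print(resolve_lag(None))                                        # 30
print(ctd_feature_count(resolve_ctd_properties(use_all=True)))  # 147
```

With `use_all=True` the sketch reproduces the 7 x 21 = 147 feature count quoted for `descriptors[ctd][all]`.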

**DSP Parameters:**
* `pyDSP[use_dsp]` - whether or not to apply Digital Signal Processing (DSP) techniques to the features passed into the model. If true, the values of the next DSP parameters will be applied to the features.
-* `pyDSP[spectrum]` - which frequency output to use from the generated types of signals from DSP to use e.g power, absolute, imaginery, real.
-* `pyDSP[window][type]` - convolutional window to apply to the signal output, pySAR supports: hamming, blackman, blackmanharris, gaussian, bartlett, kaiser, barthann, bohman, chebwin, cosine, exponential, flattop, hann, boxcar, hanning, nuttall, parzen, triang, tukey.
-* `pyDSP[filter][type]` - window filter to apply to the signal output, pySAR supports: savgol, medfilt, symiirorder1, lfilter, hilbert.
+* `pyDSP[spectrum]` - which frequency output/informational spectra to use from the generated types of signals from DSP to use e.g power, absolute, imaginery, real.
+* `pyDSP[window][type]` - convolutional window to apply to the signal output, pySAR supports: hamming, blackman, blackmanharris, gaussian, bartlett, kaiser, barthann, bohman, chebwin, cosine, exponential, flattop, hann, boxcar, hanning, nuttall, parzen, triang and tukey.
+* `pyDSP[filter][type]` - window filter to apply to the signal output, pySAR supports: savgol, medfilt, symiirorder1, lfilter and hilbert.

[Back to top](#TOP)
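
As a rough illustration of how the DSP parameters listed above fit together, the sketch below checks the `pyDSP` section of a config before it is used. It is a minimal example under assumptions rather than pySAR's own validation: the function `check_dsp_config` and the inline example config are hypothetical, and the nesting is inferred from the bracketed parameter names. The window and filter names are copied from the parameter list, and `null` is accepted because unused parameters may be set to null.

```python
import json

# Supported values copied from the DSP parameter list.
WINDOW_TYPES = {
    "hamming", "blackman", "blackmanharris", "gaussian", "bartlett", "kaiser",
    "barthann", "bohman", "chebwin", "cosine", "exponential", "flattop",
    "hann", "boxcar", "hanning", "nuttall", "parzen", "triang", "tukey",
}
FILTER_TYPES = {"savgol", "medfilt", "symiirorder1", "lfilter", "hilbert"}


def check_dsp_config(config):
    """Return a list of problems found in the "pyDSP" section (empty if none).

    Hypothetical helper; the nesting mirrors the bracketed parameter names.
    """
    dsp = config.get("pyDSP") or {}
    problems = []
    if not dsp.get("use_dsp"):
        # DSP disabled: the remaining DSP parameters are not applied.
        return problems
    window = (dsp.get("window") or {}).get("type")
    if window is not None and window not in WINDOW_TYPES:
        problems.append(f"unsupported window type: {window!r}")
    filter_type = (dsp.get("filter") or {}).get("type")
    if filter_type is not None and filter_type not in FILTER_TYPES:
        problems.append(f"unsupported filter type: {filter_type!r}")
    return problems


# Example config fragment, invented for this sketch.
example = json.loads("""
{
  "pyDSP": {
    "use_dsp": true,
    "spectrum": "power",
    "window": {"type": "hamming"},
    "filter": {"type": null}
  }
}
""")

print(check_dsp_config(example))  # []
```

The spectrum value is deliberately left unchecked, since the parameter description gives its values only as examples.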