To Do List:

Add Config files instead of individual parameters in function
Add Manifest file?
Add License & Releases
Add distance matrices to SOCNum
Add coveralls
Change descriptor functions to accept sequences themselves.
Add references for all descriptors
Look over and add some encoding unit tests.
Update version in init
Look into pssm: https://nanx.me/protr/reference/extractPSSM.html
Change AAP parameter in descriptors.
Split up CTD to C, T and D.
Add lag arg to Autocorrelation descriptors.
Descriptors accept AAI code for property names.
Add DPC and TPC lists to files.
Correct supplementary_materials
Add example comments to func definitions
Installation from source in readme
Parse descriptors into embedded JSON format
Separate pySAR and descriptors into different software packages?
Change ProDSP to pyDSP
Update comments in funcs to that of DCBLSTM_PSP
Add config file to README
Change Explained Var to Explained Variance
Add : to parameters in functions
Change descriptor comments to account for each descriptor func working per sequence.
Change workflow name from deploy_to_testpypi -> deploy_testpypi
Move parse functions to bottom of file
Change lambda (key word) to lamda
Remove parameter instances of class Object in comments?
Add filter/window/convolution metaparameters to config
Remove 'Description' from comments of functions
Ensure outputs are Series not DF's
Lower case input before searching DB in func get_record_from_name
Change get_record_from_name to get_record_from_desc
For each descriptor function, if input is single string (single seq) return
Add wiki to pySAR
Catch TypeError on each descriptor func (add typerror to descriptor tests)
Create a 2nd branch with results in it
Swap any 'pySAR' for self.pySAR_module_path = os.path.dirname(os.path.abspath(sys.modules[self.module].__file__))
I think AAI_descriptor encoding func is adding empty descriptor DF's onto AAI indices
Amp pseudo aacomp descriptor func.
Add print parameter to encoding functions.
Add readthedocs badge.
Update AAIndex module to use similar setup as aaindex package - removing aaindex.py and test_aaindex.py files
Add images to pypi - https://stackoverflow.com/questions/41983209/how-do-i-add-images-to-a-pypi-readme-that-works-on-github
Add emojis to readme
read the docs
Upload bandit, package safety check etc to GCP bucket
Add software dev report?
Make demo youtube vid.
Add future section to readme.
Read the docs (.readthedocs.yml)
Front-end react app.
Check variable naming conventions (https://peps.python.org/pep-0008/)
Check output of bandit and flake8 check.
Add updates.md file which outlines changes bw versions.
Remove travis.ci workflows.
Add maintainer and keywords to setup.py.
Remove travis ci from readme.
Remove some python versions from setup.py.
Fix "How to Cite" section to display citation on multiple lines.
Add info about different branch in repo, maybe even remove it and move to Google Drive.
Correct naming in setup.cfg.
Add "python3 -m twine upload --repository testpypi dist/*" to test pypi workflow.
Change "secrets.PY_PI..." to "secrets.PYPI...".
Remove .Ds_Store and any pycache from github upload.
Update build/test/deploy workflow syntax, look at iso3166-updates workflow for reference.
Add comments to .circle/workflow.
Add spacing in between individual references, see if it improves readability, revert if not.
Include range of values for each activity value in example_datasets/readme.
Finish Config.md. Mention in pySAR/readme that explanation of all params available in config.md.
Update readme examples of software to include updated import structure.
Fix badges on main repo.
Can remove aaindex files from pySAR/pySAR/data as these are included in aaindex package now.
Remove license.txt from pySAR/pySAR
Could remove aaindex1 & aaindex1.json file from tests/test_data.
Remove .coveralls.yml
Move pySAR demo to Google Colab.
In descriptors/readme go into more detail about what each descriptor does.
Add documentation section in readme.
Add Assertion message to some unit tests.
Change mentions of "descriptor_paramters" to "descriptor_properties".
Results branch - remove everything but results dir.
Be able to pass in JSON object to Encoding object. Parse objects in JSON that are present, all don't have to be but most important ones need to be there.
Change Get record from AAIndex database section in readme to show new way aaindex is imported.
Update refs on main readme to actually include authors/titles of works.
Remove travis ci badge from readme.
JSON object of descriptors can be passed into Descriptors class.
Add path-ignore keywords to GitHub Action.
Reorder software metadata in setup.py to be in order of main func, create description var.
Add download_url to setup.py - url of zipped package.
In some unit tests, may need to use self.assertAlmostEqual instead of self.assertEqual.
Remove 'get_' from functions.
Make self.params & other dicts in pySAR accessible via dot notation.
Remove all camel casing function names/vars, change to underscores and lowercase (https://peps.python.org/pep-0008/#function-and-variable-names).
Usage example using fasta import function/module.
Use Map class to allow for config file and parameters accessible via dot notation.
In config files, change 'comp' to 'composition'
Change protpy.aa_composition -> protpy.amino_acid_composition
Add normalize parameter to each autocorrelation func & config.
Change all references of normalized_moreaubroto_autocorrelation to moreaubroto_autocorrelation
Change 'Amp' -> 'amphiphilic' - config.md
Change seq_order_... -> sequence_order... , quasi_seq_order -> quasi_sequence_order in config.md.
Mention custom built aaindex & protpy softwares.
Change sec_struct to secondary_struct.
Add list of test cases to comments in each test suite.
Change explainedVar -> explained_var.
Test dtype of columns in output dataframes.
Test correct naming convention for descriptor dataframe names using regex.
Mention lag is similar to gap between 2 amino acids.
Add all SOCN to configs.
For Pseudo AAC, default hard-coded props of hydrophobicity, hydrophilicity and residue mass are used. User can input custom aaindex1 property codes, set property config key to blank.
Add Amphiphilic to config files and MD - inputs of lamda, weight and properties.
Add space between key and val in configs.
Prepend 'ctd_' to the ctd descriptors names, attributes and function names.
Add reference numbers to comments in descriptor functions - double check existing ones are correct, reorder them.
Change QSOrder to QSO.
Rewrite APAAComp descriptor comments to mention its dimensions change with lamda.
Wrap all if statements in brackets.
Remove convolution from pyDSP and config.
Move Map class to utils.
Ensure all functions have Parameters and Returns in the comments, even if they are None
Add filter function to pyDSP.
Change self.spectra to self.spectrum in pyDSP.
In function comments change default = X to default=
For pseudo and amp composition, only test on 1 dataset as it takes too long with all of them.
Double check output datatype of encoding functions, Series or DF?
Unit test for get_aai_encoding func, pass in list of multiple indices, test that dimension of output is 2 * length of sequences.
Remove string section of get_aai_encoding and just cast input to list if its a string.
Error when passing in single index into encode_aai (N, A, K, H, 9, 2, 0, 1, 0, 2).
Unit test dtypes of each column in aai encoding - ['Index', 'Category', 'R2', 'RMSE', 'MSE', 'RPD', 'MAE', 'Explained Variance']
Test dtypes of activity column outputs.
Need to change max_lag to lag in protpy.quasi_sequence_order when new protpy software releases.
In get_descriptor_encoding function, parse input as list, separate multiple descriptors on comma.
Potential issue when concatenating conjoint triad descriptor output with aa_indices descriptor as column names may clash.
Change aaindex column names from incrementing numbers to - "aa_1", "aa_2" ...
Update aai encoding unit tests to take new naming convention into consideration.
Ensure output from encoding funcs is DF not a Series.
Remove "Getting X Descriptor" etc?
Rename software from pySAR -> pysar.
Go over import_descriptors func.
In test_descriptors, check if double import of descriptors module is needed.
Mention aaindex and protpy in readme of pysar.
Calculate descriptor values for each example dataset and upload to repo, using default params in config, if file size not too big.
Replace descriptors csv with updated csv.
Rerun get all descriptors func on colab to take into account new conjoint triad and CTD column names.
If ["ctd"]["all"] = true this calculates ALL CTD descriptors for all 7 properties, if not true then CTD descriptors are calculated individually.
Remove ctd_comp, distr, trans descriptors, just use parent CTD descriptors and slice from it.
Python unit tests using ctd with 1 property, and using all properties, check dimensions - 21 vs 147 (147/21=7). 21 dimensions per property. 3 C, 3 T, 15 D.
Add spaces to test config files.
SOCN tests with distance matrix in config empty & non-empty, different SOCN functions.
def quasi_sequence_order() - dimension (1,lag). def quasi_sequence_order_all() - dimension (1,lag*2)
Test descriptor import function.
If no extension on config param, dataset and/or descriptors csv param then append csv to it.
Input dataset can be in txt or csv form.
On encoding.descriptor_encoding(), parameters in comments is empty:
Add all pre-calculated descriptors csv to example_datasets.
Double check difflib/closeness_matches on activity and sequence cols.
Remove get_seqs from pySAR.
Change self.activity -> self.activity_col, set self.activity to the actual column data.
Change all references to config_path to config_file, including dsp_config.
in aai_encoding func in Encoding, reorder columns such that MAE is before RPD.
When testing desc and aai + desc encoding, use test config with and without pre-calculated descriptors csv.
Ensure example_datasets isn't in software packaging.
Pretty print json when printing parameters in Encoding functions.
Sort by for RMSE and MSE incorrect, smallest values should be first, largest values last. Sort asc instead of sort desc.
If None or empty params input to encoding functions then use all aai and/or descriptors, if not then raise value error if invalid desc or aai input.
Passing in invalid_test_desc5 = "invalid_descriptor_name" to descriptor_encoding func should return value error or similar.
Conjoint triad and CTD cols overlapping when importing.
CTD Transition getting replaced by Distribution - double check. CTD_T not in exported CSV.
Rerun get all descriptors for each dataset.
Use python venv to run unit tests.
desc_config input parameter can be a filepath to the config file - descriptors.py
CTD columns are repeating twice in output csv.
Remove property key from ctd_comp, ctd_distr, ctd_trans and from config.md
Double check concatenated AAI columns have prefix aai_. Test this in test_encoding unit tests.
Only generate and/or upload coverage report for one Python version in workflows.
Input X and Y into Model class, initialise in constructor.
Change test_size param to test_split.
Best params is empty when outputting hyperparameter results. Use default params if params in config is {}.
In Encoding output Change AAI Indices -> 1 to Using AAI Indices -> 1, Descriptors -> 1 to Using Descriptors -> 1
When reading in descriptor name, lowercase, if there are spaces, separate with underscores.
Add results from research folder to Google Drive, mention in Research Article section. Mention pre-calculated descriptors from same section.
Remove 2 distance matrices from pySAR/data, now a part of protpy package.
Remove manifest file after removal of pySAR/data.
Upload pySAR demo as ppt rather than .key.
Double check what happens when dict not passed into Map class, should error be raised? Reflect change in aaindex.
Remove get_protein module and references to it.
Add circleci badge back into repo now that it's sorta working.
In hyperparameter tuning results change CV to Number of cross-validation folds etc.
Less verbose output for hyperparameter tuning.
str of Descriptor class displays all descriptor names and shapes.
Remove "descriptors" from config, move csv param to "desc_properties", rename desc_properties -> descriptors.
Organise config, newline for [] and {}.
Change all references of lamda to lambda.
Remove cutoff index.
Unit test desc_combo in test_descriptor
Remove desc_counter and aai_counter.
If fewer than 10 AAI indices or descriptors are being encoded then print them out, else don't. Slight error: when an erroneous index is input this still outputs. Also model_parameters is empty.
Finish encoding terminal outputs from desc and aai + desc.
Check columns generated from aai_encoding follow format aai_X.
Unit test columns follow format aai_X...
In utils.save_results, double check that input parameter doesn't already have an extension on it.
Complete test_model feature_selection unit tests.
Remove rfft from pyDSP.
Finish window and filter unit tests pyDSP.
pyDSP encode_seqs(), window <> window_type
for aai_desc_encoding in pySAR.py, check list of indices is split up into str.
Test export of results: test output folder is created, import csv, double check columns, length etc, delete folder.
Add output folder arg to encoding functions.
Remove create output dir function in utils.
Incorporate output_folder into unit tests for encoding and pysar.
Unit test that plot png exists in output folder.
In unit tests for hyperparameter_tuning, pass in parameter grid rather than just parameters themselves.
If calculating only 1 descriptor then could remove the progress bar, not really needed.
Add bibtex citation into its own txt file.
Is .coveralls.yml needed.
In encoding text output, split list of aai indices and descriptors into new line if they exceed the number of characters in "#" line.
Use text wrapper for model_parameters
test_window in pyDSP, testing all window inputs from config, same with test_filter.
Can pass in dict of parameters directly into class input parameters instead of just filename.
Add results from research - https://drive.google.com/drive/folders/1AO71jZ7-uZDJXlHT_F3baAs09Tww5cum?usp=share_link
Is all_desc parameter in config needed?
Don't think model_params in model.py is working.
chebwin.__code__.co_varnames is bringing up extra parameters that aren't in source code, maybe use another dict. co_varnames brings back all the varnames, not actual input parameters. Use inspect.getargspec(chebwin).args
Need to update the above for model_parameters in model.py
Remove file=sys.stdout from tqdm Encoding function, test if it works. Add mininterval=30 to tqdms.
Combine with self.assertRaises value/type error into one test unit rather than separating.
Update "Get record from AAIndex database:" in readme.
Run vulture library to search package and remove any unused code/vars.
Go through all unit tests, any tests that are wrapped in with self.assertRaises()..., remove var assignment and just call function.
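
The co_varnames item above can be illustrated with a minimal sketch. The function example_window is a hypothetical stand-in (it is not chebwin or any pySAR code); it only shows that co_varnames returns local variables alongside the parameters, while inspect.signature returns the declared parameters only. Note that inspect.getargspec was removed in Python 3.11, so inspect.signature or inspect.getfullargspec may be the safer choice.

```python
import inspect


def example_window(M, at, sym=True):
    """Hypothetical stand-in for a window function such as chebwin."""
    order = M - 1              # local variable, not an input parameter
    scale = 10 ** (at / 20)    # another local variable
    return [scale] * (order + 1) if sym else [scale] * order


# co_varnames lists the parameters first, followed by local variables.
all_varnames = example_window.__code__.co_varnames

# Slicing by co_argcount keeps only the positional parameters.
positional_args = all_varnames[:example_window.__code__.co_argcount]

# inspect.signature reports only the declared parameters.
declared_params = list(inspect.signature(example_window).parameters)

print(all_varnames)
print(declared_params)
```

The same approach could be applied to model_parameters in model.py, assuming the goal there is also to list only the real input parameters.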

Mention that individual descriptors are explained in the protpy package. Mention protpy in pySAR demo.
Use **kwargs in class constructor to be able to pass in specific parameter values, override the config file, if applicable.
Change all config files to not use_dsp by default.
Change all comment underlining from "------" to "=======".
Unit tests that include passing keyword args into classes.
In encoding class, remove minintervals from tqdm.
In descriptors module, remove "Getting descriptor"...
After encoding, when outputting parameters, ",".join() on list of descriptors, currently the [] are being output as well.
Change self.seq_len to self.sequence_length.
Passing in comma separated string into PySAR.encode_descriptor func takes the last descr mentioned in it (e.g. conjoint_triad_geary_auto_descriptor = "conjoint_triad, geary_autocorrelation" will take geary_auto).
Individual encode functions in pySAR and Encoding class should accept a string or list of descriptors/aai indices.
Add my own paper reference to References section on readme.
For pysar.encode_descriptor, pysar.encode_aai and pysar.encode_aai_descriptor functions, there doesn't seem to be any functionality to support a list of indices and/or descriptors at the moment.
Encoding functions in pySAR used for concatenating multiple descriptors etc.
Encoding functions in Encoding used for encoding multiple descriptors separately.
For descriptor concatenations, maybe have a concat flag that if set to True will concat the multiple descriptors input.
Read over and update comments.
In encoding.py functions, if the same index/descriptor is put in twice, ensure it isn't duplicated.
Order indices alphabetically.
Some test outputs when displaying list of parameters have "invalid_aaindex_code" or "invalid_descriptor_name"
Disable tqdm using disable flag if less than 5 or so AAI indices being calculated.
Return error if invalid aai indices/descriptors - don't print out parameters text if invalid.
Go over files and folders in pypi package, remove tests.
Add feature space dimensions - add unit tests.
After encoding in pysar.py check class variables have been set.
aai_indices = ["MUNV940104", "ZASB820101"] / aai_descriptor_encoding = pysar.encode_aai_descriptor(aai_indices=aai_indices, descriptors="sequence_order_coupling_number") - puts Index output in [].
encoding = Encoding(config_file='enantioselectivity.json', use_dsp=False) - should not bring up DSP parameters.
Remove textwrapper, change to textwrap.fill
Reorder parameters, have test split at bottom of encoding parameters text
Add config file to list of parameters in output.
Mention number of tests and test cases in /tests readme - 51 tests, 6 test cases.
Recalculate and reupload descriptors_thermostability.csv.
Add info about the columns and dimensions of each descriptors in pre-calculated csv file - fix Issue.
When calculating all descriptors (get_all_descriptors(export=True)), add some sort of print/tracking functionality.
Double check all links in readme.
Add dimensions of each dataset to https://github.com/amckenna41/pySAR/tree/master/example_datasets.
Go over references in descriptors module - refer to protpy.
Update distance matrices in configs - test once protpy published.
Add link to medium article.
Update aaindex version on readme.
Add elapsed time for each case study - calculating protein descriptors on demo.
readthedocs(https://github.com/MartinThoma/propy3/tree/master).
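
The **kwargs item in the list above (keyword arguments overriding values from the config file) could work roughly as follows. This is a sketch under stated assumptions: ExampleEncoder and its params attribute are hypothetical and are not the actual pySAR classes, and the config is passed as an already-loaded dict rather than a file path.

```python
class ExampleEncoder:
    """Hypothetical class for illustration; not the actual pySAR API."""

    def __init__(self, config, **kwargs):
        # Copy so the caller's config dict is left untouched.
        self.params = dict(config)
        for key, value in kwargs.items():
            if key not in self.params:
                # Reject keys that the config does not define.
                raise ValueError(f"Unknown parameter: {key}")
            # Keyword arguments take precedence over config values.
            self.params[key] = value


# Parameter names borrowed from items in this list; values are made up.
config = {"use_dsp": True, "test_split": 0.2}
encoder = ExampleEncoder(config, use_dsp=False)
print(encoder.params)
```

Raising on unknown keys is one possible design choice; silently ignoring them would be another, at the cost of hiding misspelled parameter names.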
