To Do List:

  • Add config files instead of individual parameters in functions
  • Add Manifest file?
  • Add License & Releases
  • Add distance matrices to SOCNum
  • Add coveralls
  • Change descriptor functions to accept sequences themselves.
  • Add references for all descriptors
  • Look over and add some encoding unit tests.
  • Update version in __init__
  • Look into pssm: https://nanx.me/protr/reference/extractPSSM.html
  • Change AAP parameter in descriptors.
  • Split up CTD to C, T and D.
  • Add lag arg to Autocorrelation descriptors.
  • Descriptors accept AAI codes for property names.
  • Add DPC and TPC lists to files.
  • Correct supplementary_materials
  • Add example comments to func definitions
  • Installation from source in readme
  • Parse descriptors into embedded JSON format
  • Separate pySAR and descriptors into different software packages?
  • Change ProDSP to pyDSP
  • Update comments in funcs to that of DCBLSTM_PSP
  • Add config file to README
  • Change Explained Var to Explained Variance
  • Add : to parameters in functions
  • Change descriptor comments to account for each descriptor func working per sequence.
  • Change workflow name from deploy_to_testpypi -> deploy_testpypi
  • Move parse functions to bottom of file
  • Change lambda (keyword) to lamda
  • Remove parameter instances of class Object in comments?
  • Add filter/window/convolution metaparameters to config
  • Remove 'Description' from comments of functions
  • Ensure outputs are Series, not DFs
  • Lower case input before searching DB in func get_record_from_name
  • Change get_record_from_name to get_record_from_desc
  • For each descriptor function, if input is single string (single seq) return
  • Add wiki to pySAR
  • Catch TypeError on each descriptor func (add TypeError to descriptor tests)
  • Create a 2nd branch with results in it
  • Swap any 'pySAR' for self.pySAR_module_path = os.path.dirname(os.path.abspath(sys.modules[self.module].__file__))
  • I think AAI_descriptor encoding func is adding empty descriptor DF's onto AAI indices
  • Amp pseudo aacomp descriptor func.
  • Add print parameter to encoding functions.
  • Add readthedocs badge - Documentation Status
  • Update AAIndex module to use a similar setup to the aaindex package - removing aaindex.py and test_aaindex.py files
  • Add images to PyPI - https://stackoverflow.com/questions/41983209/how-do-i-add-images-to-a-pypi-readme-that-works-on-github
  • Add emojis to readme
  • read the docs
  • Upload bandit, package safety check etc to GCP bucket
  • Add software dev report?
  • Make demo youtube vid.
  • Add future section to readme.
  • Read the docs (.readthedocs.yml)
  • Front-end react app.
  • Check variable naming conventions (https://peps.python.org/pep-0008/)
  • Check output of bandit and flake8 check.
  • Add updates.md file which outlines changes between versions.
  • Remove travis.ci workflows.
  • Add maintainer and keywords to setup.py.
  • Remove travis ci from readme.
  • Remove some python versions from setup.py.
  • Fix "How to Cite" section to display citation on multiple lines.
  • Add info about the different branch in the repo, maybe even remove it and move it to Google Drive.
  • Correct naming in setup.cfg.
  • Add "python3 -m twine upload --repository testpypi dist/*" to test pypi workflow.
  • Change "secrets.PY_PI..." to "secrets.PYPI...".
  • Remove .DS_Store and any __pycache__ from GitHub upload.
  • Update build/test/deploy workflow syntax, look at iso3166-updates workflow for reference.
  • Add comments to .circle/workflow.
  • Add spacing in between individual references, see if it improves readability, revert if not.
  • Include range of values for each activity value in example_datasets/readme.
  • Finish Config.md. Mention in pySAR/readme that explanation of all params available in config.md.
  • Update readme examples of software to include updated import structure of.
  • Fix badges on main repo.
  • Can remove aaindex files from pySAR/pySAR/data as these are included in aaindex package now.
  • Remove license.txt from pySAR/pySAR
  • Could remove aaindex1 & aaindex1.json file from tests/test_data.
  • Remove .coveralls.yml
  • Move pySAR demo to Google Colab.
  • In descriptors/readme go into more detail about what each descriptor does.
  • Add documentation section in readme.
  • Add Assertion message to some unit tests.
  • Change mentions of "descriptor_paramters" to "descriptor_properties".
  • Results branch - remove everything but results dir.
  • Be able to pass in a JSON object to the Encoding object. Parse whichever objects are present in the JSON; not all of them are required, but the most important ones must be there.
  • Change Get record from AAIndex database section in readme to show new way aaindex is imported.
  • Update refs on main readme to actually include authors/titles of works.
  • Remove travis ci badge from readme.
  • JSON object of descriptors can be passed into Descriptors class.
  • Add path-ignore keywords to GitHub Action.
  • Reorder software metadata in setup.py to be in order of main func, create description var.
  • Add download_url to setup.py - url of zipped package.
  • In some unit tests, may need to use self.assertAlmostEqual instead of self.assertEqual.
  • Remove 'get_' from functions.
  • Make self.params & other dicts in pySAR accessible via dot notation.
  • Remove all camel casing function names/vars, change to underscores and lowercase (https://peps.python.org/pep-0008/#function-and-variable-names).
  • Usage example using fasta import function/module.
  • Use Map class to allow for config file and parameters accessible via dot notation.
  • In config files, change 'comp' to 'composition'
  • Change protpy.aa_composition -> protpy.amino_acid_composition
  • Add normalize parameter to each autocorrelation func & config.
  • Change all references of normalized_moreaubroto_autocorrelation to moreaubroto_autocorrelation
  • Change 'Amp' -> 'amphiphilic' - config.md
  • Change seq_order_... -> sequence_order... , quasi_seq_order -> quasi_sequence_order in config.md.
  • Mention custom-built aaindex & protpy packages.
  • Change sec_struct to secondary_struct.
  • Add list of test cases to comments in each test suite.
  • Change explainedVar -> explained_var.
  • Test dtype of columns in output dataframes.
  • Test correct naming convention for descriptor dataframe names using regex.
  • Mention lag is similar to gap between 2 amino acids.
  • Add all SOCN to configs.
  • For Pseudo AAC, default hard-coded props of hydrophobicity, hydrophilicity and residue mass are used. Users can input custom aaindex1 property codes or set the property config key to blank.
  • Add Amphiphilic to config files and MD - inputs of lamda, weight and properties.
  • Add space between key and val in configs.
  • Prepend 'ctd_' to the ctd descriptors names, attributes and function names.
  • Add reference numbers to comments in descriptor functions - double check existing ones are correct, reorder them.
  • Change QSOrder to QSO.
  • Rewrite APAAComp descriptor comments to mention its dimensions change with lamda.
  • Wrap all if statements in brackets.
  • Remove convolution from pyDSP and config.
  • Move Map class to utils.
  • Ensure all functions have Parameters and Returns in the comments, even if they are None
  • Add filter function to pyDSP.
  • Change self.spectra to self.spectrum in pyDSP.
  • In function comments change default = X to default=
  • For pseudo and amp composition, only test on 1 dataset as it takes too long with all of them.
  • Double check output datatype of encoding functions, Series or DF?
  • Unit test for get_aai_encoding func, pass in list of multiple indices, test that dimension of output is 2 * length of sequences.
  • Remove string section of get_aai_encoding and just cast input to list if it's a string.
  • Error when passing in single index into encode_aai (N, A, K, H, 9, 2, 0, 1, 0, 2).
  • Unit test dtypes of each column in aai encoding - ['Index', 'Category', 'R2', 'RMSE', 'MSE', 'RPD', 'MAE', 'Explained Variance']
  • Test dtypes of activity column outputs.
  • Need to change max_lag to lag in protpy.quasi_sequence_order when new protpy software releases.
  • In get_descriptor_encoding function, parse input as list, separate multiple descriptors on comma.
  • Potential issue when concatenating conjoint triad descriptor output with aa_indices descriptor as column names may clash.
  • Change aaindex column names from incrementing numbers to - "aa_1", "aa_2" ...
  • Update aai encoding unit tests to take new naming convention into consideration.
  • Ensure output from encoding funcs is DF not a Series.
  • Remove "Getting X Descriptor" etc?
  • Rename software from pySAR -> pysar.
  • Go over import_descriptors func.
  • In test_descriptors, check if double import of descriptors module is needed.
  • Mention aaindex and protpy in readme of pysar.
  • Calculate descriptor values for each example dataset and upload to repo, using default params in config, if file size not too big.
  • Replace descriptors csv with updated csv.
  • Rerun get all descriptors func on colab to take into account new conjoint triad and CTD column names.
  • If ["ctd"]["all"] = true this calculates ALL CTD descriptors for all 7 properties, if not true then CTD descriptors are calculated individually.
  • Remove ctd_comp, distr, trans descriptors, just use parent CTD descriptors and slice from it.
  • Python unit tests using ctd with 1 property, and using all properties, check dimensions - 21 vs 147 (147/21=7). 21 dimensions per property. 3 C, 3 T, 15 D.
  • Add spaces to test config files.
  • SOCN tests with distance matrix in config empty & non-empty, different SOCN functions.
  • def quasi_sequence_order() - dimension (1,lag). def quasi_sequence_order_all() - dimension (1,lag*2)
  • Test descriptor import function.
  • If no extension on config param, dataset and/or descriptors csv param then append csv to it.
  • Input dataset can be in txt or csv form.
  • On encoding.descriptor_encoding(), the parameters section in the comments is empty.
  • Add all pre-calculated descriptors csv to example_datasets.
  • Double check difflib/closeness_matches on activity and sequence cols.
  • Remove get_seqs from pySAR.
  • Change self.activity -> self.activity_col, set self.activity to the actual column data.
  • Change all references to config_path to config_file, including dsp_config.
  • in aai_encoding func in Encoding, reorder columns such that MAE is before RPD.
  • When testing desc and aai + desc encoding, use test config with and without pre-calculated descriptors csv.
  • Ensure example_datasets isn't in software packaging.
  • Pretty print json when printing parameters in Encoding functions.
  • Sort order for RMSE and MSE is incorrect: smallest values should be first, largest values last. Sort ascending instead of descending.
  • If None or empty params input to encoding functions then use all aai and or descriptors, if not then raise value error if invalid desc or aai input.
  • Passing in invalid_test_desc5 = "invalid_descriptor_name" to descriptor_encoding func should return value error or similar.
  • Conjoint triad and CTD cols overlapping when importing.
  • CTD Transition getting replaced by Distribution - double check. CTD_T not in exported CSV.
  • Rerun get all descriptors for each dataset.
  • Use python venv to run unit tests.
  • desc_config input parameter can be a filepath to the config file - descriptors.py
  • CTD columns are repeating twice in output csv.
  • Remove property key from ctd_comp, ctd_distr, ctd_trans and from config.md
  • Double check concatenated AAI columns have prefix aai_. Test this in test_encoding unit tests.
  • Only generate and or upload coverage report for one Python version in workflows.
  • Input X and Y into Model class, initialise in constructor.
  • Change test_size param to test_split.
  • Best params is empty when outputting hyperparameter results. Use default params if params in config is {}.
  • In Encoding output Change AAI Indices -> 1 to Using AAI Indices -> 1, Descriptors -> 1 to Using Descriptors -> 1
  • When reading in descriptor name, lowercase it; if there are spaces, separate with underscores.
  • Add results from research folder to Google Drive, mention in Research Article section. Mention pre-calculated descriptors from same section.
  • Remove 2 distance matrices from pySAR/data, now a part of protpy package.
  • Remove manifest file after removal of pySAR/data.
  • Upload pySAR demo as ppt rather than .key.
  • Double check what happens when dict not passed into Map class, should an error be raised? Reflect change in aaindex.
  • Remove get_protein module and references to it.
  • Add circleci badge back into repo now that it's sorta working.
  • In hyperparameter tuning results change CV to Number of cross-validation folds etc.
  • Less verbose output for hyperparameter tuning.
  • str of Descriptor class displays all descriptor names and shapes.
  • Remove "descriptors" from config, move csv param to "desc_properties", rename desc_properties -> descriptors.
  • Organise config, newline for [] and {}.
  • Change all references of lamda to lambda.
  • Remove cutoff index.
  • Unit test desc_combo in test_descriptor
  • Remove desc_counter and aai_counter.
  • If fewer than 10 AAI Indices or Descriptors are being encoded then print them out, else don't. Slight error: when an erroneous index is input this still outputs. Also model_parameters is empty.
  • Finish encoding terminal outputs from desc and aai + desc.
  • Check columns generated from aai_encoding follow format aai_X.
  • Unit test columns follow format aai_X...
  • In utils.save_results, double check that input parameter doesn't already have an extension on it.
  • Complete test_model feature_selection unit tests.
  • Remove rfft from pyDSP.
  • Finish window and filter unit tests pyDSP.
  • pyDSP encode_seqs(), window <> window_type
  • for aai_desc_encoding in pySAR.py, check list of indices is split up into str.
  • Test export of results: test output folder is created, import csv, double check columns, length etc, delete folder.
  • Add output folder arg to encoding functions.
  • Remove create output dir function in utils.
  • Incorporate output_folder into unit tests for encoding and pysar.
  • Unit test that plot png exists in output folder.
  • In unit tests for hyperparameter_tuning, pass in parameter grid rather than just parameters themselves.
  • If calculating only 1 descriptor then could remove the progress bar, not really needed.
  • Add bibtex citation into its own txt file.
  • Is .coveralls.yml needed.
  • In encoding text output, split list of aai indices and descriptors into new line if they exceed the number of characters in "#" line.
  • Use text wrapper for model_parameters
  • test_window in pyDSP, testing all window inputs from config, same with test_filter.
  • Can pass in dict of parameters directly into class input parameters instead of just filename.
  • Add results from research - https://drive.google.com/drive/folders/1AO71jZ7-uZDJXlHT_F3baAs09Tww5cum?usp=share_link
  • Is all_desc parameter in config needed?
  • Don't think model_params in model.py is working.
  • chebwin.__code__.co_varnames is bringing up extra parameters that aren't in the source code, maybe use another dict. co_varnames brings back all the varnames, not just the actual input parameters. Use inspect.signature(chebwin).parameters (inspect.getargspec was removed in Python 3.11).
  • Need to update the above for model_parameters in model.py
  • Remove file=sys.stdout from tqdm Encoding function, test if it works. Add mininterval=30 to tqdms.
  • Combine with self.assertRaises ValueError/TypeError checks into one test unit rather than separating them.
  • Update "Get record from AAIndex database:" in readme.
  • Run vulture library to search package and remove any unused code/vars.
  • Go through all unit tests: in any tests that are wrapped in with self.assertRaises()..., remove var assignment and just call function.
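
A minimal sketch of the co_varnames issue noted above, using only the standard library. The function example_window is a hypothetical stand-in for scipy.signal.windows.chebwin (so SciPy isn't needed to run it), and get_input_parameters / get_default_parameters are illustrative helper names, not existing pySAR functions. It shows that __code__.co_varnames also returns local variable names, whereas inspect.signature returns only the declared input parameters.

```python
import inspect


def example_window(M, at, sym=True):
    """Hypothetical stand-in for a window function such as chebwin."""
    # These locals are NOT input parameters, yet they still
    # appear in example_window.__code__.co_varnames.
    order = M - 1.0
    scale = at / 20.0
    return [order * scale] * M if sym else [order] * M


def get_input_parameters(func):
    """Return only the declared input parameter names of func."""
    return list(inspect.signature(func).parameters)


def get_default_parameters(func):
    """Return {name: default} for parameters that declare a default value."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }


print(example_window.__code__.co_varnames)  # ('M', 'at', 'sym', 'order', 'scale')
print(get_input_parameters(example_window))  # ['M', 'at', 'sym']
print(get_default_parameters(example_window))  # {'sym': True}
```

The same helpers could plausibly be reused for the model_parameters item, though how model.py currently gathers its parameters is not shown here.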

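Several items above (making self.params accessible via dot notation, the Map class, moving it to utils, and what happens when a non-dict is passed in) revolve around one small utility. A minimal sketch follows; the name Map comes from the list, but the behaviour shown here (recursively wrapping nested dicts and raising TypeError for non-dict input) is an assumption for illustration, not the package's actual implementation. The config keys in the usage lines are hypothetical.

```python
class Map(dict):
    """Dict whose keys can also be read and written as attributes.

    Nested dicts are wrapped recursively so chained access such as
    config.model.algorithm works.
    """

    def __init__(self, data=None):
        if data is None:
            data = {}
        # Assumption: reject non-dict input instead of silently creating an empty Map.
        if not isinstance(data, dict):
            raise TypeError(f"Map expects a dict, got {type(data).__name__}.")
        super().__init__()
        for key, value in data.items():
            self[key] = value  # goes through __setitem__ below

    def __setitem__(self, key, value):
        # Wrap nested dicts so dot notation works at every level.
        if isinstance(value, dict) and not isinstance(value, Map):
            value = Map(value)
        super().__setitem__(key, value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup (e.g. dict methods) fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


config = Map({"model": {"algorithm": "plsregression", "parameters": {}}, "test_split": 0.2})
print(config.model.algorithm)  # plsregression
config.model.algorithm = "randomforest"
print(config["model"]["algorithm"])  # randomforest
```

Note that dict.update() and setdefault() bypass __setitem__, so nested dicts added through them would not be wrapped in this sketch.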

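Two items above ask for a missing file extension to be appended to the dataset/descriptors parameters, and for utils.save_results to check whether its filename already has one. A small sketch of a shared helper using os.path.splitext; ensure_extension is a hypothetical name, and treating any existing extension (including .txt) as valid is an assumption that matches the note that input datasets may be in txt or csv form.

```python
import os


def ensure_extension(filename, extension=".csv"):
    """Append extension to filename only if it has no extension already.

    Hypothetical helper, not part of the current pySAR API.
    """
    if not extension.startswith("."):
        extension = "." + extension
    # splitext returns ('name', '') when there is no extension.
    _, current_ext = os.path.splitext(filename)
    if current_ext:
        return filename  # keep .txt, .csv, .json etc. as given
    return filename + extension


print(ensure_extension("thermostability"))  # thermostability.csv
print(ensure_extension("thermostability.txt"))  # thermostability.txt
print(ensure_extension("enantioselectivity", "json"))  # enantioselectivity.json
```
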
  • Mention that individual descriptors are explained in the protpy package. Mention protpy in pySAR demo.
  • Use **kwargs in class constructor to be able to pass in specific parameter values, overriding the config file, if applicable.
  • Change all config files to not use_dsp by default.
  • Change all comment underlining from "------" to "=======".
  • Unit tests that include passing keyword args into classes.
  • In encoding class, remove mininterval arguments from tqdm calls.
  • In descriptors module, remove "Getting descriptor"...
  • After encoding, when outputting parameters, ",".join() on list of descriptors, currently the [] are being output as well.
  • Change self.seq_len to self.sequence_length.
  • Passing in a comma-separated string into PySAR.encode_descriptor func takes the last descriptor mentioned in it (e.g. conjoint_triad_geary_auto_descriptor = "conjoint_triad, geary_autocorrelation" will take geary_auto).
  • Individual encode functions in pySAR and Encoding class should accept a string or list of descriptors/aai indices.
  • Add my own paper reference to References section on readme.
  • For pysar.encode_descriptor, pysar.encode_aai and pysar.encode_aai_descriptor functions, there doesn't seem to be any functionality to support a list of indices and/or descriptors at the moment.
  • Encoding functions in pySAR used for concatenating multiple descriptors etc.
  • Encoding functions in Encoding used for encoding multiple descriptors separately.
  • For descriptor concatenations, maybe have a concat flag that, if set to True, will concat the multiple descriptors input.
  • Read over and update comments.
  • In encoding.py functions, if the same index/descriptor is put in twice, ensure it isn't duplicated.
  • Order indices alphabetically.
  • Some test outputs when displaying list of parameters have "invalid_aaindex_code" or "invalid_descriptor_name"
  • Disable tqdm using the disable flag if fewer than 5 or so AAI indices are being calculated.
  • Return error if invalid aai indices/descriptors - don't print out parameters text if invalid.
  • Go over files and folders in pypi package, remove tests.
  • Add feature space dimensions - add unit tests.
  • After encoding in pysar.py check class variables have been set.
  • aai_indices = ["MUNV940104", "ZASB820101"] / aai_descriptor_encoding = pysar.encode_aai_descriptor(aai_indices=aai_indices, descriptors="sequence_order_coupling_number") - puts Index output in [].
  • encoding = Encoding(config_file='enantioselectivity.json', use_dsp=False) - should not bring up DSP parameters.
  • Remove textwrapper, change to textwrap.fill
  • Reorder parameters, have test split at bottom of encoding parameters text
  • Add config file to list of parameters in output.
  • Mention number of tests and test cases in /tests readme - 51 tests, 6 test cases.
  • Recalculate and reupload descriptors_thermostability.csv.
  • Add info about the columns and dimensions of each descriptor in pre-calculated csv file - fix Issue.
  • When calculating all descriptors (get_all_descriptors(export=True)), add some sort of print/tracking functionality.
  • Double check all links in readme.
  • Add dimensions of each dataset to https://github.com/amckenna41/pySAR/tree/master/example_datasets.
  • Go over references in descriptors module - refer to protpy.
  • Update distance matrices in configs - test once protpy published.
  • Add link to medium article.
  • Update aaindex version on readme.
  • Add elapsed time for each case study - calculating protein descriptors on demo.
  • readthedocs(https://github.com/MartinThoma/propy3/tree/master).
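
Several items above concern how descriptor and AAI index inputs are parsed: accepting either a string or a list, splitting comma-separated strings instead of keeping only the last name, lowercasing descriptor names and replacing spaces with underscores, removing duplicates and ordering alphabetically. A minimal sketch of one normalising helper follows; parse_names and its lowercase flag are hypothetical, not part of the current pySAR API, and AAI codes are assumed to be passed with lowercase=False so that codes such as MUNV940104 keep their case.

```python
def parse_names(names, lowercase=True):
    """Normalise descriptor/AAI index input into a sorted, de-duplicated list.

    Accepts None, a single (optionally comma-separated) string, or a list/tuple
    of strings. Hypothetical helper for illustration only.
    """
    if names is None:
        return []
    if isinstance(names, str):
        # Split on commas so every name is kept, not just the last one.
        names = names.split(",")
    elif not isinstance(names, (list, tuple)):
        raise TypeError(f"Expected str or list of str, got {type(names).__name__}.")

    parsed = set()
    for name in names:
        if not isinstance(name, str):
            raise TypeError(f"Expected str, got {type(name).__name__}.")
        name = name.strip()
        if not name:
            continue  # skip empty entries, e.g. from a trailing comma
        if lowercase:
            # Descriptor names: lowercase and join words with underscores.
            name = "_".join(name.lower().split())
        parsed.add(name)
    # The set removes duplicates; sorted() gives alphabetical order.
    return sorted(parsed)


print(parse_names("conjoint_triad, geary_autocorrelation"))
# ['conjoint_triad', 'geary_autocorrelation']
print(parse_names(["ZASB820101", "MUNV940104", "ZASB820101"], lowercase=False))
# ['MUNV940104', 'ZASB820101']
print(parse_names("Sequence Order Coupling Number"))
# ['sequence_order_coupling_number']
```

Validating each parsed name against the known descriptors or AAI codes (to raise an error on invalid input) would be a separate step after this normalisation.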