To Do List:
- Add Config files instead of individual parameters in function
- Add Manifest file?
- Add License & Releases
- Add distance matrices to SOCNum
- Add coveralls
- Change descriptor functions to accept sequences themselves.
- Add references for all descriptors
- Look over and add some encoding unit tests.
- Update version in init
- Look into pssm: https://nanx.me/protr/reference/extractPSSM.html
- Change AAP parameter in descriptors.
- Split up CTD to C, T and D.
- Add lag arg to Autocorrelation descriptors.
- Descriptors accept AAI code for property names.
- Add DPC and TPC lists to files.
- Correct supplementary_materials
- Add example comments to func definitions
- Installation from source in readme
- Parse descriptors into embedded JSON format
- Separate pySAR and descriptors into different software packages?
- Change ProDSP to pyDSP
- Update comments in funcs to that of DCBLSTM_PSP
- Add config file to README
- Change Explained Var to Explained Variance
- Add : to parameters in functions
- Change descriptor comments to account for each descriptor func working per sequence.
- Change workflow name from deploy_to_testpypi -> deploy_testpypi
- Move parse functions to bottom of file
- Change lambda (key word) to lamda
- Remove parameter instances of class Object in comments?
- Add filter/window/convolution metaparameters to config
- Remove 'Description' from comments of functions
- Ensure outputs are Series not DF's
- Lower case input before searching DB in func get_record_from_name
- Change get_record_from_name to get_record_from_desc
- For each descriptor function, if input is single string (single seq) return
- Add wiki to pySAR
- Catch TypeError on each descriptor func (add TypeError to descriptor tests)
- Create a 2nd branch with results in it
- Swap any 'pySAR' for self.pySAR_module_path = os.path.dirname(os.path.abspath(sys.modules[self.__module__].__file__))
- I think AAI_descriptor encoding func is adding empty descriptor DF's onto AAI indices
- Amp pseudo aacomp descriptor func.
- Add print parameter to encoding functions.
- Add readthedocs badge
- Update AAIndex module to use similar setup as aaindex package - removing aaindex.py and test_aaindex.py files
- Add images to pypi - https://stackoverflow.com/questions/41983209/how-do-i-add-images-to-a-pypi-readme-that-works-on-github
- Add emojis to readme
- read the docs
- Upload bandit, package safety check etc to GCP bucket
- Add software dev report?
- Make demo youtube vid.
- Add future section to readme.
- Read the docs (.readthedocs.yml)
- Front-end react app.
- Check variable naming conventions (https://peps.python.org/pep-0008/)
- Check output of bandit and flake8 check.
- Add updates.md file which outlines changes between versions.
- Remove travis.ci workflows.
- Add maintainer and keywords to setup.py.
- Remove travis ci from readme.
- Remove some python versions from setup.py.
- Fix "How to Cite" section to display citation on multiple lines.
- Add info about different branch in repo, maybe even remove it and move to Google Drive.
- Correct naming in setup.cfg.
- Add "python3 -m twine upload --repository testpypi dist/*" to test pypi workflow.
- Change "secrets.PY_PI..." to "secrets.PYPI...".
- Remove .DS_Store and any pycache from github upload.
- Update build/test/deploy workflow syntax, look at iso3166-updates workflow for reference.
- Add comments to .circle/workflow.
- Add spacing in between individual references, see if it improves readability, revert if not.
- Include range of values for each activity value in example_datasets/readme.
- Finish Config.md. Mention in pySAR/readme that explanation of all params available in config.md.
- Update readme examples of software to include updated import structure of.
- Fix badges on main repo.
- Can remove aaindex files from pySAR/pySAR/data as these are included in aaindex package now.
- Remove license.txt from pySAR/pySAR
- Could remove aaindex1 & aaindex1.json file from tests/test_data.
- Remove .coveralls.yml
- Move pySAR demo to Google Colab.
- In descriptors/readme go into more detail about what each descriptor does.
- Add documentation section in readme.
- Add Assertion message to some unit tests.
- Change mentions of "descriptor_paramters" to "descriptor_properties".
- Results branch - remove everything but results dir.
- Be able to pass in JSON object to Encoding object. Parse objects in JSON that are present, all don't have to be but most important ones need to be there.
- Change Get record from AAIndex database section in readme to show new way aaindex is imported.
- Update refs on main readme to actually include authors/titles of works.
- Remove travis ci badge from readme.
- JSON object of descriptors can be passed into Descriptors class.
- Add path-ignore keywords to GitHub Action.
- Reorder software metadata in setup.py to be in order of main func, create description var.
- Add download_url to setup.py - url of zipped package.
- In some unit tests, may need to use self.assertAlmostEqual instead of self.assertEqual.
- Remove 'get_' from functions.
- Make self.params & other dicts in pySAR accessible via dot notation.
- Remove all camel casing function names/vars, change to underscores and lowercase (https://peps.python.org/pep-0008/#function-and-variable-names).
- Usage example using fasta import function/module.
- Use Map class to allow for config file and parameters accessible via dot notation.
- In config files, change 'comp' to 'composition'
- Change protpy.aa_composition -> protpy.amino_acid_composition
- Add normalize parameter to each autocorrelation func & config.
- Change all references of normalized_moreaubroto_autocorrelation to moreaubroto_autocorrelation
- Change 'Amp' -> 'amphiphilic' - config.md
- Change seq_order_... -> sequence_order... , quasi_seq_order -> quasi_sequence_order in config.md.
- Mention custom built aaindex & protpy softwares.
- Change sec_struct to secondary_struct.
- Add list of test cases to comments in each test suite.
- Change explainedVar -> explained_var.
- Test dtype of columns in output dataframes.
- Test correct naming convention for descriptor dataframe names using regex.
- Mention lag is similar to gap between 2 amino acids.
- Add all SOCN to configs.
- For Pseudo AAC, default hard-coded props of hydrophobicity, hydrophilicity and residue mass are used. User can input custom aaindex1 property codes, set property config key to blank.
- Add Amphiphilic to config files and MD - inputs of lamda, weight and properties.
- Add space between key and val in configs.
- Prepend 'ctd_' to the ctd descriptors names, attributes and function names.
- Add reference numbers to comments in descriptor functions - double check existing ones are correct, reorder them.
- Change QSOrder to QSO.
- Rewrite APAAComp descriptor comments to mention its dimensions change with lamda.
- Wrap all if statements in brackets.
- Remove convolution from pyDSP and config.
- Move Map class to utils.
- Ensure all functions have Parameters and Returns in the comments, even if they are None
- Add filter function to pyDSP.
- Change self.spectra to self.spectrum in pyDSP.
- In function comments change default = X to default=
- For pseudo and amp composition, only test on 1 dataset as it takes too long with all of them.
- Double check output datatype of encoding functions, Series or DF?
- Unit test for get_aai_encoding func, pass in list of multiple indices, test that dimension of output is 2 * length of sequences.
- Remove string section of get_aai_encoding and just cast input to list if its a string.
- Error when passing in single index into encode_aai (N, A, K, H, 9, 2, 0, 1, 0, 2).
- Unit test dtypes of each column in aai encoding - ['Index', 'Category', 'R2', 'RMSE', 'MSE', 'RPD', 'MAE', 'Explained Variance']
- Test dtypes of activity column outputs.
- Need to change max_lag to lag in protpy.quasi_sequence_order when new protpy software releases.
- In get_descriptor_encoding function, parse input as list, separate multiple descriptors on comma.
- Potential issue when concatenating conjoint triad descriptor output with aa_indices descriptor as column names may clash.
- Change aaindex column names from incrementing numbers to - "aa_1", "aa_2" ...
- Update aai encoding unit tests to take new naming convention into consideration.
- Ensure output from encoding funcs is DF not a Series.
- Remove "Getting X Descriptor" etc?
- Rename software from pySAR -> pysar.
- Go over import_descriptors func.
- In test_descriptors, check if double import of descriptors module is needed.
- Mention aaindex and protpy in readme of pysar.
- Calculate descriptor values for each example dataset and upload to repo, using default params in config, if file size not too big.
- Replace descriptors csv with updated csv.
- Rerun get all descriptors func on colab to take into account new conjoint triad and CTD column names.
- If ["ctd"]["all"] = true this calculates ALL CTD descriptors for all 7 properties, if not true then CTD descriptors are calculated individually.
- Remove ctd_comp, distr, trans descriptors, just use parent CTD descriptors and slice from it.
- Python unit tests using ctd with 1 property, and using all properties, check dimensions - 21 vs 147 (147/21=7). 21 dimensions per property. 3 C, 3 T, 15 D.
- Add spaces to test config files.
- SOCN tests with distance matrix in config empty & non-empty, different SOCN functions.
- def quasi_sequence_order() - dimension (1,lag). def quasi_sequence_order_all() - dimension (1,lag*2)
- Test descriptor import function.
- If no extension on config param, dataset and/or descriptors csv param then append .csv to it.
- Input dataset can be in txt or csv form.
- On encoding.descriptor_encoding(), parameters in comments is empty.
- Add all pre-calculated descriptors csv to example_datasets.
- Double check difflib/closeness_matches on activity and sequence cols.
- Remove get_seqs from pySAR.
- Change self.activity -> self.activity_col, set self.activity to the actual column data.
- Change all references to config_path to config_file, including dsp_config.
- In aai_encoding func in Encoding, reorder columns such that MAE is before RPD.
- When testing desc and aai + desc encoding, use test config with and without pre-calculated descriptors csv.
- Ensure example_datasets isn't in software packaging.
- Pretty print json when printing parameters in Encoding functions.
- Sort by for RMSE and MSE incorrect, smallest values should be first, largest values last. Sort asc instead of sort desc.
- If None or empty params input to encoding functions then use all aai and or descriptors, if not then raise value error if invalid desc or aai input.
- Passing in invalid_test_desc5 = "invalid_descriptor_name" to descriptor_encoding func should return value error or similar.
- Conjoint triad and CTD cols overlapping when importing.
- CTD Transition getting replaced by Distribution - double check. CTD_T not in exported CSV.
- Rerun get all descriptors for each dataset.
- Use python venv to run unit tests.
- desc_config input parameter can be a filepath to the config file - descriptors.py
- CTD columns are repeating twice in output csv.
- Remove property key from ctd_comp, ctd_distr, ctd_trans and from config.md
- Double check concatenated AAI columns have prefix aai_. Test this in test_encoding unit tests.
- Only generate and or upload coverage report for one Python version in workflows.
- Input X and Y into Model class, initialise in constructor.
- Change test_size param to test_split.
- Best params is empty when outputting hyperparameter results. Use default params if params in config is {}.
- In Encoding output Change AAI Indices -> 1 to Using AAI Indices -> 1, Descriptors -> 1 to Using Descriptors -> 1
- When reading in descriptor name, lowercase, if there's spaces, separate with underscores.
- Add results from research folder to Google Drive, mention in Research Article section. Mention pre-calculated descriptors from same section.
- Remove 2 distance matrices from pySAR/data, now a part of protpy package.
- Remove manifest file after removal of pySAR/data.
- Upload pySAR demo as ppt rather than .key.
- Double check what happens when dict not passed into Map class, should error be raised? Reflect change in aaindex.
- Remove get_protein module and references to it.
- Add circleci badge back into repo now that it's sorta working.
- In hyperparameter tuning results change CV to Number of cross-validation folds etc.
- Less verbose output for hyperparameter tuning.
- str of Descriptors class displays all descriptor names and shapes.
- Remove "descriptors" from config, move csv param to "desc_properties", rename desc_properties -> descriptors.
- Organise config, newline for [] and {}.
- Change all references of lamda to lambda.
- Remove cutoff index.
- Unit test desc_combo in test_descriptor
- Remove desc_counter and aai_counter.
- If less than 10 AAI Indices or Descriptors being encoded then print out, else don't. Slight error when erroneous index input, this still outputs. Also model_parameters is empty.
- Finish encoding terminal outputs from desc and aai + desc.
- Check columns generated from aai_encoding follow format aai_X.
- Unit test columns follow format aai_X...
- In utils.save_results, double check that input parameter doesn't already have an extension on it.
- Complete test_model feature_selection unit tests.
- Remove rfft from pyDSP.
- Finish window and filter unit tests pyDSP.
- pyDSP encode_seqs(), window <> window_type
- for aai_desc_encoding in pySAR.py, check list of indices is split up into str.
- Test export of results: test output folder is created, import csv, double check columns, length etc, delete folder.
- Add output folder arg to encoding functions.
- Remove create output dir function in utils.
- Incorporate output_folder into unit tests for encoding and pysar.
- Unit test that plot png exists in output folder.
- In unit tests for hyperparameter_tuning, pass in parameter grid rather than just parameters themselves.
- If calculating only 1 descriptor then could remove the progress bar, not really needed.
- Add bibtex citation into its own txt file.
- Is .coveralls.yml needed.
- In encoding text output, split list of aai indices and descriptors into new line if they exceed the number of characters in "#" line.
- Use text wrapper for model_parameters
- test_window in pyDSP, testing all window inputs from config, same with test_filter.
- Can pass in dict of parameters directly into class input parameters instead of just filename.
- Add results from research - https://drive.google.com/drive/folders/1AO71jZ7-uZDJXlHT_F3baAs09Tww5cum?usp=share_link
- Is all_desc parameter in config needed?
- Don't think model_params in model.py is working.
- chebwin.__code__.co_varnames is bringing up extra parameters that aren't in source code, maybe use another dict. co_varnames brings back all the variable names, not just the actual input parameters. Use inspect.getargspec(chebwin).args (removed in Python 3.11; inspect.getfullargspec or inspect.signature are the current equivalents).
- Need to update the above for model_parameters in model.py
- Remove file=sys.stdout from tqdm Encoding function, test if it works. Add mininterval=30 to tqdms.
- Combine with self.assertRaises value/type error into one unit test rather than separating.
- Update "Get record from AAIndex database:" in readme.
- Run vulture library to search package and remove any unused code/vars.
- Go through all unit tests, any tests that are wrapped in with self.assertRaises()... , remove var assignment and just call function.
- Mention that individual descriptors are explained in the protpy package. Mention protpy in pySAR demo.
- Use **kwargs in class constructor to be able to pass in specific parameter values, override the config file, if applicable.
- Change all config files to not use_dsp by default.
- Change all comment underlining from "------" to "=======".
- Unit tests that include passing keyword args into classes.
- In encoding class, remove minintervals from tqdm.
- In descriptors module, remove "Getting descriptor"...
- After encoding, when outputting parameters, ",".join() on list of descriptors, currently the [] are being output as well.
- Change self.seq_len to self.sequence_length.
- Passing in comma separated string into PySAR.encode_descriptor func takes the last descr mentioned in it (e.g. conjoint_triad_geary_auto_descriptor = "conjoint_triad, geary_autocorrelation", will take geary_auto).
- Individual encode functions in pySAR and Encoding class should accept a string or list of descriptors/aai indices.
- Add my own paper reference to References section on readme.
- For pysar.encode_descriptor, pysar.encode_aai and pysar.encode_aai_descriptor functions, there doesn't seem to be any functionality to support list of indices and/or descriptors atm.
- Encoding functions in pySAR used for concatenating multiple descriptors etc.
- Encoding functions in Encoding used for encoding multiple descriptors separately.
- For descriptor concatenations, maybe have a concat flag that if set to True will concat the multiple descriptors input.
- Read over and update comments.
- In encoding.py functions, if the same index/descriptor is put in twice, ensure it isn't duplicated.
- Order indices alphabetically.
- Some test outputs when displaying list of parameters have "invalid_aaindex_code" or "invalid_descriptor_name"
- Disable tqdm using disable flag if less than 5 or so AAI indices being calculated.
- Return error if invalid aai indices/descriptors - don't print out parameters text if invalid.
- Go over files and folders in pypi package, remove tests.
- Add feature space dimensions - add unit tests.
- After encoding in pysar.py check class variables have been set.
- aai_indices = ["MUNV940104", "ZASB820101"] / aai_descriptor_encoding = pysar.encode_aai_descriptor(aai_indices=aai_indices, descriptors="sequence_order_coupling_number") - puts Index output in [].
- encoding = Encoding(config_file='enantioselectivity.json', use_dsp=False) - should not bring up DSP parameters.
- Remove textwrapper, change to textwrap.fill
- Reorder parameters, have test split at bottom of encoding parameters text
- Add config file to list of parameters in output.
- Mention number of tests and test cases in /tests readme - 51 tests, 6 test cases.
- Recalculate and reupload descriptors_thermostability.csv.
- Add info about the columns and dimensions of each descriptors in pre-calculated csv file - fix Issue.
- When calculating all descriptors (get_all_descriptors(export=True)), add some sort of print/tracking functionality.
- Double check all links in readme.
- Add dimensions of each dataset to https://github.com/amckenna41/pySAR/tree/master/example_datasets.
- Go over references in descriptors module - refer to protpy.
- Update distance matrices in configs - test once protpy published.
- Add link to medium article.
- Update aaindex version on readme.
- Add elapsed time for each case study - calculating protein descriptors on demo.
- readthedocs(https://github.com/MartinThoma/propy3/tree/master).
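
Note on the co_varnames item above: a minimal sketch of why co_varnames returns more names than the function's input parameters, and how inspect.signature narrows it down. The function example_window is a hypothetical stand-in for chebwin (it is not the scipy implementation), and its local variable names are made up for illustration.

```python
import inspect


def example_window(M, at, sym=True):
    # Hypothetical stand-in for a window function such as chebwin.
    # 'order' and 'scaled' are local variables, not input parameters.
    order = M - 1
    scaled = at / 20.0
    return order, scaled, sym


# co_varnames lists the arguments first, followed by every local variable,
# which is why extra names show up.
all_names = example_window.__code__.co_varnames

# inspect.signature reports only the declared input parameters.
param_names = list(inspect.signature(example_window).parameters)

print(all_names)
print(param_names)
```

The same approach could be applied to the model_parameters lookup in model.py, assuming it currently relies on co_varnames as well.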