Incorporate an evaluate function into model controller & a number of other refactoring steps towards a training and evaluation pipeline. #25

kathyxchen · 2018-04-19T13:59:32Z

This pull request is still in progress. It addresses multiple issues, and I will keep this post updated as work continues.

After this pull request, I will avoid bundling multiple changes into a single pull request whenever possible. I'll also return to the workflow I started with: submitting pull requests from a fork of the repository rather than from a branch in the FunctionLab repo.

Changes made:

Issue #22 (Update intervals sampler to be able to accept even numbered bin/window inputs): the intervals sampler now accepts a sequence length and a center bin length (the bin in which genomic features are predicted) that are either both even or both odd.
Merged parameters.yml and paths.yml into a single file. I also removed the IntervalsSampler-specific processing code from selene.py so that the Sampler modules are truly configurable from the YAML file alone.
Added a PyTorch module for training sequence-level models (usually DNA, as in DeepSEA) that are not strand specific. This module wraps the user-specified model and, in its forward method, takes the mean or the max of the model's forward- and reverse-strand predictions.
Issue #3 (model_train.py performance monitoring should be more flexible/modular).
Track the genomic coordinates for the intervals sampler test set.
Issue #23 (speed up in silico mutagenesis): this PR includes a preliminary implementation of in silico mutagenesis. I still need to document and benchmark it, and to decide how to handle double/triple mutagenesis. (In the first release, we may just note that only single mutagenesis is supported.)
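The both-even-or-both-odd rule in #22 follows from centering the bin. The bases left over after placing the center bin must split equally between the two flanks, so the sequence length minus the bin length has to be even. Below is a minimal sketch of that arithmetic. The function name is hypothetical, and this is not IntervalsSampler's actual code.

```python
def center_bin_bounds(sequence_length, bin_length):
    """Return (start, end) of a bin centered inside a sequence window.

    Hypothetical helper. The bin sits exactly in the middle only if the
    leftover bases split evenly between the two flanks, which holds when
    sequence_length and bin_length are both even or both odd.
    """
    if bin_length > sequence_length:
        raise ValueError("bin_length cannot exceed sequence_length")
    if (sequence_length - bin_length) % 2 != 0:
        raise ValueError(
            "sequence_length and bin_length must both be even or both be odd")
    flank = (sequence_length - bin_length) // 2
    # Half-open interval [start, end) relative to the start of the window.
    return flank, flank + bin_length
```

For instance, a 1001-bp sequence with a 201-bp center bin leaves 400 bp on each side. A 1000/201 pair is rejected because 799 leftover bases cannot be split evenly.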
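For reference, single-base in silico mutagenesis (#23) substitutes every position with each of the other three bases, so a sequence of length L yields 3L mutants. The generator below is an illustrative sketch with hypothetical names, not the implementation in this PR. It only builds the mutated strings and leaves scoring them with the model to the caller.

```python
BASES = "ACGT"


def single_base_mutations(sequence, bases=BASES):
    """Yield (position, ref, alt, mutated_sequence) for each single substitution.

    Hypothetical helper. A position holding a base outside `bases`
    (e.g. "N") is substituted with every base in `bases`.
    """
    for position, ref in enumerate(sequence):
        for alt in bases:
            if alt == ref:
                continue  # skip the reference base itself
            mutated = sequence[:position] + alt + sequence[position + 1:]
            yield position, ref, alt, mutated
```

The counts show why double and triple mutagenesis need separate consideration. For a 1001-bp input, single mutagenesis produces 3 × 1001 = 3003 sequences. All double mutants would number C(1001, 2) × 9 = 4,504,500.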

kathyxchen

To-dos for myself.

kathyxchen · 2018-04-19T14:26:01Z

config_examples/parameters.yml

@@ -1,25 +1,37 @@
 ---
+model: {
+    non_strand_specific_module: True,


Need to include a mode parameter ("mean" or "max").
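One way the new parameter could be read is sketched below. It assumes the YAML has already been loaded into a dict and that the mode is stored under a hypothetical `non_strand_specific_mode` key next to the existing `non_strand_specific_module` flag. The real key name may differ.

```python
VALID_MODES = ("mean", "max")


def read_non_strand_specific_options(model_config):
    """Return (use_wrapper, mode) from the parsed `model` config block.

    Hypothetical helper: `non_strand_specific_mode` is an assumed key name,
    and "mean" is an assumed default.
    """
    use_wrapper = bool(model_config.get("non_strand_specific_module", False))
    mode = model_config.get("non_strand_specific_mode", "mean")
    if use_wrapper and mode not in VALID_MODES:
        raise ValueError(
            "non_strand_specific_mode must be one of {0}, got {1!r}".format(
                VALID_MODES, mode))
    return use_wrapper, mode
```

Validating at config-load time surfaces a typo such as "median" before any model is built.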

kathyxchen · 2018-04-19T14:26:36Z

config_examples/parameters.yml

    test_holdout: [8, 9],
    validation_holdout: [6, 7],
    random_seed: 127,
    sequence_length: 1001,
    center_bin_to_predict: 201,
-    default_threshold: 0.5,
+    feature_thresholds: 0.5,


Check that IntervalsSampler (or OnlineSampler) handles this input properly.
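Because the key changed from `default_threshold` to `feature_thresholds`, the sampler may need to accept either a single number or a per-feature mapping. The sketch below normalizes both forms into a lookup function. The helper names and the dict form are assumptions. The sketch also assumes a threshold is the fraction of the center bin that an annotation must overlap for the feature to be labeled present.

```python
from numbers import Real


def make_threshold_lookup(feature_thresholds, default=0.5):
    """Return a function mapping a feature name to its threshold.

    Hypothetical helper: accepts one number applied to every feature, or a
    dict of per-feature thresholds (missing features fall back to `default`).
    """
    if isinstance(feature_thresholds, bool):
        raise TypeError("feature_thresholds must be a number or a dict")
    if isinstance(feature_thresholds, Real):
        thresholds = {}
        default = float(feature_thresholds)
    elif isinstance(feature_thresholds, dict):
        thresholds = {name: float(v) for name, v in feature_thresholds.items()}
    else:
        raise TypeError("feature_thresholds must be a number or a dict")
    for value in list(thresholds.values()) + [default]:
        if not 0.0 <= value <= 1.0:
            raise ValueError("thresholds must lie in [0, 1], got {0}".format(value))
    return lambda feature: thresholds.get(feature, default)


def feature_present(overlap_bases, bin_length, threshold):
    """Assumed semantics: present if the overlap covers >= threshold of the bin."""
    return overlap_bases >= threshold * bin_length
```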

kathyxchen · 2018-04-19T14:27:34Z

models/non_strand_specific_module.py

+    def __init__(self, model, mode="mean"):
+        super(NonStrandSpecific, self).__init__()
+
+        print(mode)


Remove print statements.
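A pure-Python sketch of the wrapper's logic is below. It replaces the print call with logging and shows the mean/max combination. The real module is a PyTorch module operating on tensors. Here `model` is assumed to be any callable that returns a list of per-feature predictions, and `reverse` is a hypothetical callable that produces the reverse-strand encoding. The class name is also hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


class NonStrandSpecificSketch:
    """Hypothetical stand-in for the NonStrandSpecific wrapper (plain lists)."""

    VALID_MODES = ("mean", "max")

    def __init__(self, model, reverse, mode="mean"):
        if mode not in self.VALID_MODES:
            raise ValueError("mode must be 'mean' or 'max', got {0!r}".format(mode))
        self.model = model
        self.reverse = reverse
        self.mode = mode
        # Report the configuration through logging instead of print().
        logger.debug("NonStrandSpecific initialized with mode=%s", mode)

    def forward(self, encoded):
        forward_preds = self.model(encoded)
        reverse_preds = self.model(self.reverse(encoded))
        # Combine the two strands feature by feature.
        if self.mode == "mean":
            return [(f + r) / 2 for f, r in zip(forward_preds, reverse_preds)]
        return [max(f, r) for f, r in zip(forward_preds, reverse_preds)]
```

Routing the message through `logging` lets the model controller's logger configuration decide whether it is shown.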

kathyxchen · 2018-04-19T14:28:21Z

models/non_strand_specific_module.py

+
+    def forward(self, input):
+
+        reverse_input = flip(


This assumes the sequence is encoded so that simply "flipping" the indices of the matrix yields the reverse-strand sequence. I'll need to document that encoding.
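To illustrate the encoding assumption, consider a one-hot encoding whose columns are ordered A, C, G, T (an assumed order). Complementary bases then sit at mirrored column indices. Reversing both the position axis and the base axis therefore produces the reverse complement. The helper names below are hypothetical and operate on nested lists rather than tensors.

```python
BASES = "ACGT"  # assumed column order: A<->T and C<->G are mirror images


def one_hot(seq):
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def flip_encoding(encoding):
    # Reverse the rows (sequence positions) and each row's columns (bases).
    return [row[::-1] for row in encoding[::-1]]


def decode(encoding):
    return "".join(BASES[row.index(1)] for row in encoding)


def reverse_complement(seq):
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[b] for b in reversed(seq))
```

For example, flipping the encoding of "AACGTG" decodes to "CACGTT", its reverse complement. An ambiguous base encoded as a uniform row (e.g. 0.25 in every column) is unchanged by the column flip. The trick breaks if the column order is not symmetric under complementation (e.g. A, C, T, G).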

kathyxchen · 2018-04-19T14:28:46Z

selene.py

-                            [default: False]
-    --verbosity=<level>     Logging verbosity level (0=WARN, 1=INFO, 2=DEBUG)
-                            [default: 1]
+    <config-yml>            Model-specific parameters


Improve the documentation here.
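selene.py declares its usage in a docopt-style docstring. As one illustration of more descriptive help text, the same `<config-yml>` positional argument is declared below with the standard library's argparse instead. This is a swapped-in technique, not what selene.py uses. The description and help wording are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="selene.py",
        description="Run Selene with settings read from a single YAML file.")
    parser.add_argument(
        "config_yml",
        metavar="<config-yml>",
        help="Path to the YAML configuration file (the merged "
             "parameters/paths file) with model-specific parameters.")
    return parser
```

Running `python selene.py -h` with this parser would print the help text next to `<config-yml>`.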

kathyxchen · 2018-04-19T14:29:36Z

selene/model_predict.py

@@ -1,36 +1,109 @@
-"""TODO: nothing in this file works right now. Please do not use/review


This file will be updated soon. It was originally implemented to combine the forward and reverse strand predictions, but we are now incorporating forward/reverse strand predictions directly into the model architecture.

kathyxchen added 8 commits April 15, 2018 12:58

update model controller and public-facing functions in sequences

b4020a8

starting on functions in model predict

225d889

fix merge conflict from master. this commit also contains code to remove the odd seq length/bin length requirement in online sampler

10f95bf

adjust the style for config examples

5a68187

updated some of the static public facing functions to belong to sequence.py

f6f3fca

update the base_to_index map in genome based on how sequence encoding is implemented

b36c764

merged paths and parameters.yml into a single config file, added a non strand specific module wrapper to use when sequence-level models can group predictions from the forward & reverse seqs

894eb82

update the model controller to handle logger initialization

773e87b

kathyxchen mentioned this pull request Apr 19, 2018

Check for a strand column in the tabix-indexed file #26

Open

kathyxchen commented Apr 19, 2018

evancofer mentioned this pull request Apr 20, 2018

Heatmap visualization of in silico mutagenesis and variant effect prediction for a single genomic feature. #27

Closed

kathyxchen added 9 commits April 21, 2018 16:27

add a save_datasets param in samplers

486e470

updating selene.py based on the new parameter input file

43eb02f

update the parameters file with non strand specific module params

18325f9

update the genomic features file with the feature thresholds function abstracted out

aff1a28

remove the yaml file reading from utils

a45610d

add get_reverse_encoding function for now

2be8c01

remove the predict files for now

fc1da8d

update model train with the performance monitor

946936d

add a predict module and bug fixes to performance metrics, non strand specific module

d738481

kathyxchen merged commit 671269e into master Apr 23, 2018

kathyxchen deleted the train-and-eval branch May 4, 2018 18:40
