Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate an evaluate function into model controller & a number of other refactoring steps towards a training and evaluation pipeline. #25

Merged
merged 17 commits into from
Apr 23, 2018

Conversation

kathyxchen
Copy link
Collaborator

@kathyxchen kathyxchen commented Apr 19, 2018

This pull request is still in progress. It will address multiple issues, which I will continue to update in this post.

After this pull request, I will avoid addressing multiple changes in a pull request as much as possible. I'll also go back to the pull request model that I started initially (submitting pull requests from a fork of the repository, rather than a branch in the FunctionLab repo).

Changes made:

  • Issue Update intervals sampler to be able to accept even numbered bin/window inputs. #22: intervals sampler now accepts inputs that are either both even or both odd (both being the sequence length and the center bin to predict genomic features).
  • Merged parameters.yml and paths.yml into a single file. I also removed the processing code for IntervalsSampler in selene.py to ensure that we are truly making the Sampler modules dynamically configurable just from the YAML file.
  • Added a PyTorch module that allows users to train sequence-level (usually DNA, like DeepSEA) models that are not strand specific. This module wraps the model that a user specifies and will take the mean or the max of the forward and reverse strand predictions made by the model inforward.
  • Issue model_train.py performance monitoring should be more flexible/modular #3
  • Track the genomic coordinates for the intervals sampler test set.
  • Issue speed up in silico mutagenesis #23: prelim implementation of in silico mutagenesis in this PR. I still need to document it, do benchmarking, and consider what happens with double/triple mutagenesis. (In first release may just note that we only support single mutagenesis).

Copy link
Collaborator Author

@kathyxchen kathyxchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To-do's for myself.

@@ -1,25 +1,37 @@
---
model: {
non_strand_specific_module: True,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to include a mode = "mean" or "max" parameter

test_holdout: [8, 9],
validation_holdout: [6, 7],
random_seed: 127,
sequence_length: 1001,
center_bin_to_predict: 201,
default_threshold: 0.5,
feature_thresholds: 0.5,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check that the proper handling for this input is in IntervalsSampler (or OnlineSampler).

def __init__(self, model, mode="mean"):
super(NonStrandSpecific, self).__init__()

print(mode)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove print statements


def forward(self, input):

reverse_input = flip(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on the assumption that the sequence is encoded in such a way that we can just "flip" the indices in the matrix and get the reverse sequence. I'll need to document that encoding

[default: False]
--verbosity=<level> Logging verbosity level (0=WARN, 1=INFO, 2=DEBUG)
[default: 1]
<config-yml> Model-specific parameters
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve the documentation here.

@@ -1,36 +1,109 @@
"""TODO: nothing in this file works right now. Please do not use/review
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file will be updated soon. It was originally implemented to account for the forward & reverse predictions but we are now just incorporating forward/reverse strand predictions directly into the model architecture.

@kathyxchen kathyxchen merged commit 671269e into master Apr 23, 2018
@kathyxchen kathyxchen deleted the train-and-eval branch May 4, 2018 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant