This is something I'd like to consider implementing, in the hope that it could be used in one of the examples for the paper. It might not be necessary, but I do want to discuss it. @evancofer what do you think we'd need in order to train a model on RNA data? For now, only with respect to a potential RNA Sequence class. (When you have coordinates data, do you also get the full sequence from a FASTA file?)
The answer differs slightly depending on whether we want mRNA, pre-mRNA, and so on. However, as long as we use transcript or gene coordinates, things are simple. For mRNA, the simplest solution is a separate FASTA file whose records are transcripts rather than chromosomes (a transcript name where "chrom" would normally go), with coordinates given relative to the transcript. This doesn't require altering the genome type significantly. For pre-mRNA, we would use a FASTA file of genes that includes the intronic regions.
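To make the transcript-level FASTA idea concrete, here is a minimal sketch in plain Python. It does not use or reflect the library's actual genome class; `read_fasta`, `fetch`, and the toy records `tx1` and `tx2` are hypothetical names invented for illustration, and coordinates are assumed to be 0-based and half-open.

```python
import io


def read_fasta(handle):
    """Parse FASTA records into a dict of {record name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def fetch(records, name, start, end):
    """Return the 0-based, half-open slice [start, end) of a record."""
    seq = records[name]
    if not 0 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the record")
    return seq[start:end]


# Hypothetical transcript-level FASTA: each record is a transcript,
# so the record name plays the role that "chrom" plays for a genome.
TRANSCRIPT_FASTA = """>tx1 toy mRNA
ACGTACGTAC
GGTTAACC
>tx2 toy mRNA
TTTTGGGGCCCC
"""

transcripts = read_fasta(io.StringIO(TRANSCRIPT_FASTA))
subsequence = fetch(transcripts, "tx1", 8, 12)
```

Because the lookup only needs a record name and an interval, the same code path would serve chromosomes, transcripts, or genes with introns included; only the FASTA file changes.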
The real difficulty occurs when we want to use genomic coordinates and not just gene coordinates. In this case, we have to keep the gene definitions as well as the genome in memory. We then transform the genomic coordinates into gene coordinates on the fly. This seems like it would require a fast coordinate or interval map, so that we can randomly access a coordinate or region and pull out the gene definition required.
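One possible shape for such a coordinate map is sketched below: a sorted list of gene starts per chromosome, searched with binary search. This is an illustration under stated assumptions, not a proposed implementation. `GeneIndex` and the gene records are hypothetical, coordinates are 0-based and half-open, and genes on the same chromosome are assumed not to overlap (overlapping genes would need an interval tree or similar structure instead).

```python
import bisect
from collections import defaultdict


class GeneIndex:
    """Map genomic positions to (gene_id, offset within the gene).

    Assumes genes on one chromosome do not overlap.
    """

    def __init__(self, genes):
        # genes: iterable of (chrom, start, end, strand, gene_id),
        # with 0-based, half-open [start, end) coordinates.
        by_chrom = defaultdict(list)
        for chrom, start, end, strand, gene_id in genes:
            by_chrom[chrom].append((start, end, strand, gene_id))
        self._genes = {}
        self._starts = {}
        for chrom, items in by_chrom.items():
            items.sort()
            self._genes[chrom] = items
            self._starts[chrom] = [item[0] for item in items]

    def to_gene_coords(self, chrom, pos):
        """Return (gene_id, offset), or None if pos is not in a gene."""
        starts = self._starts.get(chrom)
        if not starts:
            return None
        # Find the last gene whose start is <= pos.
        i = bisect.bisect_right(starts, pos) - 1
        if i < 0:
            return None
        start, end, strand, gene_id = self._genes[chrom][i]
        if pos >= end:
            return None
        # On the minus strand, offsets count back from the gene's end.
        offset = pos - start if strand == "+" else end - 1 - pos
        return gene_id, offset


# Hypothetical gene definitions for illustration only.
index = GeneIndex([
    ("chr1", 100, 200, "+", "geneA"),
    ("chr1", 300, 400, "-", "geneB"),
])
hit = index.to_gene_coords("chr1", 150)
miss = index.to_gene_coords("chr1", 250)
```

Each lookup costs O(log n) in the number of genes on the chromosome, which should be fast enough for random access during sampling. The gene definitions stay in memory alongside the genome, as described above.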