Forensic analysis tool useful in backwards computing information from next-generation sequencing data and annotating splice junctions.
Notice: ngsderive is largely a forensic analysis tool useful for back-computing information from next-generation sequencing data. Notably, most results are provided as a 'best guess': the tool does not claim 100% accuracy, and results should be interpreted with that understanding. An exception is the junction-annotation tool, which analyzes more concrete evidence than the other tools.
The following attributes can be guessed using ngsderive:
- Illumina Instrument. Infer which Illumina instrument was used to generate the data by matching against known instrument and flowcell naming patterns. Each guess comes with a confidence score.
- RNA-Seq Strandedness. Infer from the data whether RNA-Seq data was generated using a Stranded-Forward, Stranded-Reverse, or Unstranded protocol.
- Pre-trimmed Read Length. Compute the distribution of read lengths in the file and attempt to guess what the original read length of the experiment was.
- PHRED Score Encoding. Infer which encoding scheme was used to store PHRED scores as ASCII characters.
- Junction Annotation. Annotate splice junctions as novel, partial novel, or known in comparison to a reference gene model.
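To make the 'best guess' idea concrete, here is a minimal Python sketch of how a PHRED encoding could be inferred from the lowest ASCII character seen in a set of quality strings. It is not ngsderive's implementation: the function name `guess_phred_encoding`, the returned labels, and the cutoffs (33, 59, 64, taken from the commonly cited Sanger, Solexa, and Illumina 1.3+ ranges) are assumptions made for illustration.

```python
from typing import Iterable


def guess_phred_encoding(quality_strings: Iterable[str]) -> str:
    """Guess the PHRED encoding from the lowest quality character observed.

    Hypothetical helper for illustration only; not part of ngsderive.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        raise ValueError("no quality characters supplied")

    lowest = min(codes)
    if lowest < 33:
        raise ValueError("character below '!' is not valid in any encoding")
    if lowest < 59:
        # Only a +33 offset can produce characters this low.
        return "Phred+33"
    if lowest < 64:
        # Solexa+64 allows scores down to -5, i.e. ASCII 59.
        return "Solexa+64"
    # Nothing below '@' was seen; Phred+64 is the narrowest range that fits.
    return "Phred+64"


if __name__ == "__main__":
    print(guess_phred_encoding(["FF:F,#FF"]))  # lowest char is '#' (35) -> Phred+33
```

A guess like this is only as good as the reads sampled: a Phred+33 file whose sampled reads contain nothing but high scores would be labeled Phred+64 by this sketch, which is why such results should be read as a best guess rather than a certainty.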
You can install ngsderive using the Python Package Index (PyPI).
pip install ngsderive
If you are interested in contributing to the code, please first review our CONTRIBUTING.md document.
To bootstrap a development environment, please use the following commands.
# Clone the repository
git clone git@github.com:stjudecloud/ngsderive.git
cd ngsderive
# Install the project using poetry
poetry install
ngsderive provides a (currently patchy) set of tests, both unit and end-to-end. Run them with:
py.test
Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
This project is licensed as follows:
- All code related to the instrument subcommand is licensed under the AGPL v2.0. This is not due to any strict requirement, but out of deference to some code I drew inspiration from (and copied patterns from), the decision was made to license this code consistently.
- The rest of the project is licensed under the MIT License. See the LICENSE.md file for details.
Copyright © 2020 St. Jude Cloud Team.