MuhammedHasan/gencode_utr_fix

Gencode UTR fix

The Gencode GTF does not distinguish between 5' and 3' UTRs. It annotates all of them with the single feature type UTR, unlike the Ensembl GTF, which uses five_prime_utr and three_prime_utr. This makes Gencode annotation difficult to use when studying UTR type-specific processes such as alternative polyadenylation.

This package fixes the UTR features in the third column of the Gencode GTF by converting each UTR annotation into five_prime_utr or three_prime_utr, as in Ensembl. It compares the location of each UTR with the CDS of the same transcript: a UTR located before the CDS (upstream on the transcript's strand) is annotated as five_prime_utr, and a UTR located after the CDS is annotated as three_prime_utr.
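The classification rule above can be illustrated with a short plain-Python sketch. It is not the package's implementation (which is built on PyRanges). The `Feature` record and `classify_utrs` function are hypothetical names, and the sketch assumes every CDS and UTR row carries its transcript ID and strand. On the minus strand the 5' end of a transcript has the larger genomic coordinate, so "before the CDS" has to be read relative to strand.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


# Hypothetical record type for illustration; not part of gencode_utr_fix.
@dataclass
class Feature:
    transcript_id: str
    feature: str  # GTF column 3, e.g. "UTR" or "CDS"
    start: int    # 1-based, inclusive (GTF convention)
    end: int
    strand: str   # "+" or "-"


def classify_utrs(features: List[Feature]) -> List[Feature]:
    """Relabel generic 'UTR' rows as five_prime_utr / three_prime_utr (sketch)."""
    # Collect the overall CDS span (min start, max end) per transcript,
    # so UTRs split across several exons are compared against the whole CDS.
    cds_span: Dict[str, Tuple[int, int]] = {}
    for f in features:
        if f.feature == "CDS":
            lo, hi = cds_span.get(f.transcript_id, (f.start, f.end))
            cds_span[f.transcript_id] = (min(lo, f.start), max(hi, f.end))

    out: List[Feature] = []
    for f in features:
        # Leave non-UTR rows, and UTRs of transcripts without a CDS, unchanged.
        if f.feature != "UTR" or f.transcript_id not in cds_span:
            out.append(f)
            continue
        cds_start, _ = cds_span[f.transcript_id]
        # Is this UTR at smaller genomic coordinates than the CDS?
        left_of_cds = f.end < cds_start
        if f.strand == "+":
            label = "five_prime_utr" if left_of_cds else "three_prime_utr"
        else:
            # Minus strand: the transcript's 5' end is on the right.
            label = "three_prime_utr" if left_of_cds else "five_prime_utr"
        out.append(replace(f, feature=label))
    return out


rows = [
    Feature("T1", "UTR", 100, 149, "+"),
    Feature("T1", "CDS", 150, 300, "+"),
    Feature("T1", "UTR", 304, 400, "+"),
    Feature("T2", "UTR", 100, 149, "-"),
    Feature("T2", "CDS", 150, 300, "-"),
    Feature("T2", "UTR", 304, 400, "-"),
]
print([f.feature for f in classify_utrs(rows)])
# ['five_prime_utr', 'CDS', 'three_prime_utr', 'three_prime_utr', 'CDS', 'five_prime_utr']
```

Comparing against the per-transcript CDS span, rather than individual CDS exons, keeps the rule simple: any UTR segment lies entirely on one side of the coding region, so a single coordinate comparison plus the strand decides its type.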

Setup

pip install cython
pip install -e git+https://github.com/MuhammedHasan/gencode_utr_fix.git#egg=gencode_utr_fix

Run

gencode_utr_fix --input_gtf gencode.v29.annotation.gtf --output_gtf gencode.v29.annotation_utr.gtf
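To sanity-check the output, one can tally the feature types in column 3 of the input and output GTFs. Below is a minimal sketch using a hypothetical helper `count_feature_types` (not part of this package). It assumes uncompressed, tab-separated GTF lines and skips `#` header lines.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GTF column-3 feature types (hypothetical helper, for illustration)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and '#'/'##' header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GTF record has nine tab-separated columns.
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts
```

For example, `count_feature_types(open("gencode.v29.annotation_utr.gtf"))` should, if the conversion worked as described above, report five_prime_utr and three_prime_utr counts in place of the input's UTR rows.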

Test

pytest tests/

Cite

This package is based on PyRanges and was designed for LAPA, so please cite PyRanges and LAPA if you use this package for research:

PyRanges: http://dx.doi.org/10.1093/bioinformatics/btz615

LAPA: https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1
