This app takes a comma-delimited file as an argument and a column number as an option, and prints the file with the Ensembl gene names in that column converted to HUGO gene names.
- Clone this application onto your server with
  git clone https://github.com/zmyyy3/trgn510_assignment4
- "Homo_sapiens.GRCh37.75.gtf" is the reference file used to build the dictionary. Download it with
  wget http://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
  and unzip it with
  gunzip Homo_sapiens.GRCh37.75.gtf.gz
- In the repository, "expres.anal.hugo.example.csv" is the expected result of a unit test. To run the unit test, download "expres.anal.csv" with
  wget https://github.com/davcraig75/unit/expres.anal.csv
  and run this application with
  python ensg2hugo.py -f2 expres.anal.csv > expres.anal.hugo.csv
  This produces a file named "expres.anal.hugo.csv", which you can compare with "expres.anal.hugo.example.csv" to check that the two are the same.
- In the command
  python ensg2hugo.py -f2 expres.anal.csv > expres.anal.hugo.csv
  "-f[0-9]" is the option for the column number. The example uses "-f2" because the Ensembl gene names are in the second column of "expres.anal.csv". If no "-f" option is given, the first column is used. Change the number after "-f" to match the column that holds the Ensembl gene names in your own file.
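The behaviour of the "-f[0-9]" option can be illustrated with a minimal sketch. This is not the code from ensg2hugo.py; the function name `parse_column_option` is hypothetical, and the sketch only assumes what is described above: a single digit after "-f" selects the column, and the first column is the default.

```python
import re


def parse_column_option(args):
    """Split command-line arguments into (column_number, other_arguments).

    Hypothetical helper, not taken from ensg2hugo.py.
    """
    column = 1  # default: first column when no -f option is given
    others = []
    for arg in args:
        match = re.fullmatch(r"-f([0-9])", arg)
        if match:
            column = int(match.group(1))  # 1-based column number
        else:
            others.append(arg)  # e.g. the input file name
    return column, others


print(parse_column_option(["-f2", "expres.anal.csv"]))  # (2, ['expres.anal.csv'])
print(parse_column_option(["expres.anal.csv"]))         # (1, ['expres.anal.csv'])
```

In a script, the list passed in would typically be `sys.argv[1:]`.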
The dictionary in this application is not comprehensive, so some Ensembl gene names in your file may have no matching HUGO gene name. For those rows, "Unknown" is shown in the "gene_id" column.
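The sketch below shows one way such a dictionary and the "Unknown" fallback could work. It is an illustration, not the implementation in ensg2hugo.py. It assumes that each GTF record carries `gene_id "..."` and `gene_name "..."` attributes, as Ensembl GTF files do. The function names `build_dictionary` and `to_hugo` are hypothetical, and the sample GTF line is written for illustration rather than copied from the reference file.

```python
import re

# Attribute patterns as they appear in the last column of an Ensembl GTF record.
GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def build_dictionary(gtf_lines):
    """Map Ensembl gene IDs to gene names from an iterable of GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        gene_id = GENE_ID.search(line)
        gene_name = GENE_NAME.search(line)
        if gene_id and gene_name:
            mapping[gene_id.group(1)] = gene_name.group(1)
    return mapping


def to_hugo(ensembl_id, mapping):
    """Return the mapped gene name, or "Unknown" when the ID is not in the dictionary."""
    return mapping.get(ensembl_id, "Unknown")


# Illustrative sample record (assumption: shaped like a GTF line, not a verbatim copy).
sample = [
    "#!genome-build GRCh37",
    '1\tpseudogene\tgene\t11869\t14412\t.\t+\t.\t'
    'gene_id "ENSG00000223972"; gene_name "DDX11L1";',
]
mapping = build_dictionary(sample)
print(to_hugo("ENSG00000223972", mapping))  # DDX11L1
print(to_hugo("ENSG_NOT_PRESENT", mapping))  # Unknown
```

With the real reference file, `build_dictionary` could be given the open file object for "Homo_sapiens.GRCh37.75.gtf" instead of the sample list.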
This application uses the following tools and Python modules:
- git
- wget
- sys
- re
- pandas
- csv