-
Notifications
You must be signed in to change notification settings - Fork 56
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
8 changed files
with
3,498 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file not shown.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
function gff_to_gene_name(infos, name_field) | ||
if infos == nil or #infos == 0 then | ||
return nil | ||
end | ||
local result = {} | ||
|
||
for i=1,#infos do | ||
info = infos[i] | ||
local s, e = string.find(info, name_field .. "=") | ||
if s ~= nil then | ||
name = info:sub(e + 1) | ||
s, e = string.find(name, ";") | ||
if e == nil then | ||
e = name:len() | ||
else | ||
e = e - 1 | ||
end | ||
|
||
result[name:sub(0, e)] = 1 | ||
end | ||
end | ||
local keys = {} | ||
for k, v in pairs(result) do | ||
keys[#keys+1] = k | ||
end | ||
return table.concat(keys,",") | ||
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
[[annotation]] | ||
file="sample.gff3.gz" | ||
columns=[9] | ||
names=["xgene"] | ||
ops=["lua:gff_to_gene_name(vals, 'gene_name')"] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
Annotating with GFF | ||
=================== | ||
|
||
GFF often contains the location of genes and features. | ||
It has a column that contains key=value pairs like the INFO | ||
field in a VCF. `vcfanno` does not support these natively, | ||
but one can access particular fields using lua. This is an example of | ||
how to do this. | ||
|
||
Given a variant in `sample.vcf` like: | ||
``` | ||
22 1000 DEL000SUR N <DEL> . PASS END=1051060 | ||
``` | ||
|
||
And genomic features annotated in `genes.gff3.gz`: | ||
|
||
``` | ||
22 ensembl_havana gene 69091 70008 . + . ID=ENST00000335137;geneID=ENSG00000186092;gene_name=OR4F5 | ||
22 ensembl mRNA 182393 184158 . + . ID=ENST00000624431;geneID=ENSG00000279928;gene_name=FO538757.3 | ||
22 ensembl mRNA 184923 195411 . - . ID=ENST00000623834;geneID=ENSG00000279457;gene_name=FO538757.2 | ||
22 ensembl mRNA 184925 195411 . - . ID=ENST00000623083;geneID=ENSG00000279457;gene_name=FO538757.2 | ||
22 ensembl mRNA 184927 200322 . - . ID=ENST00000624735;geneID=ENSG00000279457;gene_name=FO538757.2 | ||
22 ensembl_havana mRNA 450740 451678 . - . ID=ENST00000426406;geneID=ENSG00000278566;gene_name=OR4F29 | ||
22 ensembl_havana mRNA 685716 686654 . - . ID=ENST00000332831;geneID=ENSG00000273547;gene_name=OR4F16 | ||
22 havana mRNA 924880 939291 . + . ID=ENST00000420190;geneID=ENSG00000187634;gene_name=SAMD11 | ||
22 havana mRNA 925150 935793 . + . ID=ENST00000437963;geneID=ENSG00000187634;gene_name=SAMD11 | ||
... | ||
``` | ||
|
||
We can write a conf file [gff_sv.conf](https://github.com/brentp/vcfanno/blob/master/docs/examples/gff_sv/gff_sv.conf) | ||
``` | ||
[[annotation]] | ||
file="genes.gff3.gz" | ||
columns=[9] | ||
names=["gene"] | ||
ops=["lua:gff_to_gene_name(vals, 'gene_name')"] | ||
``` | ||
|
||
This will extract the full text of the 9th column. Then we can use some lua to get just the `gene_name` field. | ||
The lua uses a table so each observed gene is reported only once. | ||
This is in [gff_sv.lua](https://github.com/brentp/vcfanno/blob/master/docs/examples/gff_sv/gff_sv.lua) | ||
|
||
|
||
Then we can run `vcfanno` as: | ||
``` | ||
vcfanno -lua gff_sv.lua gff_sv.conf sample.vcf | ||
``` | ||
|
||
and `gene=FO538757.3,OR4F29,FO538757.2,OR4F16,OR4F5` is added to the INFO. The lua allows the user to do more complex things like only reporting genes, or taking uniq values. |
Oops, something went wrong.