Dear authors,
Thank you for creating this tool. I tested it as follows: I created a simulated metagenomic dataset consisting of 13 different Salmonella strains at an abundance of 0.001 each and one E. coli strain at 0.987. All strain reference genomes were downloaded from NCBI. When I profile this dataset with Kraken2 and Bracken, a few false-positive species appear and some abundances deviate, but overall the output is pretty good.
For Snipe, I added the accession numbers and strain designations to dict target and dict template for the REC module where they were not already present. I used the default Clostridium genome in filter.fna, and I added the paths to the reference genomes as targets for the MAP module.
I get the following output. I added the columns Genome and truth for the genome used and the known input abundance that I simulated. The Initial Best Hit column is copied from the output of the ID module and is pretty close to the truth. However, after running REC, I do not see an improvement in the output; perhaps I am misunderstanding the output or how to run Snipe. Could you provide guidance on how to use Snipe to reduce false-positive hits? Should I change how I run Snipe, for example by adding more target or filter genomes? Here I used as targets only the genomes I know to be in my simulated sample.
Genome | truth | Initial Best Hit | Rectified Final Guess | Final Guess | Rectified Probability | SSR Aligned Reads | Rectified Abundance | Initial Abundance | Final Best Hit | Final Best Hit Read Numbers |
---|---|---|---|---|---|---|---|---|---|---|
E.coli strain A | 0.987 | 0.98406 | 0.00000 | 0.98717 | 0.00000 | 0.00000 | 0.00000 | 0.98718 | 0.98718 | 6581197.00000 |
Salmonella strain 1 | 0.001 | 0.00133 | 0.00000 | 0.00671 | 0.00000 | 0.00000 | 0.00000 | 0.00723 | 0.00723 | 48178.00000 |
Salmonella strain 2 | 0.001 | 0.00125 | 0.00000 | 0.00166 | 0.00000 | 0.00000 | 0.00000 | 0.00175 | 0.00175 | 11669.00000 |
Salmonella strain 3 | 0.001 | 0.00124 | 0.00004 | 0.00004 | 1.00000 | 6529.00000 | 0.00004 | 0.00004 | 0.00004 | 261.00000 |
Salmonella strain 4 | 0.001 | 0.00123 | 0.00000 | 0.00027 | 0.00000 | 0.00000 | 0.00000 | 0.00018 | 0.00018 | 1201.00000 |
Salmonella strain 5 | 0.001 | 0.00123 | 0.00000 | 0.00018 | 0.00000 | 0.00000 | 0.00000 | 0.00015 | 0.00015 | 999.00000 |
Salmonella strain 6 | 0.001 | 0.00120 | 0.00221 | 0.00221 | 1.00000 | 6529.00000 | 0.00180 | 0.00180 | 0.00180 | 12030.00000 |
Salmonella strain 7 | 0.001 | 0.00120 | 0.00001 | 0.00001 | 1.00000 | 6529.00000 | 0.00001 | 0.00001 | 0.00001 | 48.00000 |
Salmonella strain 8 | 0.001 | 0.00117 | 0.00000 | 0.00001 | 0.00000 | 0.00000 | 0.00000 | 0.00001 | 0.00001 | 82.00000 |
Salmonella strain 9 | 0.001 | 0.00117 | 0.00113 | 0.00113 | 1.00000 | 6529.00000 | 0.00115 | 0.00115 | 0.00115 | 7664.00000 |
Salmonella strain 10 | 0.001 | 0.00117 | 0.00042 | 0.00042 | 1.00000 | 6529.00000 | 0.00032 | 0.00032 | 0.00032 | 2104.00000 |
Salmonella strain 11 | 0.001 | 0.00116 | 0.00000 | 0.00000 | 1.00000 | 6529.00000 | 0.00000 | 0.00000 | 0.00000 | 24.00000 |
Salmonella strain 12 | 0.001 | 0.00115 | 0.00000 | 0.00002 | 0.00000 | 0.00000 | 0.00000 | 0.00002 | 0.00002 | 139.00000 |
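To put a number on "close to the truth", the deviation of each column from the simulated abundances can be summed. The snippet below is a minimal sketch using only the values from the table above; the metric (sum of absolute differences) and all function and variable names are my own assumptions for illustration and are not part of Snipe.

```python
# Values copied from the table above, in row order:
# E. coli strain A first, then Salmonella strains 1-12.
truth = [0.987] + [0.001] * 12

initial_best_hit = [
    0.98406, 0.00133, 0.00125, 0.00124, 0.00123, 0.00123, 0.00120,
    0.00120, 0.00117, 0.00117, 0.00117, 0.00116, 0.00115,
]

final_best_hit = [
    0.98718, 0.00723, 0.00175, 0.00004, 0.00018, 0.00015, 0.00180,
    0.00001, 0.00001, 0.00115, 0.00032, 0.00000, 0.00002,
]


def total_abs_error(estimate, reference):
    """Sum of absolute differences between estimated and reference abundances."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    return sum(abs(e - r) for e, r in zip(estimate, reference))


initial_error = total_abs_error(initial_best_hit, truth)
final_error = total_abs_error(final_best_hit, truth)

print(f"Initial Best Hit, total absolute error: {initial_error:.5f}")
print(f"Final Best Hit,   total absolute error: {final_error:.5f}")
```

By this measure, the Final Best Hit column is further from the simulated abundances than the Initial Best Hit column for the rows shown, which is the lack of improvement described above.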