Implementation of additional processing steps in postoga following current TOGA dir structure #141
Replies: 9 comments
-
No problem. I have tarred and gzipped a full TOGA run, including the temp dir and everything else, and it will be on https://genome.senckenberg.de/download/forAlejandroTOGAdir/ in a minute. Thx
-
I believe this is worth documenting too: https://github.com/hillerlab/TOGA/wiki/Output-directory-structure
-
Just finished all the new changes to postoga. I wanted to quickly note that the bug in #136 about
Is now corrected. It did not have anything to do with postoga now follows the most recent dir structure pointed out by Bogdan. Also, new features have been added, including: 1) ortholog length distributions, 2) orthology score distributions, and 3) pseudo-BUSCO completeness analysis.
I have also tried to fix the nomenclature problem (some users prefer Entrez IDs or gene names instead of Ensembl IDs). A --source flag has now been introduced that lets users choose an ensembl, entrez or gene_names background. This option maps to your Ancestral placental DB (now with gene_names and entrez IDs) and also to the BUSCO DBs (also with gene_names and entrez IDs). I think this wraps up most of the changes. However, it is likely that some errors/bugs have slipped through my tests. Any feedback will be very much appreciated! Best,
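For anyone reading along, here is a minimal sketch of how a --source option with those three backgrounds could be declared using Python's argparse. This is an illustration only, not postoga's actual code: the program name, the default value and the help text are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the --source flag and its three
    # values (ensembl, entrez, gene_names) come from the post above.
    parser = argparse.ArgumentParser(prog="postoga-sketch")
    parser.add_argument(
        "--source",
        choices=["ensembl", "entrez", "gene_names"],
        default="ensembl",  # assumed default, not confirmed by the post
        help="identifier namespace used for the background databases",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example runs anywhere.
    args = build_parser().parse_args(["--source", "entrez"])
    print(args.source)
```

Restricting the flag with `choices` makes argparse reject any other value before the analysis starts.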
-
Fantastic. We will give it a try.
-
One question. How do you determine which BUSCO ortholog corresponds to a TOGA annotated gene?
-
Thanks for this question. There are some caveats here, which is why I called it "pseudo-BUSCO":
Hope this clarifies your question. Any feedback would be great to hear! Alejandro
-
It seems more appropriate to move this issue to the discussions section and give it a more suitable name. :)
-
Got it. Alejandro intersects BUSCO gene IDs (Ensembl) with the Ensembl IDs that TOGA annotated. Makes sense. Thx for explaining it. I agree, this should be documented and quickly explained, likely on the wiki.
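To make the intersection idea concrete, here is a rough sketch of a pseudo-BUSCO completeness check as a plain set intersection. It is not postoga's implementation: the function name, the return format and the example IDs are made up for illustration, and it ignores the caveats mentioned earlier in the thread.

```python
def pseudo_busco_completeness(busco_ids, toga_ids):
    """Intersect BUSCO gene IDs with the gene IDs annotated by TOGA.

    Both arguments are iterables of identifiers from the same namespace
    (for example Ensembl gene IDs). Hypothetical helper, for illustration.
    """
    busco = set(busco_ids)
    found = busco & set(toga_ids)   # BUSCO genes that TOGA annotated
    missing = busco - found         # BUSCO genes absent from the annotation
    percent = 100.0 * len(found) / len(busco) if busco else 0.0
    return {
        "found": sorted(found),
        "missing": sorted(missing),
        "percent_found": percent,
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real Ensembl IDs.
    busco = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    toga = ["GENE_B", "GENE_D", "GENE_X"]
    print(pseudo_busco_completeness(busco, toga))
```

IDs that TOGA annotated but that are not in the BUSCO set (GENE_X here) do not affect the result.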
-
Sweet! With this update, postoga outputs both the ancestral placental analysis (a barplot with the % of ancestral classes and a scatterplot of inactivated vs. missing ancestral sequences, just like in your paper) and the BUSCO analysis; a quick overview of both gives you a solid initial interpretation of TOGA results. I can make some time to write that up and specify all features, recommendations and so on. Please let me know if you agree (should this also go in the README?). Also, I wanted to note that the last postoga release had some problems with dependencies (at least when I tried it in some VMs here). I solved that and built a conda env .yml to make setup easier. @MichaelHiller, I sent you an email yesterday, I think; please let me know if you read it.
-
Hi @MichaelHiller and @kirilenkobm,
I am working on the next steps of postoga. Some of the previously reported problems occurred because I was using the old output dir structure. Would it be possible for you to share a random output dir (human-to-some-species; preferably a mammal)? I do not have access to an HPC anymore, so I can't run TOGA at the moment.
Hope everything is ok,
Alejandro