Train Augustus and Train Snap need more flexibility in handling annotation data #6623

Open
wm75 opened this issue Dec 12, 2024 · 2 comments

@wm75
Contributor

wm75 commented Dec 12, 2024

Currently, both wrappers require maker and unconditionally run the maker2zff script with default settings over the annotation gff, with the aim of retaining only high-quality annotations for training.

This approach is suboptimal in several ways:

  1. The default behavior of maker2zff is to filter features based on their qi and aed attribute values, but only if the feature states maker in its source column. When the source is not maker, no filtering is performed.

    All of this happens behind the scenes. The user is never told, and so doesn't know, that maker and non-maker gffs are treated differently.

  2. The built-in filtering means the user cannot choose to apply less strict criteria than the defaults (unless they know about the secret workaround of disabling default filtering by removing maker from the gff source column).

  3. If default filtering eliminates all features, augustus and snap training fail with hard-to-diagnose errors.
    Augustus example: Error: training set file jwd05e/76659517/working/genome.gff3 has neither Genbank nor GFF nor FASTA format! From this message it is very hard to deduce that genome.gff3 is the filtered intermediate file and that it is simply empty.
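
To illustrate points 1 and 3, here is a minimal Python sketch of a pre-flight check that a wrapper could run on the (filtered) gff before training. It is not part of either wrapper or of maker2zff; the function names are hypothetical, and it assumes a standard tab-separated, nine-column gff in which an optional ##FASTA directive ends the feature section.

```python
from collections import Counter


def summarize_gff_sources(gff_path):
    """Count feature lines per value of the source column (column 2)."""
    sources = Counter()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more feature lines
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            columns = line.rstrip("\n").split("\t")
            if len(columns) == 9:
                sources[columns[1]] += 1
    return sources


def check_training_gff(gff_path):
    """Raise a readable error if the gff has no feature lines left.

    Hypothetical helper: returns the per-source counts so a caller can
    also report whether the file contains maker or non-maker features.
    """
    sources = summarize_gff_sources(gff_path)
    if not sources:
        raise ValueError(
            f"{gff_path} contains no features; "
            "filtering may have removed every annotation"
        )
    return sources
```

Reporting the per-source counts would also make it visible to the user whether the input contains maker features at all, which is what decides whether the default filtering applies.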

Suggestion:
Offer a separate wrapper for maker2zff with full control over its settings, and have the downstream tools only suggest filtering the input gff rather than doing it themselves.
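
As a rough sketch of what user-controlled filtering could look like, the Python snippet below drops features whose aed value exceeds a cutoff chosen by the caller, together with their direct children. It does not reproduce maker2zff's logic or options. The function names are hypothetical, and it assumes the aed value is stored in the attributes column under the key _AED; it also leaves out cleanup of genes that end up without any transcript.

```python
def parse_attributes(column9):
    """Turn 'key=value;key=value' into a dict."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def exceeds_cutoff(attributes, max_aed):
    """True if the feature carries an _AED value (assumed key) above max_aed."""
    return "_AED" in attributes and float(attributes["_AED"]) > max_aed


def filter_by_aed(lines, max_aed):
    """Return the gff lines that survive a user-chosen AED cutoff."""
    lines = list(lines)

    # First pass: collect the IDs of features that fail the cutoff.
    rejected = set()
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        attributes = parse_attributes(columns[8])
        if exceeds_cutoff(attributes, max_aed) and "ID" in attributes:
            rejected.add(attributes["ID"])

    # Second pass: drop failing features and their direct children.
    kept = []
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) == 9:
            attributes = parse_attributes(columns[8])
            parents = attributes.get("Parent", "").split(",")
            if exceeds_cutoff(attributes, max_aed) or any(p in rejected for p in parents):
                continue
        kept.append(line)  # comments and directives pass through unchanged
    return kept
```

With an explicit cutoff like this, a user could relax or tighten the criteria, and a cutoff of 1.0 would keep every feature that has an aed value.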

@wm75
Contributor Author

wm75 commented Dec 12, 2024

@abretaud @rlibouba what do you think?

@abretaud
Contributor

Yeah, I think these training tools need some love. I remember implementing them for use within a maker workflow, but something more agnostic would be much better.

I don't have much bandwidth at the moment to work on it; maybe Romane could help at some point. Of course, if anyone proposes something, I'd be happy to review!
