Enhancements in vfdb_parser.py for VFDB full dataset support #320
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently, when using the
getref vfbd_full (...)
command downloading the full VFDB dataset, it is not possible to proceed withariba preparef (...)
using the resulting reference data without manual changes to both the .fa and the .tsv files. This is, because the reference data set contains several pitfalls not adressed yet:ariba prepareref
.The modifications proposed here adress all shortcomings mentioned above.
Furthermore, the xls-derived metadata from VFDB explaining function and mechanism of a respective virulence gene (VFs.xls.gz, see VFs description file on VFDB download page) are included into the metadata.tsv derived from vfdb_parser to allow a more comprehensive view of the ariba variant calling results for working with VFDB.
Thank you for considering to merge for a future release.