Database maintenance workflow (curated genomes, reps, pangenome ranks, merged) #3782

@ccbaumler

Description

This is the pull request that @bluegenes and I have been working on, on and off, over the last year to make database generation much easier:
sourmash-bio/database-releases#6

This workflow creates a database from the current NCBI assembly_summary information. This includes removing redacted genome assemblies and updating the remaining genomes to their current versions in NCBI.
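To make the update step concrete, here is a minimal sketch of the comparison, not the code from the pull request. It assumes accessions look like `GCA_000001.1` (base plus version) and that an accession missing from the current assembly_summary should be treated as removed; the function names `split_accession` and `plan_update` are hypothetical.

```python
def split_accession(acc):
    """Split an accession such as 'GCA_000001.2' into ('GCA_000001', 2)."""
    base, _, version = acc.partition(".")
    return base, int(version)


def plan_update(existing, current):
    """Compare accessions already in a database against the current summary.

    Returns (keep, update, remove, add):
      keep   - accessions already at the latest listed version
      update - mapping of old accession -> newer accession
      remove - accessions whose base is absent from the current summary
               (assumption: absent means redacted/suppressed)
      add    - accessions in the current summary that the database lacks
    """
    # Latest version seen for each accession base in the current summary.
    latest = {}
    for acc in current:
        base, version = split_accession(acc)
        latest[base] = max(version, latest.get(base, 0))

    keep, update, remove = [], {}, []
    seen_bases = set()
    for acc in existing:
        base, version = split_accession(acc)
        seen_bases.add(base)
        if base not in latest:
            remove.append(acc)
        elif latest[base] > version:
            update[acc] = f"{base}.{latest[base]}"
        else:
            keep.append(acc)

    add = sorted(f"{b}.{v}" for b, v in latest.items() if b not in seen_bases)
    return keep, update, remove, add
```

Only the `update` and `add` entries would need to be downloaded and sketched again; everything in `keep` can be reused from the existing database.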

There is a workflow for updating the GTDB and GenBank domain databases, with a config file that holds the workflow variables.

Do you think it would be worth the disk space to have databases for:

  1. genomes
  2. reps
  3. species
  4. genus
  5. family
  6. everything merged

I've built all the ranktables for GTDB at the species level, and I think there should be an easy way of concatenating the species-level ranktables into any higher rank, so there is no need to duplicate them. That said, should I create a ranktable directory for every species database so that the ranktables are already available for easy comparisons?
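As a rough illustration of the concatenation idea, the sketch below groups species-level ranktable files by a higher rank using GTDB-style lineage strings (`d__...;p__...;...;s__...`). It treats each ranktable as an opaque file path and makes no assumption about its contents; the `group_by_rank` name and the example paths are hypothetical.

```python
from collections import defaultdict

# GTDB lineage prefixes for each rank.
RANK_PREFIXES = {
    "domain": "d__",
    "phylum": "p__",
    "class": "c__",
    "order": "o__",
    "family": "f__",
    "genus": "g__",
    "species": "s__",
}


def group_by_rank(species_tables, rank):
    """Group species-level ranktable paths by the taxon at a higher rank.

    species_tables maps a full GTDB lineage string to the path of that
    species' ranktable. Returns a mapping of taxon name -> list of paths.
    """
    prefix = RANK_PREFIXES[rank]
    groups = defaultdict(list)
    for lineage, path in species_tables.items():
        # Pick the lineage field that carries the requested rank prefix.
        taxon = next(
            field.strip()
            for field in lineage.split(";")
            if field.strip().startswith(prefix)
        )
        groups[taxon].append(path)
    return dict(groups)
```

With a grouping like this, a genus- or family-level ranktable could be assembled on demand from the species-level files in each group instead of being stored separately.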
