Parallelism in NEAT should eventually make this unnecessary, but as a stopgap that is easier to implement, we can offer a utility that breaks up large genomes and reassembles the results at the end. These could be add-ons that run beside NEAT for now, then be integrated later or superseded by multi-threading, if possible.
We'd need two scripts:
Splitting script: breaks the genome up by chromosome or into large chunks, each written to its own FASTA file. The user would supply the run configuration file and say how to break the genome up: by chromosome, or by size (and if by size, what size, with a default of 712kb or something). The program would then produce a folder containing a set of input files and a matching set of configuration files. Each file would get a unique name, be a valid FASTA file, and have a name that can be mapped back to the original chromosome name field. This may require a guidance document or index of some kind. Each FASTA file would also need to include some overlap with its neighbors, so that reads don't have hard boundaries.
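As a rough illustration of the by-size option, here is a minimal sketch of how chunk boundaries with overlap might be planned and encoded in the file names. Everything named here is an assumption made for illustration: the `__chunk_` separator, the `plan_chunks` and `parse_chunk_name` helpers, and the choice of 0-based half-open coordinates are not part of NEAT.

```python
from typing import Dict, Iterator, NamedTuple, Tuple

# Hypothetical separator between the original contig name and the chunk span.
SEP = "__chunk_"


class Chunk(NamedTuple):
    name: str   # chunk name that encodes the original contig and its span
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def plan_chunks(contig: str, length: int, size: int, overlap: int) -> Iterator[Chunk]:
    """Yield overlapping windows that cover a contig of the given length."""
    if size <= overlap:
        raise ValueError("chunk size must be larger than the overlap")
    step = size - overlap
    start = 0
    while True:
        end = min(start + size, length)
        yield Chunk(f"{contig}{SEP}{start}_{end}", start, end)
        if end >= length:
            break
        start += step


def parse_chunk_name(name: str) -> Tuple[str, int, int]:
    """Recover (contig, start, end) from a chunk name produced by plan_chunks."""
    contig, _, span = name.rpartition(SEP)
    start, end = span.split("_")
    return contig, int(start), int(end)


def split_sequence(contig: str, sequence: str, size: int, overlap: int) -> Dict[str, str]:
    """Map each chunk name to its slice of the sequence."""
    return {
        chunk.name: sequence[chunk.start:chunk.end]
        for chunk in plan_chunks(contig, len(sequence), size, overlap)
    }
```

Because the span is stored in the name itself, the stitching step could recover offsets without a separate index, although an explicit guidance document may still be safer if chromosome names can contain the separator.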
Perhaps also another utility that scans the run folder above and starts an instance of NEAT for each configuration file? That would be tricky to manage without knowing what cluster it will run on, so it may be better left to the user.
Stitching script: joins the FASTQ/VCF/BAM output from the split files back together. Note that a script like this existed somewhere in NEAT 2.0. It would use the guidance document produced by the splitting step, or just the order of the files, to stitch all the output files together. In the end we want one master FASTQ, one master golden BAM, and one master golden VCF containing all the output from the previous steps.
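For the VCF part of stitching, a minimal sketch might look like the following. It assumes a hypothetical chunk naming scheme of the form `chr1__chunk_<start>_<end>` (0-based start) in the CHROM column, which is an illustration rather than anything NEAT currently produces. It shifts each record back to original coordinates and sorts the result. It does not rewrite `##contig` header lines or reconcile variants that fall in overlap regions, both of which a real script would have to handle.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical separator between the original contig name and the chunk span.
SEP = "__chunk_"


def lift_record(line: str, sep: str = SEP) -> Tuple[str, int, str]:
    """Rewrite CHROM and POS of one VCF record into original coordinates."""
    fields = line.rstrip("\n").split("\t")
    contig, _, span = fields[0].rpartition(sep)
    offset = int(span.split("_")[0])  # 0-based start of the chunk
    pos = int(fields[1]) + offset     # VCF POS is 1-based within the chunk
    fields[0], fields[1] = contig, str(pos)
    return contig, pos, "\t".join(fields)


def stitch_vcfs(chunk_vcfs: Iterable[Iterable[str]]) -> List[str]:
    """Merge per-chunk VCF lines into one list: header first, then sorted records."""
    header: List[str] = []
    records: List[Tuple[int, int, str]] = []
    contig_order: Dict[str, int] = {}
    for index, lines in enumerate(chunk_vcfs):
        for line in lines:
            if line.startswith("#"):
                # Keep the header from the first file only.
                if index == 0:
                    header.append(line.rstrip("\n"))
                continue
            if not line.strip():
                continue
            contig, pos, text = lift_record(line)
            # Contigs keep the order in which they are first seen.
            contig_order.setdefault(contig, len(contig_order))
            records.append((contig_order[contig], pos, text))
    records.sort(key=lambda record: (record[0], record[1]))
    return header + [text for _, _, text in records]
```

FASTQ output could likely be joined by plain concatenation, while the golden BAM would need the same kind of coordinate shift plus a rebuilt header, which is why the VCF case is shown here as the simplest example of the offset logic.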