Hello,
I'm running this workflow:
param (
	// S3 path to 10x folder
	tenx string
	// Full S3 file location to put the sourmash signature
	output string
	// Size of k-mer(s) to use
	ksizes = "21,33,51"
	// Choose number of hashes as 1/scaled of input k-mers
	scaled = 0
	// Number of k-mer hashes to use
	num_hashes = 1000
	// Calculate protein signature
	protein = true
	// Calculate DNA signature
	dna = true
	// Number of processes
	processes = 8
	// Name of the BAM file in the tenx folder
	BAM_FILENAME = "possorted_genome_bam.bam"
	// Name of the single-column barcodes file in the tenx folder
	BARCODES = "barcodes.tsv"
)
// Instantiate the system modules "files" (system modules begin
// with $), assigning its instance to the "files" identifier. To
// view the documentation for this module, run "reflow doc
// $/files".
val files = make("$/files")
val dirs = make("$/dirs")
sourmash := make("./sourmash.rf")
// bam2fastx Docker image
val bam2fastx = "czbiohub/bam2fastx"
// Convert the BAM file in a 10x folder to a single FASTA file
// containing all cells
@requires(cpu := 4, mem := 16*GiB, disk := 4*GiB)
func TenXBamToFasta(tenx dir) = {
	outdir := exec(image := bam2fastx) (output dir) {"
		bam2fastx fasta {{tenx}} --all-cells-in-one-file --output {{output}}
	"}
	val (fasta, _) = dirs.Pick(outdir, "*.fasta")
	// Return single fasta
	fasta
}
// Instantiate Go system module "strings"
val strings = make("$/strings")
@requires(cpu := 1, mem := 16*GiB)
val Main = {
	val tenx_folder = dir(tenx)
	val (bam, _) = dirs.Pick(tenx_folder, "*.bam")
	val (bai, _) = dirs.Pick(tenx_folder, "*.bai")
	val (barcodes, _) = dirs.Pick(tenx_folder, BARCODES)
	val renamed = map([(BAM_FILENAME, bam),
		(BAM_FILENAME + ".bai", bai),
		(BARCODES, barcodes)])
	val minimal_tenx_dir = dirs.Make(renamed)
	fasta := TenXBamToFasta(minimal_tenx_dir)
	reads := [fasta]
	singleton := false
	sourmash_sketch := sourmash.Compute(reads, scaled, ksizes, protein,
		dna, singleton)
	files.Copy(sourmash_sketch, output)
}

The data gets transferred just fine, but then the reflow run command claims the job is running, while the reflow ps command shows it as initializing. Which one is right? I've been stuck at the "initializing" phase for many hours for this file; the run below is just a fresh example to show the inputs.
Below is a screenshot of the output from this command:
reflow -log=debug -cache=off run -trace /home/olga/code/kmer-hashing/sourmash/maca/10x_spleen_kidney/../../../reflow/sourmash_compute_10x.rf -tenx s3://czbiohub-maca/10x_data/10X_P4_7 -output s3://olgabot-maca/10x/sourmash_compute/ksizes=21,27,33,51_num_hashes=5000/Spleen_10X_P4_7.sig -ksizes 21,27,33,51 -num_hashes 5000
Thank you!
Warmest,
Olga
