"Initializing" takes forever #103

@olgabot

Hello,

I'm running this workflow:

param (
    // S3 path to 10x folder
    tenx string

    // Full s3 file location to put the sourmash signature
    output string

    // Size of kmer(s) to use
    ksizes = "21,33,51"

    // choose number of hashes as 1/scaled of input k-mers
    scaled = 0

    // Number of kmer hashes to use
    num_hashes = 1000

    // Calculate protein signature
    protein = true

    // Calculate DNA signature
    dna = true

    // Number of processes
    processes = 8

    // Name of the bam file in the tenx folder
    BAM_FILENAME = "possorted_genome_bam.bam"

    // Name of the single-column barcodes file in the tenx folder
    BARCODES = "barcodes.tsv"
)

// Instantiate the system modules "files" and "dirs" (system modules
// begin with $), assigning each instance to the identifier of the
// same name. To view the documentation for a module, run, for
// example, "reflow doc $/files".
val files = make("$/files")
val dirs = make("$/dirs")


sourmash := make("./sourmash.rf")


// bam2fastx Docker image
val bam2fastx = "czbiohub/bam2fastx"


// Convert a 10x BAM folder into a single FASTA file containing all cells
@requires(cpu := 4, mem := 16*GiB, disk := 4*GiB)
func TenXBamToFasta(tenx dir) = {
    outdir := exec(image := bam2fastx) (output dir) {"
            bam2fastx fasta {{tenx}} --all-cells-in-one-file --output {{output}}
    "}

    val (fasta, _) = dirs.Pick(outdir, "*.fasta")

    // Return single fasta
    fasta
}



// Instantiate Go system module "strings"
val strings = make("$/strings")



@requires(cpu := 1, mem := 16*GiB)
val Main = {
    val tenx_folder = dir(tenx)
    val (bam, _) = dirs.Pick(tenx_folder, "*.bam")
    val (bai, _) = dirs.Pick(tenx_folder, "*.bai")
    val (barcodes, _) = dirs.Pick(tenx_folder, BARCODES)

    val renamed = map([(BAM_FILENAME, bam), 
        (BAM_FILENAME + ".bai", bai), 
        (BARCODES, barcodes)])
    val minimal_tenx_dir = dirs.Make(renamed)

    fasta := TenXBamToFasta(minimal_tenx_dir)
    reads := [fasta]

    singleton := false

    sourmash_sketch := sourmash.Compute(reads, scaled, ksizes, protein, 
        dna, singleton)
    files.Copy(sourmash_sketch, output)
}

The data is transferred just fine, but then the reflow run command claims the job is running while the reflow ps command shows it as initializing. Which one is right? I've been stuck at the "initializing" phase for many hours with this file; the run below is just a fresh example to show the inputs.

Below is a screenshot of the output from this command:

reflow -log=debug -cache=off run -trace /home/olga/code/kmer-hashing/sourmash/maca/10x_spleen_kidney/../../../reflow/sourmash_compute_10x.rf -tenx s3://czbiohub-maca/10x_data/10X_P4_7 -output s3://olgabot-maca/10x/sourmash_compute/ksizes=21,27,33,51_num_hashes=5000/Spleen_10X_P4_7.sig -ksizes 21,27,33,51 -num_hashes 5000

[Image: screen shot 2019-02-08 at 8 20 07 am]

Thank you!
Warmest,
Olga
