Read failures: Future timed out after [60 seconds] #7726

@jeremylp2

Pipeline runs on our SGE cluster frequently fail with errors like this:
Failed to read_int(<file_to_read>) (reason 1 of 1): "Future timed out after [60 seconds]."

This happens with Cromwell 87 and 88.

It happens across a range of different tasks, but only intermittently: pipelines that hit this error often run fine. Even with the same input data, a pipeline might fail with this error on one attempt and run through without issue on the next. It probably correlates with disk traffic on our system. If so, I'm hoping there's a way to get Cromwell to wait longer for disk reads, or to queue up disk reads more slowly.

We've tried adjusting system.io.command-backpressure-staleness quite a bit, based on information in issue #4057, but that hasn't helped. We've also tried lowering system.io.throttle.number-of-requests (per 100 seconds) to reduce the I/O load, and greatly increasing system.io.timeout.default and system.io.timeout.copy. All to no avail. I also started experimenting with parameters that change various Akka toolkit timeouts in the akka stanza of the config. However, those aren't among the settings specified in https://github.com/broadinstitute/cromwell/blob/develop/core/src/main/resources/reference.conf, so I'm not sure Cromwell even picks them up.
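For reference, here is a sketch of where the `system.io` settings mentioned above sit in a Cromwell config file (HOCON). It assumes the key layout from reference.conf. The values are placeholders for illustration only. They are not the exact values we tested, and they are not recommendations.

```hocon
system {
  io {
    # Global I/O throttle: number-of-requests allowed per `per` window.
    # Lowering number-of-requests was intended to reduce I/O load.
    throttle {
      number-of-requests = 10000   # placeholder value
      per = 100 seconds
    }

    # How long an I/O operation may go without a response before timing out.
    timeout {
      default = 10 minutes   # placeholder value
      copy = 2 hours         # placeholder value
    }

    # Setting discussed in issue #4057.
    command-backpressure-staleness = 30 seconds   # placeholder value
  }
}
```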

Anyway, is there anything else you'd suggest? For the settings we've already tried, such as system.io.command-backpressure-staleness, are there values you'd recommend? So far, our only workaround is to set an unreasonably low concurrent-job-limit for SGE in our Cromwell config. That avoids the error by inherently limiting the number of simultaneous I/O operations, but it also dramatically slows down our pipelines.
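The concurrent-job-limit workaround looks roughly like this. This sketch assumes the SGE backend is declared under `backend.providers` with the name `SGE`; the provider name is whatever your own config uses. The limit value is a placeholder.

```hocon
backend {
  providers {
    # "SGE" is assumed to be this config's name for the SGE provider.
    SGE {
      config {
        # Maximum number of jobs Cromwell runs at once on this backend.
        # The rest of the SGE provider settings (actor-factory, submit, kill, ...) are omitted.
        concurrent-job-limit = 10   # placeholder value
      }
    }
  }
}
```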
