Skip to content

clarify implementation of --seconds-between-status-checks and --max-status-checks-per-secon #319

Open
@dlaehnemann

Description

@dlaehnemann

Both @cmeesters and I currently aren't 100% sure how this works and we should thus investigate in some more detail, to make sure this works as intended and is thoroughly documented. Here's previous discussion of this on a docs pull request:

@dlaehnemann I removed the TODO items from the status check section (see: 4cfac88): the plugin uses the build-in rate limiter (see the main code, in this pr line 400 async with self.status_rate_limiter:), this obeys the --seconds-between-status-checks and --max-status-checks-per-second options, does it not?

That status_rate_limiter just uses the --max-status-checks-per-second option, see here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L90

And this will only limit how many sacct commands are issued per second within a round of status check attempts. So the default for this should probably rather be some lower fraction like 0.2, to make it wait 5 seconds before re-attempting the sacct.

But from what I can see in the snakemake code, the default is actually 100.0:
https://github.com/snakemake/snakemake/blob/91cc3ef0f24b1b0e96cb751cdc806cad84491425/src/snakemake/settings/types.py#L420

Or 10, if we trust the rendered CLI docs:
https://snakemake.readthedocs.io/en/stable/executing/cli.html#snakemake.cli-get_argument_parser-behavior

Both of these would be rapid firing sacct at the slurm system as many times as you attempt a status check. So maybe instead put something like max_status_checks_per_second=0.2 here:

As for the --seconds-between-status-checks, I think this is initialized here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L280

And we update it here, resetting it to 40 seconds whenever any jobs are finished:

So I think we should at least reset it to the command-line value. And we should double-check that simply setting this value will have the intended effect of waiting. Because I don't understand the code calling this in the interface. There's first the sleep function, that uses this:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L269

And this sleep function is in turn only used here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L191

And eventually it is used here, within the threading.Thread() call:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L83

So I guess this somehow uses the threading core python module to wait for jobs and update the active_jobs list, but I can't seem to wrap my head around how it does this. But it probably works as intended and I simply don't understand it... 😅

Originally posted by @dlaehnemann in #237 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions