Description
Both @cmeesters and I currently aren't 100% sure how this works and we should thus investigate in some more detail, to make sure this works as intended and is thoroughly documented. Here's previous discussion of this on a docs pull request:
@dlaehnemann I removed the TODO items from the status check section (see: 4cfac88): the plugin uses the build-in rate limiter (see the main code, in this pr line 400
async with self.status_rate_limiter:
), this obeys the--seconds-between-status-checks
and--max-status-checks-per-second
options, does it not?
That status_rate_limiter
just uses the --max-status-checks-per-second
option, see here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L90
And this will only limit how many sacct
commands are issued per second within a round of status check attempts. So the default for this should probably rather be some lower fraction like 0.2, to make it wait 5 seconds before re-attempting the sacct
.
But from what I can see in the snakemake code, the default is actually 100.0
:
https://github.com/snakemake/snakemake/blob/91cc3ef0f24b1b0e96cb751cdc806cad84491425/src/snakemake/settings/types.py#L420
Or 10
, if we trust the rendered CLI docs:
https://snakemake.readthedocs.io/en/stable/executing/cli.html#snakemake.cli-get_argument_parser-behavior
Both of these would be rapid firing sacct
at the slurm system as many times as you attempt a status check. So maybe instead put something like max_status_checks_per_second=0.2
here:
As for the --seconds-between-status-checks
, I think this is initialized here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L280
And we update it here, resetting it to 40 seconds whenever any jobs are finished:
So I think we should at least reset it to the command-line value. And we should double-check that simply setting this value will have the intended effect of waiting. Because I don't understand the code calling this in the interface. There's first the sleep function, that uses this:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L269
And this sleep function is in turn only used here:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L191
And eventually it is used here, within the threading.Thread()
call:
https://github.com/snakemake/snakemake-interface-executor-plugins/blob/bfbe3e4650b97c2fd868f36b740f55cd7c8c1bcb/snakemake_interface_executor_plugins/executors/remote.py#L83
So I guess this somehow uses the threading
core python module to wait for jobs and update the active_jobs list, but I can't seem to wrap my head around how it does this. But it probably works as intended and I simply don't understand it... 😅
Originally posted by @dlaehnemann in #237 (comment)