Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sacctmgr permission issue #177

Open
jansen-ista opened this issue Dec 10, 2024 · 3 comments
Open

sacctmgr permission issue #177

jansen-ista opened this issue Dec 10, 2024 · 3 comments

Comments

@jansen-ista
Copy link

Software Versions
snakemake 8.25.3
snakemake-executor-plugin-slurm 0.11.2
slurm-wlm 22.05.8

Describe the bug
On the cluster of my institute sacctmgr is not accessible for general users. Since the plugin uses sacctmgr to determine if the provided slurm account is valid, this leads to a crash because no account is returned by the command (the command also does not fail, it just does not return anything).

Logs
No SLURM account given, trying to guess.
Guessed SLURM account: itgrp
WorkflowError:
The given account itgrp appears to be invalid. Available accounts:

Additional context
After talking to colleagues, it is unfortunately not possible to give normal users access to the command.
However my colleague found a workaround. We currently use a version modified by hand on our cluster, which is changed as follows.
The following is the function how it is currently implemented in the plugin:

def test_account(self, account):
        """
        tests whether the given account is registered, raises an error, if not
        """
        cmd = f'sacctmgr -n -s list user "{os.environ["USER"]}" format=account%256'
        try:
            accounts = subprocess.check_output(
                cmd, shell=True, text=True, stderr=subprocess.PIPE
            )
        except subprocess.CalledProcessError as e:
            raise WorkflowError(
                f"Unable to test the validity of the given or guessed SLURM account "
                f"'{account}' with sacctmgr: {e.stderr}"
            )

        # The set() has been introduced during review to eliminate
        # duplicates. They are not harmful, but disturbing to read.
        accounts = set(_.strip() for _ in accounts.split("\n") if _)

        if account not in accounts:
            raise WorkflowError(
                f"The given account {account} appears to be invalid. Available "
                f"accounts:\n{', '.join(accounts)}"
            )

By using the sshare command we are able to circumvent this.

def test_account(self, account):
        """
        tests whether the given account is registered, raises an error, if not
        """
        cmd = f'sshare -U --format Account --noheader'
        try:
            accounts = subprocess.check_output(
                cmd, shell=True, text=True, stderr=subprocess.PIPE
            )
        except subprocess.CalledProcessError as e:
            raise WorkflowError(
                f"Unable to test the validity of the given or guessed SLURM account "
                f"'{account}' with sshare: {e.stderr}"
            )

        # The set() has been introduced during review to eliminate
        # duplicates. They are not harmful, but disturbing to read.
        accounts = set(_.strip() for _ in accounts.split("\n") if _)

        if account not in accounts:
            raise WorkflowError(
                f"The given account {account} appears to be invalid. Available "
                f"accounts:\n{', '.join(accounts)}"
            )

Would it be possible to add this as an alternative/backup validation option in the plugin?

Also why is the validity of the slurm account checked in the first place?
Wouldn't it be more efficient to just leave this to SLURM itself?

Thank you very much in advance for your help and I'm happy to help in any way I can if you have questions.

@cmeesters
Copy link
Member

Thank you for your issue report and the input!

We strive to test and evaluate the plugin thoroughly. It should run on every cluster with a SLURM batch system. But admins have a lot of phantasy to tailor “their” system, as it appears.

... it is unfortunately not possible to give normal users access to the command.

That is, of course, nonsense. sacctmgr is a standard command to evaluate access to resources, e.g. certain partitions, and part of the SLURM distribution.

Would it be possible to add this as an alternative/backup validation option in the plugin?

Why not? I will certainly insist on testing with sacctmgr and would not be surprised if after a total switch to sshare yet another user would turn up and report that sshare is not working as intended. Then again, why not first test one way and — upon failure — the other way? Will you file a PR, or should I do it?

Also why is the validity of the slurm account checked in the first place?
Wouldn't it be more efficient to just leave this to SLURM itself?

This stems from the experience, that with many users some will inevitably smuggle a typo into account and/or partition settings (happened to me, too). It is part of a growing defence mechanism: A user should always know where an error is originating. To just report the submission error is not enough for some. We are not where we want to be in terms of usability and ease, but this is definitively some minor bit.

@cmeesters
Copy link
Member

found the time to do a PR - note that the CI seems currently broken. This needs to be fixed, before we will have a new release.

@jansen-ista
Copy link
Author

Hi @cmeesters thanks for the quick response and fix.

I can talk again to my colleagues on why this is actually disabled. They had some valid reasons, which I think also had to do with sacctmgr putting more load on the slurm database, but I'm not sure since I'm no expert on SLURM myself.

Another option here might be to actually use some kind of if else statement to fall back to sshare in case sacctmgr doesn't return anything, or use both and concat the result.

Thanks again for your help, I'll report back once I have tested the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants