Use binary name for slurm step accounting #59

Open
mattaezell opened this issue Jan 17, 2025 · 3 comments
Comments

@mattaezell

When I run spindle srun, Slurm accounting captures two steps, spindle_be and spindle_bootstrap, instead of the normal single entry that has the executable name. It would be nice if Spindle could pass the -J option to the srun launcher to set the executable name as the job name.
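To illustrate the request, here is a rough sketch of the idea, not Spindle's actual launch code. The helper name with_job_name and the split between srun options and application arguments are assumptions made for illustration; the only Slurm feature relied on is srun's -J / --job-name option.

```python
import os


def with_job_name(srun_opts, app_argv):
    """Build an srun command line whose job name is the application's basename.

    srun_opts: options destined for srun (hypothetical pre-split input;
               parsing a real srun command line is more involved).
    app_argv:  the application path followed by its arguments.
    """
    # Leave the command alone if the user already chose a job name,
    # given as "-J name", "-Jname", "--job-name name" or "--job-name=name".
    has_name = any(
        opt.startswith("-J") or opt.startswith("--job-name")
        for opt in srun_opts
    )

    cmd = ["srun", *srun_opts]
    if not has_name:
        # Use the executable's basename so accounting shows the application.
        cmd.append("--job-name=" + os.path.basename(app_argv[0]))
    return cmd + list(app_argv)


if __name__ == "__main__":
    print(with_job_name(["-N", "2"], ["/usr/bin/hostname", "-f"]))
```

A job name supplied by the user is left untouched, so this would only change the accounting entry when no name was requested.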

@mplegendre
Member

We could do that. I'll add it to the queue. It would help get rid of the spindle_bootstrap entry, though you'll still see the spindle_be one.

Longer term I want to get a PR for JSC merged that would allow for using spindle as a slurm plugin. That would solve both those problems.

@mattaezell
Author

Thanks for the quick response! I'm not familiar with JSC. We are building the spank plugin but haven't enabled/tested it yet. Are there known issues? I saw #52 but don't fully understand the impact yet.

@mplegendre
Member

JSC = Juelich Supercomputing Center. They submitted the #52 PR.

The existing spank plugin kind of works, but isn't reliable enough to be used in production. There are a lot of corner cases where it fails. When I originally wrote it, spank was missing a lot of capabilities that spindle wanted, and there are some very bad hacks in spindle's spank plugin to compensate. Since then slurm/spank has progressed, and #52 looks to clean a lot of that up.

The catch is that I lack a good test environment for slurm plugins, which I need before merging #52. I need to build one (probably through a virtual cluster of slurm containers). But that's a big enough task that it's remained at the back of my queue behind El Capitan work for a while now.
