Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slurm support #77

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

Slurm support #77

wants to merge 13 commits into from

Conversation

samuelcwilliams
Copy link
Collaborator

Added support for Slurm to uit.py.

@samuelcwilliams samuelcwilliams requested review from sdc50 and araglu March 17, 2025 14:20
@samuelcwilliams samuelcwilliams self-assigned this Mar 17, 2025
@samuelcwilliams samuelcwilliams marked this pull request as draft March 17, 2025 14:22
@samuelcwilliams
Copy link
Collaborator Author

@sdc50 @araglu here's the draft pull request. I'll address the issues in the pipeline in a little bit, but in the meantime I'd appreciate any feedback y'all have!

@araglu
Copy link
Collaborator

araglu commented Mar 17, 2025

I think some of these changes will also need to be made in async_client.py. It has very similar code to uit.py but in an async context. I noticed our app is calling the async get_raw_queue_stats(), so it did not get your changes.

@samuelcwilliams
Copy link
Collaborator Author

Will do! I started on it a while back, but ran into some bugs with what I already had in uit.py so I reverted the changes so I could start fresh once I had those bugs worked out.

I'll get those updates rolled in ASAP!

@samuelcwilliams samuelcwilliams marked this pull request as ready for review March 21, 2025 19:03
Comment on lines +256 to +257
self.scheduler = NODE_TYPES[self.system]["scheduler"]
self.commands = COMMANDS[self.scheduler]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right spot for these two lines? I get an error on them whenever I try to submit a new job or delete that job.

In uit.py, these lines are in prepare_connect() which is called by both sync and async connect(), so async_client.py might not need these lines here.

Copy link
Collaborator Author

@samuelcwilliams samuelcwilliams Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I guess I didn't realize it was calling prepare_connect() from the regular client. I'll make the changes ASAP

@araglu
Copy link
Collaborator

araglu commented Mar 24, 2025

The async submit() is still using a hard-coded "qsub" command in async_client.py:590.

Edit: the error message from qsub is:

sbatch: error: invalid partition specified: background
sbatch: error: Batch job submission failed: Invalid partition name specified

I thought using the qsub compatibility command would work until we get pyuit to write out a Slurm script, but I can't get it to work, even with other queues.

@samuelcwilliams samuelcwilliams marked this pull request as draft March 25, 2025 14:57
@araglu
Copy link
Collaborator

araglu commented Mar 25, 2025

The "Max Wall Time" and "Max Processes" don't look right to me. Wall time said 10080 when I was expecting it in a 04:00:00 format. The processes looked more like number of nodes when I'm used to seeing number of cores.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants