
Collect MaxRSS/Elapsed from sacct when using SLURM? #16

Open
kendonB opened this issue Nov 17, 2017 · 1 comment

Comments

@kendonB

kendonB commented Nov 17, 2017

I imagine a widespread problem when using HPC systems is not knowing how much memory or walltime to request. I suspect that not knowing the right amounts (of memory especially) wastes a large share of resources on HPC systems. I know it does in my workflow.

SLURM reports MaxRSS and Elapsed values for each job after completion (available through sacct), which future.batchtools could store in .future/20171118_083108-ebBlfz/.
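To make the idea concrete, here is a minimal sketch of how those two values could be pulled from sacct after a job finishes. It is written in Python purely for illustration and is not part of future.batchtools or batchtools. The helper names (`parse_rss`, `parse_elapsed`, `summarize`, `query_job`) are hypothetical, and the sketch assumes that MaxRSS suffixes are binary multiples and that Elapsed is printed as `[DD-]HH:MM:SS`.

```python
import subprocess

# Assumption: sacct size suffixes are binary multiples (K = 1024 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_rss(value):
    """Convert a MaxRSS string such as '2048K' to bytes; empty means 0."""
    value = value.strip()
    if not value:
        return 0
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))


def parse_elapsed(value):
    """Convert an Elapsed string of the form '[DD-]HH:MM:SS' to seconds."""
    days, _, clock = value.strip().rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def summarize(sacct_text):
    """Reduce 'JobID|MaxRSS|Elapsed' lines (one per job step) to peak values."""
    peak_rss, elapsed = 0, 0
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        _job_id, max_rss, elapsed_str = line.split("|")
        peak_rss = max(peak_rss, parse_rss(max_rss))
        elapsed = max(elapsed, parse_elapsed(elapsed_str))
    return {"max_rss_bytes": peak_rss, "elapsed_seconds": elapsed}


def query_job(job_id):
    """Run sacct for one job id and summarize it (needs a SLURM cluster)."""
    result = subprocess.run(
        ["sacct", "-j", str(job_id), "--format=JobID,MaxRSS,Elapsed",
         "--parsable2", "--noheader"],
        check=True, capture_output=True, text=True,
    )
    return summarize(result.stdout)
```

Taking the maximum over all lines is a deliberate simplification: sacct usually leaves MaxRSS empty on the allocation line and fills it in on the step lines (for example `<jobid>.batch`), so the peak across steps is what a user would compare against the requested memory.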

I'm eventually imagining a function exported from drake that could report on these somehow.

@HenrikBengtsson
Collaborator

That's a great idea. Yes, it seems to be a common problem: using far more or far fewer resources than requested is inefficient for both the user and the cluster. I agree that one solution is to give users feedback on memory use and processing time.

For futures in general, I hope to be able to gather some of these stats using R itself (and therefore for all types of futures). I'm planning to add some basic support for this across the board, cf. futureverse/future#59.

However, more system-specific information, like the Slurm stats you're mentioning, obviously has to be implemented by the more specific future classes. For those, I think the stats should probably be collected by batchtools itself, and future.batchtools could then provide a way to access and present them.

/cc @mllg
