
evaluator column for future_lapply parallelism? #540

Closed
kendonB opened this issue Oct 9, 2018 · 6 comments

Comments

kendonB (Contributor) commented Oct 9, 2018

It would be great to be able to specify types of workers for future_lapply parallelism. I can imagine having nine types of workers: low, medium, and high memory crossed with low, medium, and high wall time.

wlandau (Member) commented Oct 9, 2018

See #169, #259, and futureverse/future#172. I will reopen #169 if/when future supports this functionality.

wlandau closed this as completed Oct 9, 2018
kendonB (Contributor, Author) commented Oct 9, 2018

Interesting that there's no general way to tell if a future is both "unresolved" and hasn't somehow failed. Though, it might be exceedingly difficult to tell either way if a future has failed in general. The minimal API would have to allow something like "unresolved and running", "unresolved and failed", "unresolved and hasn't started yet", and "unresolved and I can't tell what's actually happening/what actually happened".

Another issue here is that many workflows that might use this approach would end up with some types of workers existing but not active for long periods of time. future parallelism fits more naturally with computing using different resources. I will keep trying that as performance improvements are made.
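The four "unresolved" states described above can be approximated, though not fully distinguished, with the existing future API. The sketch below is illustrative only: `resolved()` separates finished from unfinished futures, and `value()` re-raises a worker's error, but "failed before resolving" and "never started" are exactly the states the API cannot observe, which is the gap being described.

```r
# Sketch only: classifying a future's state with resolved() and value().
library(future)
plan(multisession)

future_status <- function(f) {
  if (!resolved(f)) {
    # Could be running, queued, or silently dead -- the API can't say which.
    return("unresolved (running, not started, or unknown)")
  }
  tryCatch({
    value(f)  # re-raises any error captured on the worker
    "resolved successfully"
  }, error = function(e) {
    "resolved with an error"
  })
}

f_ok  <- future(1 + 1)
f_bad <- future(stop("boom"))
Sys.sleep(1)  # give the background workers a moment to finish
future_status(f_ok)
future_status(f_bad)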

wlandau (Member) commented Oct 12, 2018

> Interesting that there's no general way to tell if a future is both "unresolved" and hasn't somehow failed. Though, it might be exceedingly difficult to tell either way if a future has failed in general. The minimal API would have to allow something like "unresolved and running", "unresolved and failed", "unresolved and hasn't started yet", and "unresolved and I can't tell what's actually happening/what actually happened".

Yeah, it has been hard to work around this limitation. make(parallelism = "future") does manually try to detect cases where a job crashes but its future still resolves.

> future parallelism fits more naturally with computing using different resources. I will keep trying that as performance improvements are made.

True. However, for "future_lapply" parallelism, you can set a worker column in your workflow plan data frame to select the preferred worker for each target. Maybe we can think about heterogeneous persistent workers once future.apply is capable of it. clustermq does its own load balancing, so I am not sure if clustermq-based heterogeneous workers would make sense for drake.
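The worker-column approach described above might look something like the sketch below. Column and argument names follow the drake API as described in this thread and may differ in later releases; the simulate_* and summarize functions are hypothetical user code.

```r
# Sketch based on the suggestion above: route each target to a
# preferred persistent worker via a worker column in the plan.
library(drake)

plan <- drake_plan(
  small  = simulate_small_data(),   # hypothetical user function
  big    = simulate_big_data(),     # hypothetical user function
  report = summarize(small, big)    # hypothetical user function
)
plan$worker <- c(1, 2, 1)  # e.g. send the memory-hungry target to worker 2

make(plan, parallelism = "future_lapply", jobs = 2)
```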

kendonB (Contributor, Author) commented Oct 12, 2018

I'm confused by this:

> clustermq does its own load balancing, so I am not sure if clustermq-based heterogeneous workers would make sense for drake.

The motivation here isn't load balancing; it's about matching memory/CPU/walltime resources to targets without waste. clustermq doesn't do its own resource management.

Or do you mean you wouldn't be able to get clustermq to allocate properly?

wlandau (Member) commented Oct 12, 2018

Sorry, my point was that I am not sure we will be able to assign targets to specific clustermq workers. Because clustermq does its own load balancing, I would expect that to happen inside the black box. For the sake of good load balancing, that can sometimes be a good thing. cc @mschubert.

mschubert commented Oct 12, 2018

Right now, a worker just reports that it's ready and is then assigned the next target.

In principle, it could signal that it's ready along with the resources it has available, and then only be assigned targets that fit. However, for now, all workers are homogeneous, so there's no real upside to handling this just yet.

It's a possible extension; please file an issue if you've got a good use case.
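The resource-aware extension sketched above could conceptually look like the following. This is an illustrative sketch only, not clustermq's actual implementation: a worker reports the resources it has free, and the scheduler hands back the first queued target whose requirements fit.

```r
# Illustrative sketch of resource-aware target assignment
# (hypothetical; not part of the clustermq API).
assign_target <- function(worker_resources, queue) {
  # queue: list of targets, each with a `needs` list of requirements
  for (i in seq_along(queue)) {
    if (queue[[i]]$needs$memory_gb <= worker_resources$memory_gb) {
      return(queue[[i]])  # first target that fits this worker
    }
  }
  NULL  # nothing fits; the worker stays idle
}

queue <- list(
  list(name = "big",   needs = list(memory_gb = 32)),
  list(name = "small", needs = list(memory_gb = 2))
)

# A 4 GB worker skips "big" and is assigned "small".
assign_target(list(memory_gb = 4), queue)$name
```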
