
Persistent workers for the *apply backends #289

Closed

wlandau opened this issue Feb 28, 2018 · 8 comments

wlandau commented Feb 28, 2018

I tried this once before and failed, but from what I learned in #227, I think it is possible after all. The master process can communicate with the workers through a special "workers" namespace in the cache. If we succeed, we may not need a callr backend (#278), though we should keep the existing scheduler for the future backend for cases where the workers do not have cache access.
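
For illustration, a minimal sketch of that namespace idea with storr (the "workers" and "results" namespaces, the key scheme, and the job format here are just assumptions for the example):

```r
library(storr)

# A shared on-disk cache. A dedicated "workers" namespace keeps
# messages separate from the cached targets themselves.
cache <- storr_rds(tempfile())

# Master: post a job description for worker 1.
cache$set(
  "worker_1",
  list(target = "foo", command = quote(sqrt(2))),
  namespace = "workers"
)

# Worker 1: poll for a job, run it, store the result, clear the slot.
if (cache$exists("worker_1", namespace = "workers")) {
  job <- cache$get("worker_1", namespace = "workers")
  cache$set(job$target, eval(job$command), namespace = "results")
  cache$del("worker_1", namespace = "workers")
}

cache$get("foo", namespace = "results") # 1.414214
```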


wlandau commented Feb 28, 2018

Best part: we could totally get rid of staged parallelism this way and be left with a minimal-overhead solution.


krlmlr commented Feb 28, 2018

I thought a bit about message passing. We don't need very elaborate functionality, just post, receive, and wait for R objects. On the other hand, a storr namespace will require file system access for all workers. I wonder if we could use an established library like MPI for sending commands to workers and receiving results.
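
To fix ideas, here is a toy file-based version of those three primitives (the function names and the inbox layout are hypothetical; a real implementation could back the same interface with storr or MPI):

```r
# Toy message passing over the file system: an "inbox" is a directory,
# and each message is one serialized .rds file.
post <- function(inbox, msg) {
  # Write atomically: save to a temp name, then rename into place.
  tmp <- tempfile(tmpdir = inbox)
  saveRDS(msg, tmp)
  file.rename(tmp, paste0(tmp, ".rds"))
}

receive <- function(inbox) {
  # Take the oldest message, or return NULL if the inbox is empty.
  files <- list.files(inbox, pattern = "\\.rds$", full.names = TRUE)
  if (!length(files)) return(NULL)
  oldest <- files[which.min(file.mtime(files))]
  msg <- readRDS(oldest)
  unlink(oldest)
  msg
}

wait <- function(inbox, interval = 0.1) {
  # Block (by polling) until a message arrives.
  repeat {
    msg <- receive(inbox)
    if (!is.null(msg)) return(msg)
    Sys.sleep(interval)
  }
}

# Usage:
# inbox <- tempfile(); dir.create(inbox)
# post(inbox, list(target = "foo", command = quote(sqrt(2))))
# wait(inbox)$command
```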


wlandau commented Feb 28, 2018

Message passing is certainly the appropriate paradigm here. Would posting allow us to send entire targets to the master? Otherwise, I think the workers need cache access anyway.

wlandau removed their assignment Feb 28, 2018

wlandau commented Feb 28, 2018

All the *apply backends already assume cache access, so that is something else we may need to fix.


krlmlr commented Feb 28, 2018

Even if the worker reads a file created by the master or by some other worker, it can be viewed as a form of "posting" a message. A message can be a blob of arbitrary size. We just don't want to block the master until a worker has read the data, which is why I used this term.

  1. The master posts a job description (command + inputs) and assigns it to a worker.
  2. A worker receives the data, does its job and posts the reply back to the master.
  3. The master receives the reply and posts a new job.
  4. Occasionally, the worker or the master needs to wait for new data to arrive.

MPI should be able to handle this, though I wonder whether it is the best solution.
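
A rough sketch of this loop with Rmpi (untested; it assumes a working MPI installation, and the message tags and job format are invented for the example):

```r
library(Rmpi)

# Spawn two persistent workers.
mpi.spawn.Rslaves(nslaves = 2)

# Each worker loops: receive a job from the master (rank 0),
# evaluate it, and post the reply back. A NULL job means "shut down".
mpi.bcast.cmd({
  repeat {
    job <- mpi.recv.Robj(source = 0, tag = 1)
    if (is.null(job)) break
    mpi.send.Robj(list(target = job$target, value = eval(job$command)),
                  dest = 0, tag = 2)
  }
})

# Master: post a job description (command + inputs) to worker 1 ...
mpi.send.Robj(list(target = "foo", command = quote(sqrt(2))),
              dest = 1, tag = 1)

# ... receive the reply, then post the next job, and so on.
reply <- mpi.recv.Robj(source = mpi.any.source(), tag = 2)

# Shut down.
for (w in 1:2) mpi.send.Robj(NULL, dest = w, tag = 1)
mpi.close.Rslaves()
```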


wlandau commented Feb 28, 2018

  1. The master receives the reply...

Is the master receiving the value of the target itself?

  MPI should be able to handle this, though I wonder whether it is the best solution.

Yeah, it seems like Rmpi might be its own separate backend if we go that direction. I am not sure how mclapply() workers, for example, would be able to take advantage of MPI-style message passing.


wlandau commented Feb 28, 2018

Another thing: before each target is built, the environment should be pruned to make sure the target's dependencies are loaded and targets we no longer need are unloaded. To know what we can safely unload, each worker needs to know which targets the other workers are building. This information is easy to communicate through the file system, and it should also be possible with message passing.
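
A sketch of that bookkeeping on top of a shared storr cache (the "in_progress" namespace and the helper names are hypothetical):

```r
library(storr)
cache <- storr_rds(tempfile())

# Before building, each worker advertises its target and that
# target's dependencies.
announce <- function(cache, worker, target, deps) {
  cache$set(worker, list(target = target, deps = deps),
            namespace = "in_progress")
}

# A loaded target is safe to unload only if no in-progress job,
# on any worker, still needs it.
safe_to_unload <- function(cache, loaded) {
  workers <- cache$list(namespace = "in_progress")
  jobs <- cache$mget(workers, namespace = "in_progress")
  keep <- unique(unlist(lapply(jobs, function(j) c(j$target, j$deps))))
  setdiff(loaded, keep)
}

announce(cache, "worker_1", "model", deps = c("data", "params"))
safe_to_unload(cache, loaded = c("data", "params", "old_target"))
#> [1] "old_target"
```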


wlandau commented Feb 28, 2018

Closing because I think we should move this thread to #285. Persistence is a whole new scheduling paradigm for drake, and I think it is an excellent opportunity to begin a separate scheduling package.

wlandau closed this as completed Feb 28, 2018
wlandau mentioned this issue Feb 28, 2018