
Improved parallel graph algorithm #226

Closed
wlandau opened this issue Feb 4, 2018 · 2 comments
@wlandau
Member

wlandau commented Feb 4, 2018

The problem

The current staged parallel scheduling algorithm is inefficient: each parallel stage is only as fast as its slowest job, and there is overhead in spinning up every stage. The algorithm should greedily search forward like Make does.

Proposed solution

Walk through the entire graph with one call to the backend's lapply()-like function. Each worker walks through its own topologically-sorted list of targets. At each step of the walk, the worker makes a decision about the target.

| | Already started by another worker | Not already started |
| --- | --- | --- |
| Dependencies ready | Skip the target and move to the next one. | Lock the target and build it. |
| Dependencies not ready | Throw an informative error. | Lock the target and run the sleep loop. |
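A minimal sketch of this decision step (all helper names here are hypothetical, not drake's actual API): the worker tries to take the target's lock, then acts on the corresponding cell of the table above.

```r
# Sketch of the per-target decision. lock_target() is assumed to try to
# acquire the target's lock atomically, returning TRUE on success and
# FALSE if another worker already holds it; deps_ready() reports whether
# all dependencies are built. Both helpers are hypothetical.
decide_target <- function(target, deps_ready, lock_target) {
  if (!lock_target(target)) {      # already started by another worker
    if (deps_ready(target)) {
      return("skip")               # move on to the next target
    }
    stop("target locked but its dependencies are not ready")
  }
  if (deps_ready(target)) {
    return("build")                # we hold the lock: build it now
  }
  "sleep"                          # hold the lock and enter the sleep loop
}
```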

Sleep loop

  1. Sleep for some small length of time.
  2. Make a fresh decision about the target (i.e. choose a cell in the above table).
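The two steps above can be sketched as a simple polling loop; `decide()` stands in for the decision step, and the default interval is just an illustrative value.

```r
# Sketch of the sleep loop: poll until the decision is no longer "sleep".
sleep_loop <- function(target, decide, interval = 0.01) {
  repeat {
    action <- decide(target)  # make a fresh decision about the target
    if (!identical(action, "sleep")) {
      return(action)
    }
    Sys.sleep(interval)       # back off briefly before checking again
  }
}
```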

Topological sort

Usually, there are several perfectly usable topological orders (ones that put targets after all their dependencies), so a sort should be chosen that puts highest-priority targets first. The parallel_stages() function could help with this: priorities could be resolved within each stage if there is a priority column in the workflow plan.
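A sketch of how the stages and priorities might combine, assuming parallel_stages() yields a list of target-name vectors and the workflow plan supplies a named priority vector (here, a smaller number means higher priority):

```r
# Sketch: order targets stage by stage, sorting within each stage by a
# priority column from the workflow plan. The shapes of `stages` and
# `priority` are assumptions for illustration, not drake's actual API.
priority_topo_sort <- function(stages, priority) {
  unlist(lapply(stages, function(stage) {
    stage[order(priority[stage])]   # highest-priority (lowest value) first
  }), use.names = FALSE)
}
```

Because every target in a stage already has all its dependencies in earlier stages, reordering within a stage preserves the topological property.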

Target locking

I think this mechanism should live in its own dedicated directory, .drake/lock/, with one lock per target. Yes, this will increase the number of tiny files in the cache, but it will be a lot faster than a central lock. Why not use the existing storr cache?

  1. It's safer: the user may be using a cache that is not thread-safe.
  2. Packages that specialize in file locking should be faster. When it comes to locking, every bit of speed matters because of the possibility of race conditions.
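A minimal file-based sketch of the idea: dir.create() is atomic in the sense that only one caller can create a given directory, so each lock can be a directory under .drake/lock/. These helper names are hypothetical, and a real implementation would use a dedicated file-locking package.

```r
# Sketch of per-target locks under .drake/lock/: one directory per target,
# created atomically. Hypothetical helpers, not drake's actual API.
try_lock <- function(target, root = ".drake/lock") {
  dir.create(root, recursive = TRUE, showWarnings = FALSE)
  dir.create(file.path(root, target), showWarnings = FALSE)  # FALSE if held
}
unlock <- function(target, root = ".drake/lock") {
  unlink(file.path(root, target), recursive = TRUE)
}
```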

Post-issue wrap-up

  • When this issue is fixed, the documentation should stop talking about parallel stages.
  • The predict_runtime() and rate_limiting_times() functions should be explained differently in the help files and vignettes.
@wlandau wlandau changed the title Improve parallel graph algorithm Improved parallel graph algorithm Feb 4, 2018
@wlandau
Member Author

wlandau commented Feb 4, 2018

On second thought, we should use a special "locks" namespace in the cache.

  • Parallelism assumes the cache is thread-safe anyway.
  • storr is fast.
  • We need this namespace to be small so we can efficiently tell which targets are running.

Unfortunately, this will increase the number of tiny files, but the more I think about it, the more it makes sense.
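A sketch of the "locks" namespace approach: storr_rds() and the set()/exists()/list()/del() methods are real storr API, but the claim/release protocol below is only illustrative and is not race-free on its own, since the exists-then-set pair is not atomic.

```r
# Sketch of per-target locks in a dedicated "locks" storr namespace.
library(storr)
cache <- storr_rds(tempfile())
claim <- function(target) {
  if (cache$exists(target, namespace = "locks")) {
    return(FALSE)                    # another worker already started it
  }
  cache$set(target, Sys.time(), namespace = "locks")
  TRUE
}
running_targets <- function() {
  cache$list(namespace = "locks")    # small namespace: cheap to scan
}
release <- function(target) {
  cache$del(target, namespace = "locks")
}
```

Keeping locks in their own namespace means listing the running targets only touches that namespace, not the whole cache.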

@wlandau
Member Author

wlandau commented Feb 4, 2018

@krlmlr As we discussed, closing in favor of #227.

@wlandau wlandau closed this as completed Feb 4, 2018