The problem
The current staged parallel scheduling algorithm is inefficient: each parallel stage is only as fast as its slowest job, and there is overhead in spinning up every stage. The algorithm should greedily search forward like Make does.
Proposed solution
Walk through the entire graph with one call to the backend's `lapply()`-like function. Each worker walks through its own topologically sorted list of targets. At each step of the walk, the worker makes a decision about the target according to the table below (a sketch of the full worker loop follows the sleep loop section).
| | Already started by another worker | Not already started |
| --- | --- | --- |
| Dependencies ready | Skip the target and move to the next one. | Lock the target and build it. |
| Dependencies not ready | Throw an informative error. | Lock the target and run the sleep loop. |
Sleep loop
1. Sleep for some small length of time.
2. Make a fresh decision about the target (i.e. choose a cell in the table above).
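Putting the decision table and the sleep loop together, one worker's walk might look roughly like the sketch below. The helpers `is_locked()`, `try_lock()`, `deps_ready()`, and `build_target()` are hypothetical placeholders for whatever locking and building machinery ends up being used, not existing drake functions.

```r
# A minimal sketch of one worker's walk, combining the decision table
# with the sleep loop. is_locked(), try_lock(), deps_ready(), and
# build_target() are hypothetical placeholders, not real drake functions.
worker_walk <- function(targets, sleep_interval = 0.01) {
  for (target in targets) {
    if (is_locked(target)) {
      if (deps_ready(target)) {
        next # Already started by another worker: skip to the next target.
      }
      stop("target ", target, " was locked before its dependencies were ready")
    }
    if (try_lock(target)) {
      # Sleep loop: wait a little, then check again, until the
      # dependencies become ready.
      while (!deps_ready(target)) {
        Sys.sleep(sleep_interval)
      }
      build_target(target)
    }
    # If try_lock() fails, a racing worker claimed the target first,
    # which is the same as the "already started" column: skip it.
  }
}
```

Whether `try_lock()` can be made atomic is exactly what the target locking section below is about.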
Topological sort
Usually, there are several perfectly usable topological orders (ones that put targets after all their dependencies), so a sort should be chosen that puts highest-priority targets first. The `parallel_stages()` function could help with this: priorities could be resolved within each stage if there is a `priority` column in the workflow plan.
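For illustration, here is one possible way to get such an order: a greedy sort that, at each step, emits the highest-priority target whose dependencies are already in the order. The example graph and priorities are made up, and this is only a sketch of the idea, not drake's implementation or the `parallel_stages()` route described above (which would resolve priorities within each stage instead).

```r
library(igraph)

# Made-up five-target dependency graph: an edge a --+ c means target c
# depends on target a. The priorities stand in for a priority column
# in the workflow plan.
graph <- graph_from_literal(a --+ c, b --+ c, a --+ d, c --+ e, d --+ e)
priority <- c(a = 1, b = 2, c = 1, d = 3, e = 1)

# Greedy priority-aware topological sort: at each step, emit the
# highest-priority target whose dependencies have all been emitted.
priority_topo_sort <- function(graph, priority) {
  out <- character(0)
  remaining <- V(graph)$name
  while (length(remaining) > 0) {
    ready <- Filter(
      function(v) all(neighbors(graph, v, mode = "in")$name %in% out),
      remaining
    )
    pick <- ready[which.max(priority[ready])]
    out <- c(out, pick)
    remaining <- setdiff(remaining, pick)
  }
  out
}

priority_topo_sort(graph, priority)
#> [1] "b" "a" "d" "c" "e"
```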
Target locking
I think this mechanism should be its own special file system inside `.drake/lock/`. Each target gets its own lock. Yes, this will increase the number of tiny files in the cache, but it will be a lot faster than a central lock. Why not use the existing `storr` cache?
- It's safer: the user may be using a cache that is not thread-safe.
- Packages that specialize in file locking should be faster. When it comes to locking, every bit of speed matters because of the possibility of race conditions.
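As a rough illustration of per-target lock files, here is a sketch using the filelock package, with one lock file per target under `.drake/lock/`. The `claim_target()` helper and the target name are made up for the example.

```r
library(filelock)

lock_dir <- ".drake/lock"
dir.create(lock_dir, recursive = TRUE, showWarnings = FALSE)

# Try to claim a target without blocking: returns a lock object if this
# worker wins, or NULL if another worker already holds the lock.
claim_target <- function(target) {
  filelock::lock(
    file.path(lock_dir, paste0(target, ".lock")),
    exclusive = TRUE,
    timeout = 0
  )
}

# Example usage with a made-up target name.
lock <- claim_target("model_fit")
if (!is.null(lock)) {
  # ... build the target here ...
  filelock::unlock(lock)
}
```

One appeal of OS-level file locks (which filelock uses) is that they are released automatically when a worker process dies, so a crash should not leave a stale lock behind.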
Post-issue wrap-up
When this issue is fixed, the documentation should stop talking about parallel stages.
The `predict_runtime()` and `rate_limiting_times()` functions should be explained differently in the help files and vignettes.