Remove all non-clustermq parallel backends? #561

wlandau · 2018-10-27T17:14:21Z

Background

Currently, drake offers a multitude of parallel computing backends.

parallelism_choices()
##  [1] "clustermq"            "clustermq_staged"     "future"              
##  [4] "future_lapply"        "future_lapply_staged" "hasty"               
##  [7] "Makefile"             "mclapply"             "mclapply_staged"     
## [10] "parLapply"            "parLapply_staged"

Over the last two years, the number of backends grew and grew because I had so much to learn about high-performance computing. And I still do. But now, users have unnecessarily complicated choices to make, and it is difficult to develop and maintain so many backends.

The more I learn, the more "clustermq" seems like the best of all worlds for drake.

The clustermq package can deploy locally with the "multicore" option and remotely to most common schedulers.
Overhead is very low, even comparable to drake's non-staged multicore backends. Thanks to clustermq, initialization, interprocess communication, and load balancing appear very fast.
We may not need future after all. Yes, the future ecosystem is amazingly powerful and flexible, and BACKEND: ClusterMQ as a new backend futureverse/future#204 could potentially even provide staged clustermq-based parallelism. However, I do wonder about a couple things.
1. How much value do future's non-clustermq backends still provide here? Is there still a need for batchtools-based HPC?
2. For directed acyclic graphs (DAGs) of non-embarrassingly-parallel jobs, it is important to have full access to a pool of automatically load-balanced persistent clustermq workers so drake can submit new jobs as soon as their dependencies finish and a worker becomes available. (Relevant: Support clustermq as backend for drake mschubert/clustermq#86 (comment) and Support clustermq as backend for drake mschubert/clustermq#86 (comment).) Does future allow this?

Proposal

For drake version 7.0.0 – which I hope to release in the first half of 2019 – let's think about removing all parallelism choices except "clustermq". (And let's keep "hasty" mode too, which is just an oversimplified a clone of "clustermq". It's not much code, and it's a great sandbox for benchmarking).

Benefits

Removing superfluous parallel backends will simplify the user experience. Users will no longer be overwhelmed by all the parallel computing choices and having to figure out which one is right for them. In addition, the code base and test suite will be considerably smaller, simpler, leaner, cleaner, faster, easier to maintain, more reliable, and more attractive to potential collaborators.

Your help

My goals for late 2018 are to

Assess the feasibility of this change.
If the change is a good idea, ensure the prerequisites for development are in place.

I would sincerely appreciate any input, advice, help, and participation you are willing to lend.

How will this affect you as a user?

Do you rely on other backends? Having problems with make(parllelism = "clustermq")? Let's talk. I will personally help you transition.

What am I missing?

Are there use cases that are inherently more difficult for clustermq than the other backends? The existing backends have different strengths and weaknesses, and I want to leave time for discussion before assuming clustermq is a one-size-fits-all solution.

Related issues

From a development perspective, the chief obstacles seem to be

Multicore parallelism on Windows.
Timeouts on computing clusters (Working with HPC time limits #349 and Redeploy workers that time out too soon? mschubert/clustermq#101).
In certain situations, increased overhead relative to local staged parallelism.

Here is a more exhaustive list of relevant issues.

cc

These are some of the folks involved in earlier discussions about drake's high-performance computing. Please feel free to add more. I probably forgot many.

The text was updated successfully, but these errors were encountered:

mschubert · 2018-10-27T18:08:00Z

Rather than using the current clustermq backend, does it make sense to come up with a solution for efficient distribution/caching of data shared between multiple calls and move drake in that direction? (I actually thought about contacting you for a possible R Consortium application, but had too much other work to fit it in this round) Happy to discuss this in more detail when I'm back from vacation/traveling in 3 weeks.

wlandau · 2018-10-28T14:04:43Z

I would love to discuss the details when you are back from vacation. I will actually be on a vacation myself from November 15 through 26, so maybe late November or early December would be a good time to follow up.

My own preference at the moment is to pursue both options simultaneously. I know, this is the sort of thinking that that gave drake too many parallel backends, but I see value in both, and I am not sure they are mutually exclusive. What you describe seems like a higher-risk, higher-reward effort that could replace clustermq in drake further down the road. In the nearer future, it is trivially easy on the implementation side to get rid of superfluous non-clustermq backends, and I believe this would solve immediate problems in development, testing, collaboration, and general ease of use.

Also, #384, #498, and this article could be relevant to your idea to the extent that storage is related to caching.

idavydov · 2018-11-02T11:18:27Z

I started using drake with batchtools mainly because zeromq is an extra dependency. It turned out that we have it installed on the cluster. So eventually I switched to clustermq, and it seems to work great.

In my opinion, the only problem with having clustermq only is the fact that some clusters might not have zeromq installed. On top of that, even if you manage to compile zeromq from sources, if you have some organization-wide installation of Rstudio, it might be difficult to have all the environment variables setup correctly to use a locally-installed version.

Another consideration is the following: persistent workers are great but for large clusters, it is often considered a good practice to have shorter jobs rather than longer ones. So in case of large drake plans and clusters with a high load, it is probably good to keep an option of having transient workers. Maybe, even exploiting cluster dependency management.

Perhaps, the simplest way to do this is through Makefiles?

bart1 · 2018-11-02T12:27:16Z

A quick note from my side, I just gave it a try. I needed to install zeromq from source which was surprisingly easy due to having no dependency issues. Afterwards I got the example running without any trouble (on slurm).

not knowing much about how clustermq works I do not know how it deals with longer jobs. Since jobs on the cluster have a limited run time, and it seems clustermq reuses existing instances sometimes I would probably want to limit this behaviour to start new jobs. To avoid jobs being killed by the cluster. I guess the reuse argument to clustermq::workers is meant for that.

An other question I have is if there is an option to run multiple jobs in one or more larger jobs cluster jobs. Previous I have used one 64 core tasks to run many different tasks.

I guess both these concerns can be addressed by updating the template file to call workers with reuse=FALSE and make multiple parallel calls to workers using something like mclapply(1:64, function(x){clustermq("{master}")}) in the slurm template (untested)

HenrikBengtsson · 2018-11-02T22:30:05Z

A quick note from my side, I just gave it a try. I needed to install zeromq from source which was surprisingly easy due to having no dependency issues.

I can second this. I've verified that installing ZeroMQ as a non-privileged user from source on a RHEL 6.6(!) cluster went smoothly following the (traditional) installation instruction;

curl -L -O https://github.com/zeromq/libzmq/releases/download/v4.2.5/zeromq-4.2.5.tar.gz
tar xvf zeromq-4.2.5.tar.gz
./configure --prefix=/path/to/zeromq-4.2.5
make
make install

After this, 'rzmq' and 'clustermq' installed like a charm on a fresh R 3.5.1 setup.

Lots of HPC cluster run these older versions of RHEL/CentOS. I think there are folks out there stuck with RHEL 5 as well. Being able to support those is important. ("You need to update your OS", is not useful feedback for users on such environment.)

wlandau · 2018-11-04T13:31:10Z

Thanks for chiming in. From your perspective, it sounds like the main issues are (1) persistent workers and (2) compatibility edge cases with ZeroMQ. mschubert/clustermq#101 is solvable, but to Henrik's point, OS portability is key. All this makes me think we might keep the "future" backend as well as "clustermq" and "hasty". Those three backends are fewer than the existing 11, and together they cover a lot.

wlandau · 2018-11-04T13:33:51Z

@idavydov

Perhaps, the simplest way to do this is through Makefiles?

make(parallelism = "Makefile") was actually drake's first parallel backend. In fact, my original intention was to offload to GNU Make as much as possible. Now, however, I think drake has grown independent of Make and we no longer need the "Makefile" backend.

violetcereza · 2018-11-04T17:51:28Z

I haven't been using drake much recently but as I said in #126, I used Makefile parallelism because I really liked the ability to kill jobs (from htop) without affecting my main R session. This worked with my workflow, where I rapidly iterate on my scripts. I'm not sure if this is a reasonable request, but I'm wondering if clustermq has a means for easily viewing the status of various jobs and killing them.

wlandau · 2018-11-04T18:06:32Z

With clustermq and its persistent workers, you unfortunately would not be able to kill individual jobs. But with the "future" backend (which is back on the table) you might, especially if there is some way to expose and broadcast suggestive job names in batchtools template files.

Related: futureverse/future#93

krlmlr · 2018-11-07T09:50:13Z

I love the idea of fewer choices and a "universally good" solution, haven't had the chance to try clustermq yet.

wlandau · 2018-11-09T14:20:46Z

@dapperjapper, in 1ae4528, I supplied the target name to the label argument of future(). So for those of us who use computing clusters and future.batchtools, as long as we use job.name in our batchtools template files, we see the names of our targets in job-monitoring utilities like qstat. As for local multisession/multicore parallelism, I do not know if it is possible to post informative job names that get picked up by htop.

wlandau · 2018-11-09T14:25:41Z

Related: HenrikBengtsson/future.batchtools#8, futureverse/future#96.

wlandau · 2018-11-11T22:03:35Z

@dapperjapper, using HenrikBengtsson/future.callr@a877479 and 1ae4528, you can now see the targets corresponding to processes in htop. Example: futureverse/future#96 (comment). We are one step closer to the safe removal of Makefile parallelism.

violetcereza · 2018-11-14T15:53:06Z

Fantastic work! Thank you!

HenrikBengtsson · 2018-12-26T04:38:04Z

future with batchtools or multicore takes far too long to get going to be useful for large sets of quick jobs.

Admitting I've been out of the loop for a while, but didn't drake have a "future.apply" backend for the purpose of doing achieving balancing (on top of the core Future API provided by future)?

kendonB · 2018-12-26T04:44:22Z

Yes. It's the "future_lapply_staged" option, which has worked pretty well for embarrassing parallel jobs. I may still lobby Will to support it as an option.

mschubert · 2018-12-26T14:38:42Z

Or one team of persistent heterogeneous workers and appropriate load balancing to accommodate. I think this may be a good clustermq issue.

This would need the whole infrastructure of assigning each call a memory and time cost, constructing dependency graphs, and then distributing tasks according to maximize resource usage.

This, to me, sounds rather like a job for a workflow package than a scheduler interface package, so I'm unlikely to support this directly.

But I'm happy to adapt the worker API for e.g. drake to use it that way, or come up with a good glue like #561 (comment) suggests (I haven't forgotten about this).

Admitting I've been out of the loop for a while, but didn't drake have a "future.apply" backend for the purpose of doing achieving balancing (on top of the core Future API provided by future)?

I also have to read up on how the different future approaches work in detail, and consider putting futureverse/future#204 a bit higher up on the priority list (or is anyone interested in taking a stab at this?).

wlandau · 2018-12-27T01:37:01Z

It may be time for drake to allow custom/external parallel backends. I am thinking about make(parallelism = your_scheduler_function), where your_scheduler_function() takes a drake_config() object and checks/builds all the targets. We could take the current internal run_future_lapply_staged() and put it in it's own package, similarly to how packages future.batchtools and future.callr extend future. We could also make extension easier by standardizing and exposing drake functions for building, checking, and storing targets. With all the changes planned for 7.0.0, I think we are coming up on the right time. Implementation may not start immediately, though. My January will be packed with other stuff, including 2 conferences.

wlandau · 2018-12-27T02:30:08Z

#561 (comment) could also pave the way for #575.

wlandau · 2019-01-03T13:16:41Z

Re #561 (comment), point well taken. drake needs to keep track of which targets need which resources, and of course, the timing of how the targets are deployed. However, I think drake may need help from clustermq in order to spawn a pool of heterogenous workers, and for each target, exclude workers that do not meet the resource requirements.

At the interface level, it would be great to combine futureverse/future#204 and https://ropenscilabs.github.io/drake-manual/hpc.html#the-resources-column-for-transient-workers. make(parallelism = "future") already recruits the optional resources argument of future.batchtools evaluators. And depending on efficiency, futureverse/future#204 might allow drake to fold make(parallelism = "clustermq") right into make(parallelism = "future"). The simplicity would be really nice.

wlandau · 2019-01-03T13:31:01Z

wlandau · 2019-01-03T21:05:57Z

After another look at the code base, I no longer think it is a good idea to officially support custom backends in external packages because it would require exposing too many sensitive internals. That said, I will still open a back door for experimentation: make(parallelism = your_scheduler_function). Caveats:

This is really a sandbox similar to hasty mode, so drake will always throw a warning.
To get at the necessary internals, you will need to use :::.

I think this approach could

Make it easier for others to help with Bare-bones scheduling and caching #575.
Help advanced HPC users aggressively optimize scheduling for their computing resources.

wlandau · 2019-01-03T22:03:00Z

Let's externally offload the unofficial backends through this backdoor: hasty mode and "future_lapply_staged" parallelism.

Ref: #561

wlandau · 2019-01-04T02:52:56Z

I believe all the immediate issues are fixed. @kendonB, "future_lapply_staged" parallelism is now available through https://github.com/wlandau/drake.future.lapply.staged. Likewise, hasty mode is offloaded to https://github.com/wlandau/drake.hasty. @mschubert, let's follow up separately about #561 (comment) and futureverse/future#204.

HenrikBengtsson · 2019-01-04T06:27:22Z

With the risk of being "fluffy", here are some quick comments and clarifications related to the future framework and what I think is the gist of this thread:

The idea of the Future Core API is to provide a unified framework:
- that provides a minimal ("atomic") set of build blocks - no more no less - for evaluating R code "anywhere" and asynchronously
- where the Future class provides a "container" holding an R expression and its dependencies
- where Future objects can be passed on to R processes running "anywhere"
- when I look at our existing parallel/distributed backends, at their their very core, they are all implementing their own version of a future()/resolved()/value() API; conceptually they could implement the same lightweight API at this level. Not claiming it will ever happen, but the design goal of the future framework is such that packages such as 'parallel', 'foreach', 'batchtools', 'clustermq' etc could implement their own future backends and then their higher-level functions will build on top of these common core parallelization building blocks
- Higher-level use cases from tools like drake greatly helps to identify/narrow in on what the atomic set of building blocks should be
I'm hoping to get to a future.clustermq backend "soon-ish";
- I wanted to get a working draft of future.tests first to help the validation (it's in a decent shape now)
- Note that a future.clustermq backend will actually "peel off" the workload balancing that clustermq has built in, which means a vanilla future.clustermq backend might not do what you need. Why? This is because the Future Core API (as defined/implemented in the 'future' package) does not have a concept of workload balancing (but see next).
Specifying and requesting computational resources needed to evaluate a particular computational container ("future") is not obvious:
- We need some type of standard and I don't think it exists, i.e. it needs to be identified and developed carefully
- HPC schedulers provide some framework and terminology for this
- Different schedulers does not fully agree on how to specify this, e.g. for some you specify memory needed for the whole job whereas for others per slot in a job. Some of these can be encapsulated in a unifying API, but some may be unique to certain schedulers and needs
It's on my long-term roadmap to make it possible to merge multiple Future objects:
- the hope is to identify (and eventually provide) fundamental buildings block for doing workload balancing ("chunking")
- for example, imagine a for loop over 100 iterations creating 100 separate futures. If we could merge these into 10 new futures (sharing common globals and dependencies etc), we have effectively achieved the most basic type of workload balancing ("chunking") as we see in mclapply(), parLapply(), future_lapply(), etc.
- in order to do this, I need to decouple the Future class further from the future backends (they're currently a bit intertwined)
- the goal is to provide the minimal set of building blocks

brendanf · 2019-02-04T11:08:28Z

A quick note from my side, I just gave it a try. I needed to install zeromq from source which was surprisingly easy due to having no dependency issues.

I can second this. I've verified that installing ZeroMQ as a non-privileged user from source on a RHEL 6.6(!) cluster went smoothly following the (traditional) installation instruction;
curl -L -O https://github.com/zeromq/libzmq/releases/download/v4.2.5/zeromq-4.2.5.tar.gz
tar xvf zeromq-4.2.5.tar.gz
./configure --prefix=/path/to/zeromq-4.2.5
make
make install
After this, 'rzmq' and 'clustermq' installed like a charm on a fresh R 3.5.1 setup.

Lots of HPC cluster run these older versions of RHEL/CentOS. I think there are folks out there stuck with RHEL 5 as well. Being able to support those is important. ("You need to update your OS", is not useful feedback for users on such environment.)

@HenrikBengtsson I managed to install zeromq on my university cluster (running CentOS 7.6.1810) using the commands you listed, but I'm having trouble getting rzmq to load properly; can you spell out in more detail where the zeromq libraries should be installed and/or what environmental variables need to be set for this to work?

wlandau added depends: help or input difficulty: advanced topic: performance status: priority READ FIRST 🚦 labels Oct 27, 2018

wlandau self-assigned this Oct 27, 2018

wlandau added this to the Version 7.0.0 milestone Oct 28, 2018

wlandau added topic: dependencies depends: a future release labels Oct 28, 2018

wlandau mentioned this issue Oct 31, 2018

Review the code for non-constant-time storr operations #566

Closed

wlandau removed status: priority labels Nov 5, 2018

wlandau mentioned this issue Nov 8, 2018

Bare-bones scheduling and caching #575

Closed

wlandau mentioned this issue Nov 12, 2018

Suggestion: future::with_plan() and/or future::local_plan() futureverse/future#263

Open

This was referenced Dec 4, 2018

Remove dependency on dplyr #593

Closed

Refactor predict_runtime() and predict_load_balancing() #597

Closed

wlandau closed this as completed Dec 27, 2018

wlandau reopened this Dec 27, 2018

wlandau added status: priority and removed depends: a future release labels Jan 2, 2019

wlandau mentioned this issue Jan 3, 2019

Remove superfluous parallel backends #648

Merged

6 tasks

wlandau pushed a commit that referenced this issue Jan 3, 2019

Open the backdoor to custom scheduler functions

bbbd190

Ref: #561

wlandau mentioned this issue Jan 3, 2019

Custom drake backends and new usage of hasty mode ropensci-books/drake#57

Closed

wlandau closed this as completed Jan 4, 2019

wlandau added the difficulty: advanced label Jan 4, 2019

This was referenced Jan 5, 2019

Fix #650 and #651 #652

Merged

Should we abandon development in favor of drake.hasty? wlandau/workers#10

Closed

Direction for development wlandau/drake.hasty#1

Open

wlandau mentioned this issue Jan 13, 2019

Initial planning phase: efficient data management for persistent workers #672

Closed

bart1 mentioned this issue Feb 22, 2019

Dynamically scale clustermq workers #751

Closed

adrfantini mentioned this issue Mar 2, 2019

Implement futures for st_apply r-spatial/stars#124

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove all non-clustermq parallel backends? #561

Remove all non-clustermq parallel backends? #561

wlandau commented Oct 27, 2018 •

edited

Loading

mschubert commented Oct 27, 2018 via email

wlandau commented Oct 28, 2018 •

edited

Loading

idavydov commented Nov 2, 2018

bart1 commented Nov 2, 2018 •

edited

Loading

HenrikBengtsson commented Nov 2, 2018

wlandau commented Nov 4, 2018 •

edited

Loading

wlandau commented Nov 4, 2018

violetcereza commented Nov 4, 2018

wlandau commented Nov 4, 2018

krlmlr commented Nov 7, 2018

wlandau commented Nov 9, 2018

wlandau commented Nov 9, 2018

wlandau commented Nov 11, 2018

violetcereza commented Nov 14, 2018

HenrikBengtsson commented Dec 26, 2018

kendonB commented Dec 26, 2018

mschubert commented Dec 26, 2018 •

edited

Loading

wlandau commented Dec 27, 2018 •

edited

Loading

wlandau commented Dec 27, 2018

wlandau commented Jan 3, 2019

wlandau commented Jan 3, 2019 •

edited

Loading

wlandau commented Jan 3, 2019

wlandau commented Jan 3, 2019 •

edited

Loading

wlandau commented Jan 4, 2019

HenrikBengtsson commented Jan 4, 2019

brendanf commented Feb 4, 2019

Remove all non-clustermq parallel backends? #561

Remove all non-clustermq parallel backends? #561

Comments

wlandau commented Oct 27, 2018 • edited Loading

Background

Proposal

Benefits

Your help

How will this affect you as a user?

What am I missing?

Related issues

cc

mschubert commented Oct 27, 2018 via email

wlandau commented Oct 28, 2018 • edited Loading

idavydov commented Nov 2, 2018

bart1 commented Nov 2, 2018 • edited Loading

HenrikBengtsson commented Nov 2, 2018

wlandau commented Nov 4, 2018 • edited Loading

wlandau commented Nov 4, 2018

violetcereza commented Nov 4, 2018

wlandau commented Nov 4, 2018

krlmlr commented Nov 7, 2018

wlandau commented Nov 9, 2018

wlandau commented Nov 9, 2018

wlandau commented Nov 11, 2018

violetcereza commented Nov 14, 2018

HenrikBengtsson commented Dec 26, 2018

kendonB commented Dec 26, 2018

mschubert commented Dec 26, 2018 • edited Loading

wlandau commented Dec 27, 2018 • edited Loading

wlandau commented Dec 27, 2018

wlandau commented Jan 3, 2019

wlandau commented Jan 3, 2019 • edited Loading

wlandau commented Jan 3, 2019

wlandau commented Jan 3, 2019 • edited Loading

wlandau commented Jan 4, 2019

HenrikBengtsson commented Jan 4, 2019

brendanf commented Feb 4, 2019

wlandau commented Oct 27, 2018 •

edited

Loading

wlandau commented Oct 28, 2018 •

edited

Loading

bart1 commented Nov 2, 2018 •

edited

Loading

wlandau commented Nov 4, 2018 •

edited

Loading

mschubert commented Dec 26, 2018 •

edited

Loading

wlandau commented Dec 27, 2018 •

edited

Loading

wlandau commented Jan 3, 2019 •

edited

Loading

wlandau commented Jan 3, 2019 •

edited

Loading