add support for asynchronous building and pushing of profiles #271
base: master
Conversation
src/cli.rs (Outdated)
// await both the remote builds and the local builds to speed up deployment times
try_join!(
    // remote builds can be run asynchronously since they do not affect the local machine
    try_join_all(remote_builds.into_iter().map(|data| async {
From the `try_join_all` docs:

> If any future returns an error then all other futures will be canceled and an error will be returned immediately
IMO, it might be better to wait for all futures to be completed/failed instead of canceling everything on the first failure, thus the non-failed builds will complete despite the error in an unrelated build. What do you think?
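One way to sketch this "drive everything to completion, then report" semantics: in the async code, `try_join_all` would be swapped for `futures::future::join_all`, which returns a `Vec<Result<_, _>>` instead of canceling on the first error. The std-only helper below (`report` is hypothetical, not part of the PR) shows how those collected results could then be split into successes and failures:

```rust
// A minimal, synchronous sketch: after every build has run to completion,
// split the results so one failing build does not hide its siblings.
// `report` is a hypothetical helper, not deploy-rs API.
fn report(results: Vec<Result<String, String>>) -> Result<Vec<String>, Vec<String>> {
    let (ok, err): (Vec<_>, Vec<_>) = results.into_iter().partition(Result::is_ok);
    let successes: Vec<String> = ok.into_iter().map(Result::unwrap).collect();
    let failures: Vec<String> = err.into_iter().map(Result::unwrap_err).collect();
    if failures.is_empty() { Ok(successes) } else { Err(failures) }
}
```

With this shape, all failures can be reported together at the end instead of aborting the unrelated builds.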
Since the builds are started concurrently, the build progress information is pretty much useless and flickers a lot. I'm not sure how viable it is to implement, but perhaps it's possible to collect `stdout` and `stderr` for each future; if it is, we can collect and report detailed output synchronously once all remote builds are completed.
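A rough sketch of that capture idea, using blocking `std::process::Command` rather than the PR's async futures (`run_captured` is a hypothetical helper): each build's output is captured instead of inherited from the terminal, so the collected logs can be printed one build at a time after everything finishes.

```rust
use std::process::Command;

// Sketch only: run each command with captured stdout/stderr instead of
// letting it write to the terminal directly. The caller can then report
// the collected logs synchronously, build by build, once all are done.
fn run_captured(cmds: &[(&str, &[&str])]) -> Vec<(String, String)> {
    cmds.iter()
        .map(|(prog, args)| {
            let out = Command::new(prog)
                .args(*args)
                .output()
                .expect("failed to run build command");
            (
                String::from_utf8_lossy(&out.stdout).into_owned(),
                String::from_utf8_lossy(&out.stderr).into_owned(),
            )
        })
        .collect()
}
```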
> IMO, it might be better to wait for all futures to be completed/failed instead of canceling everything on the first failure, thus the non-failed builds will complete despite the error in an unrelated build. What do you think?
I'll see how easy that would be to implement; if it's feasible, it might be worth doing so that consecutive builds don't have to rebuild as much.
> perhaps, it's possible to collect stdout and stderr for each future, if it is, we can collect and report detailed output synchronously once all remote builds are completed
I was considering that maybe one line from the bottom per build would be neat, e.g.:

[previous output]
host 1: [build progress]
host 2: [build progress]

This will, however, require some more fine-grained control over the terminal output, i.e. raw mode instead of cooked.
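The one-status-line-per-host redraw can actually be approximated with plain ANSI escape codes, without switching the terminal to raw mode. A sketch (`render_status` is a hypothetical helper that returns the byte sequence instead of printing it):

```rust
// Sketch of the "one line per host" idea: move the cursor up over the
// previously drawn block with an ANSI escape, then clear and rewrite
// each host's status line in place.
fn render_status(statuses: &[(&str, &str)]) -> String {
    let mut out = format!("\x1b[{}A", statuses.len()); // cursor up N lines
    for (host, progress) in statuses {
        // \x1b[2K clears the current line before it is rewritten
        out.push_str(&format!("\x1b[2K{host}: {progress}\n"));
    }
    out
}
```

This assumes the status block was already printed once, so there are N lines to move up over; a real implementation would also need to handle terminal resizing and lines longer than the terminal width.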
> would be neat

Agreed, but it indeed sounds quite non-trivial.
Oh, another somewhat conceptual concern: profiles can potentially depend on each other (see the `profilesOrder` flake option), so perhaps it's worth doing the parallelization on a per-host basis instead of per-profile.
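The per-host idea could look roughly like this, sketched with std threads instead of the PR's futures (the data shapes here are made up for illustration): hosts build in parallel, but each host's profiles run strictly in their declared order, which is what a `profilesOrder` dependency requires.

```rust
use std::collections::BTreeMap;
use std::thread;

// Sketch: group (host, profile) pairs by host, spawn one worker per host.
// Hosts proceed concurrently; a single host's profiles stay sequential,
// preserving their declared order.
fn deploy_per_host(profiles: Vec<(String, String)>) -> Vec<String> {
    let mut by_host: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (host, profile) in profiles {
        by_host.entry(host).or_default().push(profile);
    }
    let handles: Vec<_> = by_host
        .into_iter()
        .map(|(host, profs)| {
            thread::spawn(move || {
                // this host's profiles, built strictly in order
                profs
                    .into_iter()
                    .map(|p| format!("{host}/{p}"))
                    .collect::<Vec<String>>()
            })
        })
        .collect();
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}
```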
Force-pushed from bb4a111 to 1cc6e35
mentioning #46 for visibility. The current version works as expected; the only remaining issue is the log flickering caused by multiple invocations of nix writing to stdout in parallel. I was considering utilising the `internal-json` log output similarly to how nix-output-monitor does it, but that might be out of scope here 🤔
activation is fully synchronous, but that is usually the part that takes the least amount of time
src/cli.rs (Outdated)
async {
    // run local builds synchronously to prevent hardware deadlocks
    for data in &local_builds {
        deploy::push::build_profile(data).await.unwrap();
I'm not a huge fan of using "partial functions"; isn't an unhandled panic from `unwrap` going to kill the main thread?
Also, AFAICS, at the moment we completely ignore non-remote build/push results.
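A sketch of what propagating the failure instead of panicking could look like. `build_profile` here is a hypothetical synchronous stand-in for the real (async) builder, used only to illustrate the error-handling shape:

```rust
// Hypothetical stand-in for deploy-rs's real builder.
fn build_profile(data: &str) -> Result<(), String> {
    if data.is_empty() {
        Err("empty profile data".to_string())
    } else {
        Ok(())
    }
}

// Sketch: `?` surfaces the first failure as an error value the caller
// can handle and report, rather than killing the whole deploy via a
// panic from unwrap().
fn build_local(local_builds: &[&str]) -> Result<(), String> {
    for data in local_builds {
        build_profile(data)?;
    }
    Ok(())
}
```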
The new approach seems good 👍
Force-pushed from 52f5c53 to dd7ec8c
build remote builds (`remote_build = true`) asynchronously to speed up the deployment process; local builds should not be run asynchronously, to prevent running into hardware deadlocks
Force-pushed from dd7ec8c to 42ae995
There are still a lot of unwraps that need to be handled better, and this PR is still far from finished, but progress has been made: I added some progress indicators using the indicatif crate for each concurrent build 🥳 This is what they look like (imagine the spinner spinning and the text changing as nix builds stuff). For remote and local builds, each host gets its own progress spinner that is reused for all of that host's profiles, since they are built in order anyway. This could be adjusted for multi-profile hosts, since indicatif also supports nested tree-like progress bars. I also changed some data structures around to make working with them in async contexts easier.
the custom implementation handles indicatif's progress bars better, so as not to leave orphaned progress-bar fragments when logging while a progress bar is active
With these last two commits I would declare this PR somewhat usable 🥳 If anyone wants to try it out with their own setups, I would be very happy to hear how it works for you; I am always open to feedback of any kind.
Hi all, this works well for these three systems, which are a pair of VPSes and a local (to me) server. Were I trying to deploy all the systems in my home network, I would prefer instead that my development machine (personal laptop) be able to push the profiles to a single server (technetium), so that it can then act as a substituter for e.g. aluminium and nickel. Maybe provide a "build-host" attribute per host, with a default of the host itself? e.g.

`deploy.nickel.buildHost = "technetium";`
`deploy.littlecreek.buildHost = "littlecreek";`
Hi there, our team at Pyro utilized this PR and observed a massive speedup deploying to our node cluster; for our initial deployments we see a speed increase of at minimum 2x. However, there might be a problem. I'm deploying from Ubuntu WSL, so that might be the culprit, but it seems that after every deployment in the same terminal session only half of the builds run in parallel, and then half again, until it's back to single-threaded (even though the spinners are still there). It might not be exactly halving, but there are at minimum diminishing returns over multiple command invocations. Here's a successful run if you're curious; I haven't been able to record the problem above yet, though. I'm really excited for this to get merged and I'll test anything necessary to speed things along. https://asciinema.org/a/dirahL9YYWFaT0lORkU98nX8s

P.S. A huge potential speedup would be processing checks in a parallel fashion as well, as that takes up a huge chunk of time, though I'm not sure how viable this is.
oh that is great to hear! thanks for the feedback.
hmm, I'll have to investigate how that might happen. I'll add some more debug output for each thread so it'll be easier to debug on your side. This PR pretty much just spawns a bunch of threads using high-level abstractions, so I'm curious how that might happen 🤔
unfortunately, the checks are limited by nix's eval speed. There is some work going on on detsys's side (see e.g. https://determinate.systems/posts/parallel-nix-eval/) but AFAIU nix flake checks are currently single-threaded in nature. You could attempt to get around that by checking/building all systems in parallel batches using …
NB: I have opened this Pull Request as a draft since I intend to continue working on it by improving the logging output.
Since the builds are started concurrently, the build progress information is pretty much useless and flickers a lot.
Problem
The current implementation builds every profile synchronously, including remote builds. Since remote builds were introduced back in #175, they could be pipelined to deploy new configurations in a more time-efficient manner.
Solution
Build configurations that support remote builds concurrently.
Sidenote: I have decided to continue building local builds in a synchronous manner because I have run into hardware deadlocks previously when trying to evaluate and/or build multiple systems at the same time.
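The scheduling policy described above can be sketched with std threads (the PR itself uses async futures; `build` below is a hypothetical stand-in for the real build/push step): remote builds run concurrently, while local builds run one after another.

```rust
use std::thread;

// Hypothetical stand-in for the real build/push step.
fn build(target: &str) -> String {
    format!("{target}: done")
}

// Sketch: kick off every remote build on its own thread, run local builds
// sequentially to avoid hardware deadlocks, then wait for the remote
// builds to finish.
fn deploy(remote: Vec<String>, local: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = remote
        .into_iter()
        .map(|t| thread::spawn(move || build(&t)))
        .collect();
    let mut results: Vec<String> = local.iter().map(|t| build(t)).collect();
    results.extend(handles.into_iter().map(|h| h.join().unwrap()));
    results
}
```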
I have tested this code by deploying to my personal infrastructure (https://github.com/philtaken/dotfiles) and it works as intended.