-
-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stdenv: single make jobserver across multiple nix builds #143820
Conversation
the token loss problem could be solved on linux with a little fuse filesystem that emulates pipes, tracks how many tokens were taken from each file description of a pipe, and returns missing tokens when a pipe file description is released. darwin also has a fuse implementation, so there may just be a chance that if this works on linux it could also work reasonably well on darwin at some point. |
updated with a CUSE-based jobserver fifo-alike. this variant survives nix-daemon killing builds hard without losing tokens, and in theory it could be implemented as a FUSE filesystem too (eg for darwin, or if the CUSE variant is deemed too exotic). at the moment it's CUSE mostly because FUSE doesn't seem to have a way to forbid mmapping of files, and while mmapping a token fd wouldn't break anything it would consume 4k tokens in one fell swoop and not give them back for a good while. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfc-global-jobserver-for-all-nix-builds/15923/1 |
updated with support for cargo, reintroduced per-build core limits, and added a nixos module. getting there 🤞 |
@@ -25,7 +23,7 @@ else | |||
throw "Unsupported architecture"; | |||
|
|||
buildStdenv = if stdenv.isDarwin then | |||
overrideCC clangStdenv [ clang_9 llvmPackages_9.llvm llvmPackages_9.lld ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this slip in by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope. without this change this package will fail to build trying to build stdenv.jscall (because after this override stdenv.cc is a list, which is illegal. the jscall drv interpolates stdenv.cc into a bash string directly, thus failing here)
python312Packages.plugwise: 0.37.8 -> 0.37.9
zoom-us: 6.0.2.4680 -> 6.0.10.5325
kde-rounded-corners: 0.6.5 -> 0.6.6
jnv: 0.2.2 -> 0.2.3
flashmq: 1.13.0 -> 1.13.1
…s.django-modeltranslation python311Packages.django-modeltranslation: 0.18.13 -> 0.19.0
firebase-tools: 13.10.0 -> 13.10.1
python312Packages.foolscap: fix build
openjdk17, openjfx17, corretto17: update
jetbrains-jdk: 17.0.11-b1000.8 -> 17.0.11-b1207.24)
…s.ytmusicapi python311Packages.ytmusicapi: 1.7.1 -> 1.7.2
binutils: Add --undefined-version on lld 17+
libxml2: Test for pthread_create instead of pthread_join on FreeBSD
I have no further plan to review CppNix code anymore as I will dedicate myself to Lix development. Signed-off-by: Raito Bezarius <[email protected]>
initially only make and cargo support using the jobserver. other build systems may follow suit later.
Signed-off-by: Raito Bezarius <[email protected]>
Motivation for this change
make -jN -lN
in stdenv is a very blunt instrument. it works well when max-jobs=1, but as nix-level paralellism increases it becomes increasingly deficient. starting from a low-load situation we start max-jobs * N compilers, loadavg goes through the roof, the-lN
load limit kicks in and inhibits new compilers starting until loadavg has fallen below N—at which point all make instances spawn a lot of new compilers and loadavg goes through the roof again. this oscillation leaves the system underutilized in low phases and overcommitted in high phases.testing the current stdenv against a jobserver with 26 tokens on a 12C/24T machine shows that parallel builds of llvm_{8..11} run about 7% faster (35:52min for stdenv, 33:30min with jobserver), a larger build of llvm{5..13} is about about 11% faster (1:27h for stdenv, 1:17h with jobserver). (
removing the[more testing says that-l
from stdenv also improves utilization but is less efficient. preliminary testing here shows that-l${1.5 * N}
may be a good alternative to-lN
as used currently, #141266 could be a good vector to go for that instead of this whole mess.-l2N
would be a minimum to get better utilization, but so far every-l
setting we've tried has produced some underutilization except excessive large numbers like 6N or higher])nothing in here should be regarded as a final suggestion in any way, it's more of a "hey look, this might just work". as such it's extremely rough around the edges, eg to use the jobserver the experimenter currently has to bring a
/jobserver
fifo filled with tokens into the nix sandbox:is this something worth pursuing? a 10% speedup for hydra does seem tempting
todos before this is more generally usable:
cc @vcunat @Artturin