Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdenv: single make jobserver across multiple nix builds #143820

Closed
wants to merge 175 commits into from
Closed

stdenv: single make jobserver across multiple nix builds #143820

wants to merge 175 commits into from

Conversation

pennae
Copy link
Contributor

@pennae pennae commented Oct 30, 2021

Motivation for this change

make -jN -lN in stdenv is a very blunt instrument. it works well when max-jobs=1, but as nix-level paralellism increases it becomes increasingly deficient. starting from a low-load situation we start max-jobs * N compilers, loadavg goes through the roof, the -lN load limit kicks in and inhibits new compilers starting until loadavg has fallen below N—at which point all make instances spawn a lot of new compilers and loadavg goes through the roof again. this oscillation leaves the system underutilized in low phases and overcommitted in high phases.

testing the current stdenv against a jobserver with 26 tokens on a 12C/24T machine shows that parallel builds of llvm_{8..11} run about 7% faster (35:52min for stdenv, 33:30min with jobserver), a larger build of llvm{5..13} is about about 11% faster (1:27h for stdenv, 1:17h with jobserver). (removing the -l from stdenv also improves utilization but is less efficient. preliminary testing here shows that -l${1.5 * N} may be a good alternative to -lN as used currently, #141266 could be a good vector to go for that instead of this whole mess. [more testing says that -l2N would be a minimum to get better utilization, but so far every -l setting we've tried has produced some underutilization except excessive large numbers like 6N or higher])

nothing in here should be regarded as a final suggestion in any way, it's more of a "hey look, this might just work". as such it's extremely rough around the edges, eg to use the jobserver the experimenter currently has to bring a /jobserver fifo filled with tokens into the nix sandbox:

nix-build -E 'with import ./. {}; pkgs.callPackage ./pkgs/os-specific/linux/nixos-jobserver {}'
touch /tmp/jobserver
sudo ./result -t 24 -g nixbld /tmp/jobserver&
nix build --extra-sandbox-paths "/jobserver=/tmp/jobserver" #...

is this something worth pursuing? a 10% speedup for hydra does seem tempting

todos before this is more generally usable:

  • re-add $NIX_BUILD_CORES support (ioctl on the jobserver fd should do it)
  • maybe port from cuse to fuse to not require root for everything (mmap problems may be solved by reporting jobserver file size as 0 at all times, needs testing)
  • usability of tools (error messages, runtime configuration, etc)
  • add support for other build systems (cargo can honor MAKEFLAGS, maybe others too)
  • nixos module
  • make configurable which env var(s) jscall adds jobserver information to
  • metrics?
  • <your suggestion here>

cc @vcunat @Artturin

@github-actions github-actions bot added the 6.topic: stdenv Standard environment label Oct 30, 2021
@pennae
Copy link
Contributor Author

pennae commented Nov 1, 2021

the token loss problem could be solved on linux with a little fuse filesystem that emulates pipes, tracks how many tokens were taken from each file description of a pipe, and returns missing tokens when a pipe file description is released. darwin also has a fuse implementation, so there may just be a chance that if this works on linux it could also work reasonably well on darwin at some point.

@pennae pennae mentioned this pull request Nov 1, 2021
12 tasks
@pennae
Copy link
Contributor Author

pennae commented Nov 3, 2021

updated with a CUSE-based jobserver fifo-alike. this variant survives nix-daemon killing builds hard without losing tokens, and in theory it could be implemented as a FUSE filesystem too (eg for darwin, or if the CUSE variant is deemed too exotic). at the moment it's CUSE mostly because FUSE doesn't seem to have a way to forbid mmapping of files, and while mmapping a token fd wouldn't break anything it would consume 4k tokens in one fell swoop and not give them back for a good while.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/rfc-global-jobserver-for-all-nix-builds/15923/1

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: rust 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 6.topic: GNOME GNOME desktop environment and its underlying platform 6.topic: haskell 6.topic: python 6.topic: vim labels Nov 14, 2021
@pennae
Copy link
Contributor Author

pennae commented Nov 14, 2021

updated with support for cargo, reintroduced per-build core limits, and added a nixos module. getting there 🤞

@github-actions github-actions bot removed 6.topic: vim 6.topic: haskell 6.topic: python 6.topic: GNOME GNOME desktop environment and its underlying platform labels Nov 14, 2021
@ofborg ofborg bot added the 8.has: package (new) This PR adds a new package label Nov 14, 2021
@ofborg ofborg bot requested review from lovek323, edolstra and np November 14, 2021 19:06
@@ -25,7 +23,7 @@ else
throw "Unsupported architecture";

buildStdenv = if stdenv.isDarwin then
overrideCC clangStdenv [ clang_9 llvmPackages_9.llvm llvmPackages_9.lld ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this slip in by mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. without this change this package will fail to build trying to build stdenv.jscall (because after this override stdenv.cc is a list, which is illegal. the jscall drv interpolates stdenv.cc into a bash string directly, thus failing here)

NickCao and others added 18 commits May 26, 2024 10:17
python312Packages.plugwise: 0.37.8 -> 0.37.9
…s.django-modeltranslation

python311Packages.django-modeltranslation: 0.18.13 -> 0.19.0
firebase-tools: 13.10.0 -> 13.10.1
python312Packages.foolscap: fix build
openjdk17, openjfx17, corretto17: update
jetbrains-jdk: 17.0.11-b1000.8 -> 17.0.11-b1207.24)
…s.ytmusicapi

python311Packages.ytmusicapi: 1.7.1 -> 1.7.2
binutils: Add --undefined-version on lld 17+
libxml2: Test for pthread_create instead of pthread_join on FreeBSD
I have no further plan to review CppNix code anymore as I will dedicate
myself to Lix development.

Signed-off-by: Raito Bezarius <[email protected]>
initially only make and cargo support using the jobserver. other build
systems may follow suit later.
@github-actions github-actions bot added 6.topic: python 6.topic: GNOME GNOME desktop environment and its underlying platform 6.topic: policy discussion labels May 26, 2024
@NixOS NixOS locked as too heated and limited conversation to collaborators May 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
6.topic: GNOME GNOME desktop environment and its underlying platform 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: policy discussion 6.topic: python 6.topic: rust 6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 501+ 10.rebuild-darwin: 5001+ 10.rebuild-darwin-stdenv This PR causes stdenv to rebuild 10.rebuild-linux: 501+ 10.rebuild-linux: 5001+ 10.rebuild-linux-stdenv This PR causes stdenv to rebuild significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.