Add GNU make jobserver style "fifo" support #2263
- add new TokenPool interface
- GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable
- RealCommandRunner tries to acquire TokenPool
  * if no token pool is available then there is no change in behaviour
- When a token pool is available then RealCommandRunner behaviour changes as follows
  * CanRunMore() only returns true if TokenPool::Acquire() returns true
  * StartCommand() calls TokenPool::Reserve()
  * WaitForCommand() calls TokenPool::Release()

Documentation for GNU make jobserver
http://make.mad-scientist.net/papers/jobserver-implementation/

Fixes ninja-build#1139
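The Acquire() / Reserve() / Release() call sites listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the method signatures, the in-memory `CountingTokenPool`, and the `SketchCommandRunner` class are assumptions made for the example; only the method names come from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified interface. The method names mirror the commit
// message; the exact signatures are assumptions.
struct TokenPool {
  virtual ~TokenPool() {}
  virtual bool Acquire() = 0;  // true if a token is available right now
  virtual void Reserve() = 0;  // bind an available token to a new command
  virtual void Release() = 0;  // a command finished, its token is free again
};

// In-memory stand-in for a jobserver holding a fixed number of tokens.
struct CountingTokenPool : TokenPool {
  explicit CountingTokenPool(int tokens) : available_(tokens), used_(0) {}
  bool Acquire() override { return available_ > 0; }
  void Reserve() override {
    assert(available_ > 0);
    --available_;
    ++used_;
  }
  void Release() override {
    assert(used_ > 0);
    --used_;
    ++available_;
  }
  int available_;
  int used_;
};

// Hypothetical runner showing where each TokenPool call sits.
struct SketchCommandRunner {
  explicit SketchCommandRunner(TokenPool* tokens)
      : tokens_(tokens), running_(0) {}

  // Without a token pool the behaviour is unchanged (always true here).
  bool CanRunMore() const { return tokens_ ? tokens_->Acquire() : true; }

  void StartCommand() {
    if (tokens_)
      tokens_->Reserve();
    ++running_;
  }

  void WaitForCommand() {
    assert(running_ > 0);
    --running_;
    if (tokens_)
      tokens_->Release();
  }

  TokenPool* tokens_;
  int running_;
};
```

With two tokens, a third `CanRunMore()` call returns false until one `WaitForCommand()` has completed.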
Improve on the original jobserver client implementation. This makes ninja a more aggressive GNU make jobserver client.

- add monitor interface to TokenPool
- TokenPool is passed down when main loop indicates that more work is ready and would be allowed to start if a token becomes available
- posix: update DoWork() to monitor TokenPool read file descriptor
- WaitForCommand() exits when DoWork() sets token flag
- Main loop starts over when WaitForCommand() sets token exit status

This emulates the behaviour of GNU make.

- add parallelism_from_cmdline flag to build configuration
- set the flag when -jN is given on command line
- pass the flag to TokenPool::Get()
- GNUmakeTokenPool::Setup()
  * prints a warning when the flag is true and jobserver was detected
  * returns false, i.e. jobserver will be ignored
- ignore config.parallelism in CanRunMore() when we have a valid TokenPool, because it is always initialized to a default when not given on the command line

This emulates the behaviour of GNU make.

- build: make a copy of max_load_average and pass it to TokenPool.
- GNUmakeTokenPool: if we detect a jobserver and a valid -lN argument in MAKEFLAGS then set max_load_average to N.

- replace printf() with calls to LinePrinter
- print GNU make jobserver message only when verbose build is requested

- fix Windows build error in no-op TokenPool implementation
- improve Acquire() to block for a maximum of 100ms
- address review comments

- TokenPool setup
- GetMonitorFd() API
- implicit token and tokens in jobserver pipe
- Acquire() / Reserve() / Release() protocol
- Clear() API

- add TokenPoolTest stub to provide TokenPool::GetMonitorFd()
- add two tests
  * both tests set up a dummy GNUmake jobserver pipe
  * both tests call DoWork() with TokenPoolTest
  * test 1: verify that DoWork() detects when a token is available
  * test 2: verify that DoWork() works as before without a token
- the tests are not compiled in under Windows
Add tests that verify the token functionality of the builder main loop. We replace the default fake command runner with a special version where the tests can control each call to AcquireToken(), CanRunMore() and WaitForCommand().
GNU make uses a semaphore as jobserver protocol on Win32. See also

https://www.gnu.org/software/make/manual/html_node/Windows-Jobserver.html

Usage is pretty simple and straightforward, i.e. WaitForSingleObject() to obtain a token and ReleaseSemaphore() to return it.

Unfortunately subprocess-win32.cc uses an I/O completion port (IOCP). IOCPs aren't waitable objects, i.e. we can't use WaitForMultipleObjects() to wait on the IOCP and the token semaphore at the same time.

Therefore GNUmakeTokenPoolWin32 creates a child thread that waits on the token semaphore and posts a dummy I/O completion status on the IOCP when it was able to obtain a token. That unblocks SubprocessSet::DoWork() and it can then check if a token became available or not.

- split existing GNUmakeTokenPool into common and platform bits
- add GNUmakeTokenPool interface
- move the Posix bits to GNUmakeTokenPoolPosix
- add the Win32 bits as GNUmakeTokenPoolWin32
- move Setup() method up to TokenPool interface
- update Subprocess & TokenPool tests accordingly
- remove unnecessary "struct" from TokenPool
- add PAPCFUNC cast to QueueUserAPC()
- remove hard-coded MAKEFLAGS string from win32
- remove useless build test CompleteNoWork
- rename TokenPoolTest to TestTokenPool
- add tokenpool modules to CMake build
- remove unused no-op TokenPool implementation
- fix errors flagged by codespell & clang-tidy
- POSIX GNUmakeTokenPool should return same token
- address review comments from ninja-build#1140 (comment) ninja-build#1140 (review) ninja-build#1140 (review) ninja-build#1140 (comment) ninja-build#1140 (comment)
Make space to add new API to set up token pool master.
This method will set up the token pool master.
When this option is given on the command line then ninja will set up a token pool master instead of being a token pool client.
- don't set up token pool for serial builds
- add implementation specific CreatePool() & SetEnv() methods
- generate contents for MAKEFLAGS variable to pass down to children
Set up a pipe (POSIX) or semaphore (win32) with N tokens.
GNU make 4.4 introduced a new jobserver style "fifo" for POSIX systems which passes a named pipe down to the clients.

- update auth parser to recognize "fifo:<name>" format
- open named pipe for reading and writing
- make sure the file descriptors are closed in the destructor
- add 2 tests that aren't compiled for WIN32
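As an illustration of the auth parsing described in this commit, the sketch below distinguishes the "fifo:<name>" form from the older pair of file descriptors. It is not the parser from the PR: `JobserverAuth` and `ParseJobserverAuth` are hypothetical names, and the function assumes the caller has already isolated the option value from the rest of MAKEFLAGS.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical result type for this example.
struct JobserverAuth {
  enum Kind { kInvalid, kPipeFds, kFifo };
  Kind kind = kInvalid;
  int rfd = -1;           // read end, valid for kPipeFds
  int wfd = -1;           // write end, valid for kPipeFds
  std::string fifo_path;  // valid for kFifo
};

// Parse the value that follows "--jobserver-auth=".
// Assumption: `value` contains only that value, nothing after it.
JobserverAuth ParseJobserverAuth(const std::string& value) {
  JobserverAuth auth;
  const std::string prefix = "fifo:";
  if (value.compare(0, prefix.size(), prefix) == 0) {
    auth.fifo_path = value.substr(prefix.size());
    if (!auth.fifo_path.empty())
      auth.kind = JobserverAuth::kFifo;
    return auth;
  }
  int r = -1, w = -1;
  char trailing = 0;
  // Exactly two integers separated by a comma; trailing junk is rejected
  // because it would make sscanf() return 3 instead of 2.
  if (std::sscanf(value.c_str(), "%d,%d%c", &r, &w, &trailing) == 2 &&
      r >= 0 && w >= 0) {
    auth.kind = JobserverAuth::kPipeFds;
    auth.rfd = r;
    auth.wfd = w;
  }
  return auth;
}
```

A client would then either open `fifo_path` itself for reading and writing, or use the two inherited descriptors.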
GNU make 4.4 introduced a new command line option --jobserver-style with which the user can choose a different style than the default. For ninja we make the style an optional argument to the -m/--tokenpool-master option instead.

- add argument value to BuildConfig and pass it down via SetupMaster() to CreatePool()
- POSIX supports the styles "fifo" (default) and "pipe"
- Win32 only supports the style "sem" (default)
- an unsupported style causes ninja to abort with a fatal error
- as the "fifo" style isn't implemented yet, hard-code the tests to the "pipe" style to make them pass
- replace "OPTIONAL_ARG" with "optional_argument" in the getopt implementation to match the getopt_long() man page.
GNU make 4.4 introduced a new jobserver style "fifo" for POSIX systems which passes a named pipe down to the clients.

- split CreatePool() into CreateFifo(), CreatePipe() & CreateTokens()
- add implementation to CreateFifo() which creates a named pipe in the temp directory
- make sure the named pipe is removed in the destructor
- update non-WIN32 tests to support "fifo" style as default
- add a test for "pipe" style that isn't compiled for WIN32
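The create-and-clean-up lifecycle of the named pipe described above might look like the following sketch. It is an assumption-laden illustration rather than the PR's CreateFifo(): the `TempFifo` class, the hard-coded `/tmp` location, and the file name `fifo` are invented for the example. Opening a FIFO with `O_RDWR` is not specified by POSIX but does not block on Linux, which is what the sketch relies on.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdlib.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper: creates a named pipe in a private temporary
// directory and removes it again in the destructor.
class TempFifo {
 public:
  TempFifo() : fd_(-1) {
    char dir_template[] = "/tmp/tokenpool-XXXXXX";  // assumed temp location
    if (!mkdtemp(dir_template))
      return;
    dir_ = dir_template;
    path_ = dir_ + "/fifo";
    if (mkfifo(path_.c_str(), 0600) != 0)
      return;
    // One descriptor for reading and writing; see the note on O_RDWR above.
    fd_ = open(path_.c_str(), O_RDWR);
  }

  ~TempFifo() {
    if (fd_ >= 0)
      close(fd_);
    if (!path_.empty())
      unlink(path_.c_str());
    if (!dir_.empty())
      rmdir(dir_.c_str());
  }

  TempFifo(const TempFifo&) = delete;
  TempFifo& operator=(const TempFifo&) = delete;

  bool ok() const { return fd_ >= 0; }
  int fd() const { return fd_; }
  const std::string& path() const { return path_; }

  // Text a master could place in MAKEFLAGS for its children.
  std::string AuthArgument() const { return "--jobserver-auth=fifo:" + path_; }

 private:
  int fd_;
  std::string dir_;
  std::string path_;
};
```

Children only need the path from `AuthArgument()`; they open the pipe by name instead of inheriting descriptors.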
double max_load_average,
const char* style) {
  // no need to set up token pool for serial builds
  if (parallelism == 1)
That's not true - you still need to set up a job server, as it's the responsibility of the job server to communicate the serial nature to child processes. There's not only make, which is default-serial, but also `gcc` and co., which are default-parallel.
bool can_run_more =
    failures_allowed &&
    plan_.more_ready() &&
    command_runner_->CanRunMore();
This `CanRunMore()` is kind of wrong. If there is a token pool, it's the sole authority on parallelism. No matter whether it's doing this based on load measurement or token counting.
If you have an `AcquireToken()` now, then you should only be calling `CanRunMore()` as an implementation detail within `AcquireToken` in the fallback case when a token pool is absent.
// token became available
if (subproc == NULL) {
  result->status = ExitTokenAvailable;
  return true;
This can't be right... `subprocs_.NextFinished()` in that loop above has popped any number of completed processes.
But `tokens_->Release()` is only being called conditionally and only once at most.
So this is leaving tokens incorrectly marked as "in-use". You would have needed to release the token together with `finished_.push(*i);`.
... no, that loop above is looping while it does not pop anything. And you are also only quitting if no process had finished, and you were certain to have gotten a token instead.
Okay, this is working then, just really hard to follow and the naming got seriously confusing.
You might have wanted to put it in the `else` to the above `if`, and put a comment on the loop explaining the non-trivial exit condition.
Also the methods used here are definitely overdue for renaming, their names don't reflect what they do.
void GNUmakeTokenPool::Release() {
  available_++;
  used_--;
  if (available_ > 1)
You shouldn't do that synchronously on every `Release()`. Return tokens in bulk when you start blocking on sub-process completion, but before that, keep holding on!
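One way to read this suggestion is sketched below: `Release()` only updates the counters, and surplus tokens are handed back in one batch right before the caller blocks on sub-process completion. The `LazyTokenPool` class and its method names are hypothetical, and the actual write to the jobserver is replaced by a counter so the sketch stays self-contained. Keeping one token back follows the `available_ > 1` test in the quoted code, which accounts for the implicit token.

```cpp
#include <cassert>

// Hypothetical token bookkeeping with deferred, batched returns.
class LazyTokenPool {
 public:
  // Start with the implicit token every jobserver client owns.
  LazyTokenPool() : available_(1), used_(0), returned_(0) {}

  // Record one token obtained from the jobserver.
  void AddAcquired() { ++available_; }

  // Bind an available token to a command about to start.
  bool Reserve() {
    if (available_ == 0)
      return false;
    --available_;
    ++used_;
    return true;
  }

  // A command finished: keep the token locally, do not return it yet.
  void Release() {
    assert(used_ > 0);
    --used_;
    ++available_;
  }

  // Call right before blocking on sub-process completion. Hands back every
  // token except the implicit one and reports how many were returned.
  int ReturnSurplus() {
    int count = 0;
    while (available_ > 1) {
      --available_;
      ++returned_;  // stand-in for writing one token back to the jobserver
      ++count;
    }
    return count;
  }

  int available() const { return available_; }
  int used() const { return used_; }
  int returned() const { return returned_; }

 private:
  int available_;
  int used_;
  int returned_;
};
```

The benefit is that a token freed by one command can be reused immediately by the next ready command without a round trip through the jobserver.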
bool GNUmakeTokenPool::SetupClient(bool ignore,
                                   bool verbose,
                                   double& max_load_average) {
  const char* value = GetEnv("MAKEFLAGS");
You appear to have missed the part where those flags can also occur on the command line, and have yet to be copied into the environment block for child processes.
Accidental variable expansion of `$(MAKEFLAGS)` is common enough.
const char* jobserver = strstr(value, "--jobserver-fds=");
if (!jobserver)
  // GNU make => 4.2
  jobserver = strstr(value, "--jobserver-auth=");
Assume `--jobserver-auth=` has precedence over `--jobserver-fds=` please.
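The requested precedence could be expressed by checking the newer option first, as in this sketch. `FindJobserverValue` is a hypothetical helper, not a function from the PR, and it returns a pointer to the start of the value without trimming whatever follows it in MAKEFLAGS.

```cpp
#include <cassert>
#include <cstring>

// Return a pointer to the jobserver value inside `makeflags`, or nullptr.
// The newer --jobserver-auth= spelling wins over --jobserver-fds= when both
// are present, regardless of their order in the string.
const char* FindJobserverValue(const char* makeflags) {
  static const char* const kOptions[] = {
    "--jobserver-auth=",  // checked first: has precedence
    "--jobserver-fds=",   // older spelling, used as fallback
  };
  for (const char* option : kOptions) {
    const char* hit = std::strstr(makeflags, option);
    if (hit)
      return hit + std::strlen(option);
  }
  return nullptr;
}
```

For a MAKEFLAGS string that carries both options, the value after `--jobserver-auth=` is the one returned.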
}

void GNUmakeTokenPool::Clear() {
  while (used_ > 0)
If you got it right, you should always end up with `used_ == 0` at this point, given that all sub-processes must have been terminated.
// Temporarily replace SIGCHLD handler with our own
memset(&act, 0, sizeof(act));
act.sa_handler = CloseDupRfd;
if (sigaction(SIGCHLD, &act, &old_act) == 0) {
Why?... Unlike `make`, `ninja` doesn't monitor the child as a child process via `SIGCHLD`, but merely tracks the pipe opened to it. Meaning it's not even subject to the same race conditions which `make` had to handle.
Ah... this code used to provide non-blocking reads in a legacy portable way, not recognizing `O_NONBLOCK` as a legit way of doing this.
The `SIGCHLD` part is merely an optimization.
Still, nothing you wouldn't rather have done with a non-blocking handle instead. There are a couple of bad edge-cases where this will now result in a 100ms stall.
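A non-blocking token read along the lines the reviewer hints at could look like this sketch. The helper names `SetNonBlocking` and `TryReadToken` are hypothetical. One caveat worth stating as background: `O_NONBLOCK` is a property of the open file description, so setting it on descriptors inherited from a parent also changes them for every process sharing that description, whereas a client that opens a named pipe itself gets its own description.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Switch a descriptor to non-blocking mode.
bool SetNonBlocking(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0)
    return false;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

enum TokenRead { kGotToken, kNoToken, kReadError };

// Try to read exactly one token byte without ever blocking.
TokenRead TryReadToken(int fd, char* token) {
  for (;;) {
    ssize_t n = read(fd, token, 1);
    if (n == 1)
      return kGotToken;
    if (n < 0 && errno == EINTR)
      continue;  // interrupted by a signal, simply retry
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
      return kNoToken;  // pipe is empty right now
    return kReadError;  // end of file or a real error
  }
}
```

With this shape there is no need for a temporary signal handler or a timeout: an empty pipe is reported immediately as `kNoToken`.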
free(filename);

rfd_ = rfd;
Didn't you forget `SetCloseOnExec` for the FIFO case? You almost certainly didn't want them to be inherited for the named-pipe case.
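For reference, marking a descriptor close-on-exec on POSIX is a two-step `fcntl()` call, sketched below. `SetCloseOnExecFlag` is a hypothetical stand-in written for this example; it is not the helper the reviewer refers to. With the fifo style, children open the pipe by name, so they have no use for the parent's descriptors.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Set FD_CLOEXEC while preserving any other descriptor flags.
bool SetCloseOnExecFlag(int fd) {
  int flags = fcntl(fd, F_GETFD);
  if (flags < 0)
    return false;
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}

// Query whether FD_CLOEXEC is currently set on a descriptor.
bool IsCloseOnExec(int fd) {
  int flags = fcntl(fd, F_GETFD);
  return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

Applied to both ends of the opened named pipe, this keeps the descriptors out of every spawned command.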
If you did, then you should have also exported `--jobserver-fds=` for downward compatibility.
struct timeval timeout = { 0, 0 };
FD_ZERO(&set);
FD_SET(rfd_, &set);
int ret = select(rfd_ + 1, &set, NULL, NULL, &timeout);
Duplicated code between the code updating `SubprocessSet::token_available_` and this select here.
pollfd *pfd = &fds[nfds - 1];
if (pfd->fd >= 0) {
  assert(pfd->fd == tokens->GetMonitorFd());
  if (pfd->revents != 0)
You did only care for `POLLIN | POLLPRI` event bits, unlike for the `fds[cur_nfd++].revents` where also the error states were of interest.
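If the comment is read as a suggestion to test only the readable bits on the token descriptor, the check might be narrowed as in this sketch. `TokenReadable` is a hypothetical helper, and the interpretation of the review comment is an assumption.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// True only when poll() reported data to read. Error and hang-up bits
// (POLLERR, POLLHUP, POLLNVAL) do not count as "a token is available".
bool TokenReadable(const pollfd& pfd) {
  return (pfd.revents & (POLLIN | POLLPRI)) != 0;
}
```

The difference to `revents != 0` shows up when the writing side goes away: on Linux an empty pipe whose write end was closed reports `POLLHUP`, which is non-zero but carries no token.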
// command completed
if (tokens_)
  tokens_->Release();
This has to be done after `result->status = subproc->Finish()` - the process is still running, so far only the pipe has been closed.
Unrelated - but this is also a spot for optimization, as Ninja will get stalled here if a long-running process detaches from terminal early.
The project has decided to close PR #1140, on which this PR is based. Therefore this PR can never be merged.
Part 3 in the implementation series for PR #1139: add support for the POSIX jobserver style `fifo`, introduced in GNU make 4.4, that uses a named pipe as communication channel between master & clients.

- optional argument for `-m`/`--tokenpool-master` to specify the jobserver style.
  - POSIX: `fifo` (default) and `pipe`
  - Win32: `sem` (default)
- update `GNUmakeTokenPoolPosix` to support the `fifo` style
- tests for the `fifo` style

Documentation for GNU make jobserver
http://make.mad-scientist.net/papers/jobserver-implementation/
https://www.gnu.org/software/make/manual/html_node/Job-Slots.html#Job-Slots
Fixes #1139