Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GNU make jobserver style "fifo" support #2263

Conversation

stefanb2
Copy link
Contributor

Part 3 in the implementation series for PR #1139: add support for the POSIX jobserver style fifo, introduced in GNU make 4.4, that uses a named pipe as communication channel between master & clients.

  • add an optional argument to the command line option -m / --tokenpool-master to specify the jobserver style.
    • POSIX supports fifo (default) and pipe
    • Win32 supports sem (default)
    • specifying an unknown style will cause ninja to abort with a fatal error
  • update client & server parts of GNUmakeTokenPoolPosix to support the fifo style
  • add tests for fifo style

Documentation for GNU make jobserver

http://make.mad-scientist.net/papers/jobserver-implementation/
https://www.gnu.org/software/make/manual/html_node/Job-Slots.html#Job-Slots

Fixes #1139

Stefan Becker and others added 20 commits February 19, 2023 16:01
- add new TokenPool interface
- GNU make implementation for TokenPool parses and verifies the magic
  information from the MAKEFLAGS environment variable
- RealCommandRunner tries to acquire TokenPool
  * if no token pool is available then there is no change in behaviour
- When a token pool is available then RealCommandRunner behaviour
  changes as follows
  * CanRunMore() only returns true if TokenPool::Acquire() returns true
  * StartCommand() calls TokenPool::Reserve()
  * WaitForCommand() calls TokenPool::Release()

Documentation for GNU make jobserver

  http://make.mad-scientist.net/papers/jobserver-implementation/

Fixes ninja-build#1139
Improve on the original jobserver client implementation. This makes
ninja a more aggressive GNU make jobserver client.

- add monitor interface to TokenPool
- TokenPool is passed down when main loop indicates that more work is
  ready and would be allowed to start if a token becomes available
- posix: update DoWork() to monitor TokenPool read file descriptor
- WaitForCommand() exits when DoWork() sets token flag
- Main loop starts over when WaitForCommand() sets token exit status
This emulates the behaviour of GNU make.

- add parallelism_from_cmdline flag to build configuration
- set the flag when -jN is given on command line
- pass the flag to TokenPool::Get()
- GNUmakeTokenPool::Setup()
  * prints a warning when the flag is true and jobserver was detected
  * returns false, i.e. jobserver will be ignored
- ignore config.parallelism in CanRunMore() when we have a valid
  TokenPool, because it gets always initialized to a default when not
  given on the command line
This emulates the behaviour of GNU make.

- build: make a copy of max_load_average and pass it to TokenPool.
- GNUmakeTokenPool: if we detect a jobserver and a valid -lN argument in
  MAKEFLAGS then set max_load_average to N.
- replace printf() with calls to LinePrinter
- print GNU make jobserver message only when verbose build is requested
- fix Windows build error in no-op TokenPool implementation
- improve Acquire() to block for a maximum of 100ms
- address review comments
- TokenPool setup
- GetMonitorFd() API
- implicit token and tokens in jobserver pipe
- Acquire() / Reserve() / Release() protocol
- Clear() API
- add TokenPoolTest stub to provide TokenPool::GetMonitorFd()
- add two tests
  * both tests set up a dummy GNUmake jobserver pipe
  * both tests call DoWork() with TokenPoolTest
  * test 1: verify that DoWork() detects when a token is available
  * test 2: verify that DoWork() works as before without a token
- the tests are not compiled in under Windows
Add tests that verify the token functionality of the builder main loop.
We replace the default fake command runner with a special version where
the tests can control each call to AcquireToken(), CanRunMore() and
WaitForCommand().
GNU make uses a semaphore as jobserver protocol on Win32. See also

   https://www.gnu.org/software/make/manual/html_node/Windows-Jobserver.html

Usage is pretty simple and straightforward, i.e. WaitForSingleObject()
to obtain a token and ReleaseSemaphore() to return it.

Unfortunately subprocess-win32.cc uses an I/O completion port (IOCP).
IOCPs aren't waitable objects, i.e. we can't use WaitForMultipleObjects()
to wait on the IOCP and the token semaphore at the same time.

Therefore GNUmakeTokenPoolWin32 creates a child thread that waits on the
token semaphore and posts a dummy I/O completion status on the IOCP when
it was able to obtain a token. That unblocks SubprocessSet::DoWork() and
it can then check if a token became available or not.

- split existing GNUmakeTokenPool into common and platform bits
- add GNUmakeTokenPool interface
- move the Posix bits to GNUmakeTokenPoolPosix
- add the Win32 bits as GNUmakeTokenPoolWin32
- move Setup() method up to TokenPool interface
- update Subprocess & TokenPool tests accordingly
- remove unnecessary "struct" from TokenPool
- add PAPCFUNC cast to QueryUserAPC()
- remove hard-coded MAKEFLAGS string from win32
- remove useless build test CompleteNoWork
- rename TokenPoolTest to TestTokenPool
- add tokenpool modules to CMake build
- remove unused no-op TokenPool implementation
- fix errors flagged by codespell & clang-tidy
- POSIX GNUmakeTokenPool should return same token
- address review comments from

ninja-build#1140 (comment)
ninja-build#1140 (review)
ninja-build#1140 (review)
ninja-build#1140 (comment)
ninja-build#1140 (comment)
Make space to add new API to set up token pool master.
This method will set up to the token pool master.
When this option is given on the command line then ninja will set up a
token pool master instead of being a token pool client.
- don't set up token pool for serial builds
- add implementation specific CreatePool() & SetEnv() methods
- generate contents for MAKEFLAGS variable to pass down to children
Set up a pipe (POSIX) or semaphore (win32) with N tokens.
GNU make 4.4 introduced a new jobserver style "fifo" for POSIX systems
which passes a named pipe down to the clients.

- update auth parser to recognize "fifo:<name>" format
- open named pipe for reading and writing
- make sure the file descriptors are closed in the destructor
- add 2 tests that aren't compiled for WIN32
GNU make 4.4 introduced a new command line option --jobserver-style with
which the user can choose a different style than the default. For ninja
we make the style an optional argument to the -m/--tokenpool-master
option instead.

- add argument value to BuildConfig and pass it down via SetupMaster()
  to CreatePool()
- POSIX supports the styles "fifo" (default) and "pipe"
- Win32 only supports the style "sem" (default)
- an unsupported style causes ninja to abort with a fatal error
- as the "fifo" style isn't implemented yet, hard-code the tests to the
  "pipe" style to make them pass
- replace "OPTIONAL_ARG" with "optional_argument" in the getopt
  implementation to match the getopt_long() man page.
GNU make 4.4 introduced a new jobserver style "fifo" for POSIX systems
which passes a named pipe down to the clients.

- split CreatePool() into CreateFifo(), CreatePipe() & CreateTokens()
- add implementation to CreateFifo() which creates a named pipe in the
  temp directory
- make sure the named pipe ise removed in the destructor
- update non-WIN32 tests to support "fifo" style as default
- add a test for "pipe" style that isn't compiled for WIN32
double max_load_average,
const char* style) {
// no need to set up token pool for serial builds
if (parallelism == 1)
Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true - you still need to setup a job server, as it's the responsibility of the job server to communicate the serial nature to child processes. There's not only make which is default-serial, but also gcc and co. which are default-parallel.

bool can_run_more =
failures_allowed &&
plan_.more_ready() &&
command_runner_->CanRunMore();
Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CanRunMore() is kind of wrong. If there is a token pool, it's the sole authority on parallelism. No matter whether it's doing this based on load measurement or token counting.

If you have an AcquireToken() now, then you should only be calling CanRunMore() as an implementation detail within AcquireToken in the fallback case when a token pool is absent.

// token became available
if (subproc == NULL) {
result->status = ExitTokenAvailable;
return true;
Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be right... subprocs_.NextFinished() in that loop above has popped any number of completed processes.

But tokens_->Release() is only being called conditionally and only once at most.

So this is leaving tokens incorrectly marked as "in-use". You would have needed to release the token together with finished_.push(*i);.

Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... no, that loop above is looping while it does not pop anything. And you are also only quitting if no process had finished, and you were certain to have gotten a token instead.

Okay, this is working then, just really hard to follow and the naming got seriously confusing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have wanted to put it in the else to the above if, and put a comment on the loop explaining the non-trivial exit condition.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the methods used here are definitely overdue for renaming, their names don't reflect what they do.

void GNUmakeTokenPool::Release() {
available_++;
used_--;
if (available_ > 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't do that synchronously on every Release(). Return tokens in bulk when you start blocking on sub-process completion, but before that, keep holding on!

bool GNUmakeTokenPool::SetupClient(bool ignore,
bool verbose,
double& max_load_average) {
const char* value = GetEnv("MAKEFLAGS");
Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You appear to have missed the part where those flags can also occur on the command line, and have yet to be copied into the environment block for child processes.

Accidental variable expansion of $(MAKEFLAGS) is common enough.

const char* jobserver = strstr(value, "--jobserver-fds=");
if (!jobserver)
// GNU make => 4.2
jobserver = strstr(value, "--jobserver-auth=");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume --jobserver-auth= has precedence over --jobserver-fds= please.

}

void GNUmakeTokenPool::Clear() {
while (used_ > 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you got it right, you should always end up with used_ == 0 at this point, given that all sub-processes must have been terminated.

// Temporarily replace SIGCHLD handler with our own
memset(&act, 0, sizeof(act));
act.sa_handler = CloseDupRfd;
if (sigaction(SIGCHLD, &act, &old_act) == 0) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?... Unlike make, ninja doesn't monitor the child as a child process via SIGCHLD, but merely tracks the pipe opened to it. Meaning it's not even subject to the same race conditions which make had to handle.

Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... this code used to provide non-blocking reads in a legacy portable way, not recognizing O_NONBLOCK as a legit way of doing this.

The SIGCHLD part is merely an optimization.

Still, nothing you wouldn't had rather done with a non-blocking handle instead. There are a couple of bad edge-cases where this will now result in a 100ms stall.


free(filename);

rfd_ = rfd;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you forget SetCloseOnExec for the FIFO case? You almost certainly didn't want them to be inherited for the named-pipe case.

Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you did, then you should have also exported --jobserver-fds= for downward compatibility.

// Temporarily replace SIGCHLD handler with our own
memset(&act, 0, sizeof(act));
act.sa_handler = CloseDupRfd;
if (sigaction(SIGCHLD, &act, &old_act) == 0) {
Copy link

@Ext3h Ext3h Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... this code used to provide non-blocking reads in a legacy portable way, not recognizing O_NONBLOCK as a legit way of doing this.

The SIGCHLD part is merely an optimization.

Still, nothing you wouldn't had rather done with a non-blocking handle instead. There are a couple of bad edge-cases where this will now result in a 100ms stall.

struct timeval timeout = { 0, 0 };
FD_ZERO(&set);
FD_SET(rfd_, &set);
int ret = select(rfd_ + 1, &set, NULL, NULL, &timeout);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated code between the code updating SubprocessSet::token_available_ and this select here.

pollfd *pfd = &fds[nfds - 1];
if (pfd->fd >= 0) {
assert(pfd->fd == tokens->GetMonitorFd());
if (pfd->revents != 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did only care for POLLIN | POLLPRI event bits, unlike for the fds[cur_nfd++].revents where also the error states were of interest.


// command completed
if (tokens_)
tokens_->Release();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be done after result->status = subproc->Finish() - the process is still running, only the pipe was broken yet.

Unrelated - but this is also a spot for optimization, as Ninja will get stalled here if a long-running process detaches from terminal early.

@stefanb2
Copy link
Contributor Author

The project has decided to close PR #1140 on which this PR is based on. Therefore this PR can never be merged.

@stefanb2 stefanb2 closed this Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add GNU make jobserver client support
2 participants