Add "-m <N>" option to dynamically limit jobs #1354

Closed
wants to merge 36 commits

Conversation

maxim-kuvyrkov

... on memory threshold. So far only Unix-style OSes are supported.
No functional change for other configurations.

@metti commented Nov 8, 2017

You can refer to a previous pull request (#660) that implemented this feature, and to its discussion back then.

@tmilev commented Aug 28, 2019

@maxim-kuvyrkov
I would like to have RAM-based throttling: it would help with a very tricky build failure in a complex situation. Giving my +1 to this effort.

@Bidski commented Nov 11, 2019

This would be a very useful feature to have. What will it take to get this merged?

@skardach

Please correct me if I'm wrong, but this pull request currently does not contain any functional code; instead, it's located on the limit-on-ram branch?

@maxim-kuvyrkov (Author)

> Please correct me if I'm wrong, but this pull request currently does not contain any functional code; instead, it's located on the limit-on-ram branch?

Hi @skardach ,
That's correct, but I don't consider the code on that branch (or in this pull request) useful anymore.

I've found that for builds that can potentially exhaust RAM, it's much more effective to:

  1. Put such builds into containers with a memory cgroup limiting the amount of available RAM.
  2. Tweak parallelism based on the ratio of on-CPU to off-CPU time that processes inside the container get.
  3. When the container hits its RAM limit, it starts swapping, which causes a lot of off-CPU time for the processes and makes ninja reduce parallelism due to (2).

The above approach is implemented at https://github.com/maxim-kuvyrkov/ninja/tree/limit-on-cpu , but it depends on the CONFIG_HZ kernel setting, for which I didn't find a public API. Therefore I don't see how to make this approach generic enough to be included in upstream ninja.

@tmilev commented Sep 22, 2021

I'd like to note that our team has a use case where

  1. Containerization is not an option.
  2. We can run compilation very efficiently RAM-wise (some 200 parallel jobs), but we can't run linking as efficiently. So if many link jobs happen to run in parallel at the same time, we run out of RAM. Furthermore, this happens somewhat late in the build, as linking happens last.

@maxim-kuvyrkov (Author)

> I'd like to note that our team has a use case where
>
>   1. Containerization is not an option.
>   2. We can run compilation very efficiently RAM-wise (some 200 parallel jobs), but we can't run linking as efficiently. So if many link jobs happen to run in parallel at the same time, we run out of RAM. Furthermore, this happens somewhat late in the build, as linking happens last.

@tmilev , your use case can potentially be addressed with ninja pools: https://ninja-build.org/manual.html#ref_pool .
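For illustration, a minimal build.ninja sketch (the pool name and the depth of 2 are made up for this example, not taken from anyone's setup): declaring a pool and assigning the link rule to it caps how many link jobs run at once, while compile jobs still use the full -j parallelism.

```
# Hypothetical fragment: at most 2 link jobs run concurrently.
pool link_pool
  depth = 2

rule link
  command = $cxx $in -o $out $ldflags
  pool = link_pool
```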

@puetzk commented Oct 12, 2022

@maxim-kuvyrkov Since it looks like you are already relying on the cgroup2 filesystem (/sys/fs/cgroup), perhaps you could make use of the PSI monitor files {io,cpu,memory}.pressure: https://docs.kernel.org/accounting/psi.html rather than cpuacct.stat/CONFIG_HZ?

These read directly as your desired "ratio of time spent waiting for X", in addition to the absolute time spent waiting, which should get you away from needing CONFIG_HZ. Pressure Stall Information is a cgroup2 feature (kernel 4.20 and up), but it's easy to detect by checking whether the *.pressure files exist.
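For example, a minimal C++ sketch of reading the "some" avg10 value (the cgroup path is illustrative; /proc/pressure/memory gives the system-wide figures):

```cpp
// Sketch: parse the "some avg10=..." share from a PSI file.
// PSI lines look like: "some avg10=1.23 avg60=0.40 avg300=0.10 total=123456"
#include <cstdio>
#include <fstream>
#include <string>

double ReadMemoryPressureAvg10(const std::string& path) {
  std::ifstream f(path);
  std::string line;
  while (std::getline(f, line)) {
    if (line.compare(0, 5, "some ") == 0) {
      size_t pos = line.find("avg10=");
      if (pos != std::string::npos)
        return std::stod(line.substr(pos + 6));
    }
  }
  return 0.0;  // File missing (pre-4.20 kernel) or unreadable.
}

int main() {
  double p = ReadMemoryPressureAvg10("/sys/fs/cgroup/build/memory.pressure");
  std::printf("memory 'some' avg10 = %.2f%%\n", p);
}
```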

Or, for the extra-credit solution, you can set up your own triggers that can be monitored via select()/poll() (https://docs.kernel.org/accounting/psi.html#monitoring-for-pressure-thresholds). You could even plumb that into SubprocessSet::DoWork to integrate something similar to https://github.com/tobixen/thrash-protect, using SIGSTOP to temporarily pause jobs you've already launched; that lets ninja recover gracefully if it turns out it has already launched too many jobs that all turned out to be large and pushed the cgroup into thrashing. Then use SIGCONT to resume a paused job (instead of launching a new one) as non-paused jobs finish and the pressure improves.
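A rough sketch of that trigger mechanism, following the kernel documentation (the cgroup path, the 150ms threshold, and the 1s window are made up for illustration):

```cpp
// Register a PSI trigger and wait for it with poll(), per
// https://docs.kernel.org/accounting/psi.html. Paths and thresholds illustrative.
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main() {
  int fd = open("/sys/fs/cgroup/build/memory.pressure", O_RDWR | O_NONBLOCK);
  if (fd < 0) { std::perror("open"); return 1; }

  // Notify when tasks are stalled on memory for >150ms within any 1s window.
  const char trig[] = "some 150000 1000000";
  if (write(fd, trig, std::strlen(trig) + 1) < 0) { std::perror("write"); return 1; }

  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLPRI;
  while (poll(&pfd, 1, -1) > 0) {
    if (pfd.revents & POLLERR) break;  // Monitored cgroup went away.
    if (pfd.revents & POLLPRI) {
      // Here ninja could stop launching new jobs, or SIGSTOP some running
      // ones until the pressure subsides, then SIGCONT them later.
      std::printf("memory pressure threshold crossed\n");
    }
  }
  close(fd);
  return 0;
}
```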

@maxim-kuvyrkov (Author)

Hi @puetzk , thanks for the suggestion!

I have implemented another approach (see [1] and [2]), which allows for several containers to gracefully compete for limited RAM. Your suggestion applies equally well to that new approach.

The idea in [1] and [2] is that we increase parallelism until we run into CPU waiting. The CPU waiting can come either from a direct CPU share limit (our container / cgroup has used up its fair share of CPU cycles) or from the RAM allowance getting exhausted, at which point we start using swap. While swapping, we are "waiting" for CPU just as when hitting the CPU share limit.

This new approach allows ninja to dance on the edge of swapping when there is high demand for RAM from other containers, while increasing parallelism to the maximum when other containers aren't using RAM.

[1] maxim-kuvyrkov@70ef9be .
[2] https://github.com/maxim-kuvyrkov/ninja/commits/limit-on-cpu
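Roughly, the policy amounts to a feedback loop like the sketch below (the thresholds and the source of the wait ratio are illustrative; they are not the actual values or code from the limit-on-cpu branch):

```cpp
// Illustrative feedback loop: grow parallelism while jobs stay on-CPU,
// back off once they spend a noticeable share of time waiting (CPU share
// limit reached, or swapping due to the memory limit).
#include <algorithm>

int AdjustParallelism(int current_jobs, int max_jobs, double wait_ratio) {
  const double kBackOffAt = 0.20;  // waiting >20% of the time: too many jobs
  const double kGrowBelow = 0.05;  // almost no waiting: room for more jobs
  if (wait_ratio > kBackOffAt)
    return std::max(1, current_jobs - 1);
  if (wait_ratio < kGrowBelow)
    return std::min(max_jobs, current_jobs + 1);
  return current_jobs;
}
```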

@maxim-kuvyrkov (Author)

Dropping this pull request in favour of https://github.com/maxim-kuvyrkov/ninja/commits/limit-on-cpu .

@puetzk commented Oct 28, 2022

Yeah, my suggestion related more to your limit-on-cpu branch, but this was the thread where it was being discussed.
What the PSI feature offers you is a direct way to read the "wait ratio" as measured within the kernel, and to distinguish whether it is being caused by competition for CPU, memory (swap) time, or other I/O. It also gives you the possibility of a waitable fd for thresholds.

@puetzk commented Oct 28, 2022

That seemed like the solution to your comment about

> it depends on the CONFIG_HZ kernel setting, for which I didn't find a public API. Therefore I don't see how to make this approach generic enough to be included in upstream ninja.

@maxim-kuvyrkov (Author)

The reworked support using Linux PSI is posted here: #2300
@skardach , @tmilev , @puetzk

@maxim-kuvyrkov (Author)

> I'd like to note that our team has a use case where
>
>   1. Containerization is not an option.
>   2. We can run compilation very efficiently RAM-wise (some 200 parallel jobs), but we can't run linking as efficiently. So if many link jobs happen to run in parallel at the same time, we run out of RAM. Furthermore, this happens somewhat late in the build, as linking happens last.

The implementation in #2300 should work for non-containerized environments as well.
