Add "-m <N>" option to dynamically limit jobs #1354

Closed
wants to merge 36 commits

Conversation

maxim-kuvyrkov

... on memory threshold. So far only Unix-style OSes are supported.
No functional change for other configurations.

@metti commented Nov 8, 2017

You can refer to a previous pull request (#660) that implemented this feature, and to its discussion back then.

@tmilev commented Aug 28, 2019

@maxim-kuvyrkov
I would like to have RAM-based throttling: it would help with a very tricky build failure in a complex situation. Giving my +1 to this effort.

@Bidski commented Nov 11, 2019

This would be a very useful feature to have. What will it take to get this merged?

@skardach

Please correct me if I'm wrong, but this pull request currently does not contain any functional code; instead, it's located on the limit-on-ram branch?

@maxim-kuvyrkov (Author)

> Please correct me if I'm wrong, but this pull request currently does not contain any functional code; instead, it's located on the limit-on-ram branch?

Hi @skardach ,
That's correct, but I don't consider the code on that branch (or in this pull request) useful anymore.

I've found that for builds that can potentially exhaust RAM, it's much more effective to:

  1. Put such builds into containers with a memory cgroup limiting the amount of available RAM.
  2. Tweak parallelism based on the ratio of on-CPU to off-CPU time that processes inside the container get.
  3. When the container hits its RAM limit, it starts swapping, which causes a lot of off-CPU time for the processes and makes ninja reduce parallelism due to (2).

The above approach is implemented at https://github.com/maxim-kuvyrkov/ninja/tree/limit-on-cpu , but it depends on the CONFIG_HZ kernel setting, for which I didn't find a public API. Therefore I don't see how to make this approach generic enough to be included in upstream ninja.

@tmilev commented Sep 22, 2021

I'd like to note that our team has a use case where

  1. Containerization is not an option.
  2. We can run compilation very efficiently RAM-wise (some 200 parallel jobs), but we can't run linking as efficiently. So if many link jobs happen to run in parallel at the same time, we run out of RAM. Furthermore, this happens somewhat late in the build, as linking happens last.

@maxim-kuvyrkov (Author)

> I'd like to note that our team has a use case where
>
>   1. Containerization is not an option.
>   2. We can run compilation very efficiently RAM-wise (some 200 parallel jobs), but we can't run linking as efficiently. So if many link jobs happen to run in parallel at the same time, we run out of RAM. Furthermore, this happens somewhat late in the build, as linking happens last.

@tmilev , your use case can potentially be addressed with ninja pools: https://ninja-build.org/manual.html#ref_pool .
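For illustration, a minimal build.ninja sketch (the pool name and the depth of 2 are made up for this example, not taken from anyone's setup): declaring a pool and assigning the link rule to it caps how many link jobs run at once, while compile jobs still use the full -j parallelism.

```
# Hypothetical fragment: at most 2 link jobs run concurrently.
pool link_pool
  depth = 2

rule link
  command = $cxx $in -o $out $ldflags
  pool = link_pool
```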

@puetzk commented Oct 12, 2022

@maxim-kuvyrkov Since it looks like you are already relying on the cgroup2 filesystem (/sys/fs/cgroup), perhaps you could make use of the PSI monitor files {io,cpu,memory}.pressure: https://docs.kernel.org/accounting/psi.html rather than cpuacct.stat/CONFIG_HZ?

These read directly as your desired "ratio of time spent waiting for X", in addition to the absolute time spent waiting, which should get you away from needing CONFIG_HZ. Pressure Stall Information is a cgroup2 feature (kernel 4.20 and up), but it's easy to detect by checking whether the *.pressure files exist.
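For example, a minimal C++ sketch of reading the "some" avg10 value (the cgroup path is illustrative; /proc/pressure/memory gives the system-wide figures):

```cpp
// Sketch: parse the "some avg10=..." share from a PSI file.
// PSI lines look like: "some avg10=1.23 avg60=0.40 avg300=0.10 total=123456"
#include <cstdio>
#include <fstream>
#include <string>

double ReadMemoryPressureAvg10(const std::string& path) {
  std::ifstream f(path);
  std::string line;
  while (std::getline(f, line)) {
    if (line.compare(0, 5, "some ") == 0) {
      size_t pos = line.find("avg10=");
      if (pos != std::string::npos)
        return std::stod(line.substr(pos + 6));
    }
  }
  return 0.0;  // File missing (pre-4.20 kernel) or unreadable.
}

int main() {
  double p = ReadMemoryPressureAvg10("/sys/fs/cgroup/build/memory.pressure");
  std::printf("memory 'some' avg10 = %.2f%%\n", p);
}
```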

Or, for the extra-credit solution, you can set up your own triggers that can be monitored via select()/poll() (https://docs.kernel.org/accounting/psi.html#monitoring-for-pressure-thresholds). You could even plumb that into SubprocessSet::DoWork to integrate something similar to https://github.com/tobixen/thrash-protect, using SIGSTOP to temporarily pause jobs you've already launched; that lets ninja recover gracefully if it turns out it has already launched too many jobs that all turned out to be large and pushed the cgroup into thrashing. Then use SIGCONT to resume a paused job (instead of launching a new one) as non-paused jobs finish and the pressure improves.
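A rough sketch of that trigger mechanism, following the kernel documentation (the cgroup path, the 150ms threshold, and the 1s window are made up for illustration):

```cpp
// Register a PSI trigger and wait for it with poll(), per
// https://docs.kernel.org/accounting/psi.html. Paths and thresholds illustrative.
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

int main() {
  int fd = open("/sys/fs/cgroup/build/memory.pressure", O_RDWR | O_NONBLOCK);
  if (fd < 0) { std::perror("open"); return 1; }

  // Notify when tasks are stalled on memory for >150ms within any 1s window.
  const char trig[] = "some 150000 1000000";
  if (write(fd, trig, std::strlen(trig) + 1) < 0) { std::perror("write"); return 1; }

  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLPRI;
  while (poll(&pfd, 1, -1) > 0) {
    if (pfd.revents & POLLERR) break;  // Monitored cgroup went away.
    if (pfd.revents & POLLPRI) {
      // Here ninja could stop launching new jobs, or SIGSTOP some running
      // ones until the pressure subsides, then SIGCONT them later.
      std::printf("memory pressure threshold crossed\n");
    }
  }
  close(fd);
  return 0;
}
```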

@maxim-kuvyrkov (Author)

Hi @puetzk , thanks for the suggestion!

I have implemented another approach (see [1] and [2]), which allows for several containers to gracefully compete for limited RAM. Your suggestion applies equally well to that new approach.

The idea in [1] and [2] is that we increase parallelism until we run into CPU waiting. The CPU waiting can come either from a direct CPU share limit (our container / cgroup has used up its fair share of CPU cycles) or from the RAM allowance getting exhausted, at which point we start using swap. While swapping, we are "waiting" for CPU just as when hitting the CPU share limit.

This new approach allows ninja to dance on the edge of swapping when there is high demand for RAM from other containers, while increasing parallelism to the maximum when other containers aren't using RAM.

[1] maxim-kuvyrkov@70ef9be .
[2] https://github.com/maxim-kuvyrkov/ninja/commits/limit-on-cpu
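Roughly, the policy amounts to a feedback loop like the sketch below (the thresholds and the source of the wait ratio are illustrative; they are not the actual values or code from the limit-on-cpu branch):

```cpp
// Illustrative feedback loop: grow parallelism while jobs stay on-CPU,
// back off once they spend a noticeable share of time waiting (CPU share
// limit reached, or swapping due to the memory limit).
#include <algorithm>

int AdjustParallelism(int current_jobs, int max_jobs, double wait_ratio) {
  const double kBackOffAt = 0.20;  // waiting >20% of the time: too many jobs
  const double kGrowBelow = 0.05;  // almost no waiting: room for more jobs
  if (wait_ratio > kBackOffAt)
    return std::max(1, current_jobs - 1);
  if (wait_ratio < kGrowBelow)
    return std::min(max_jobs, current_jobs + 1);
  return current_jobs;
}
```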

@maxim-kuvyrkov (Author)

Dropping this pull request in favour of https://github.com/maxim-kuvyrkov/ninja/commits/limit-on-cpu .

@puetzk commented Oct 28, 2022

Yeah, my suggestion related more to your limit-on-cpu branch, but this was the thread where it was being discussed.
What the PSI feature offers you is a direct way to read the "wait ratio" as measured within the kernel, and to distinguish whether it is being caused by competition for CPU, memory (swap) time, or other I/O. It also gives you the possibility of a waitable fd for thresholds.

@puetzk commented Oct 28, 2022

That seemed like the solution to your comment about

> it depends on the CONFIG_HZ kernel setting, for which I didn't find a public API. Therefore I don't see how to make this approach generic enough to be included in upstream ninja.

@maxim-kuvyrkov (Author)

The reworked support using Linux PSI is posted here: #2300
@skardach , @tmilev , @puetzk

@maxim-kuvyrkov (Author)

> I'd like to note that our team has a use case where
>
>   1. Containerization is not an option.
>   2. We can run compilation very efficiently RAM-wise (some 200 parallel jobs), but we can't run linking as efficiently. So if many link jobs happen to run in parallel at the same time, we run out of RAM. Furthermore, this happens somewhat late in the build, as linking happens last.

The implementation in #2300 should work for non-containerized environments as well.
