
"max-workers" as a parameter or a step to check the size of resource class #174

Open
livmade opened this issue Dec 19, 2022 · 2 comments
livmade commented Dec 19, 2022

Describe Request:

When using the install or install-packages commands, the CircleCI docker executors tell all the node-related tools that a large amount of RAM (tens of GB) and dozens of CPUs are available, causing them to over-parallelize and stall out jobs. This also causes issues in the restore-cache step.

Supporting Documentation Links:

There's a support article with a workaround to add to the config, but it would be helpful to have this built in instead:
https://support.circleci.com/hc/en-us/articles/360038192673-NodeJS-Builds-or-Test-Suites-Fail-With-ENOMEM-or-a-Timeout
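For reference, the linked article's workaround amounts to capping Jest's worker count explicitly rather than letting it auto-detect the host's CPUs; a sketch (the value 2 is illustrative, and should match your resource_class):

```json
{
  "scripts": {
    "test": "jest --maxWorkers=2"
  }
}
```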

@marboledacci (Contributor) commented:

The document you pointed to is about jest, not about the installation of packages. I haven't found any way to do the equivalent for install commands; if you know how to do it, please let me know.

@Peter-Darton-i2 commented:

This issue is, in effect, a manifestation of the CircleCI issue logged as CCI-I-578.
In short, when code runs in a CircleCI docker container, the container environment lies about the amount of RAM, CPUs, etc. that are available, causing all software that automatically adjusts its resource usage to the available capacity (e.g. lerna, nx, jest, and many others, not just node-related tools) to fail to live within the confines of the container's resource_class.

To work around this defect I ended up writing an orb command and associated script that sets an environment variable to the right number of CPUs, which can then be passed in explicitly.

description: >
  Calculates the number of CPUs our execution environment has
  and records it in an environment variable.


  This is often necessary when building using auto-scaling tools like Lerna.
  e.g. Lerna automatically detects the size of the machine it's running on
  and decides how many processes to run in parallel.
  ...but this breaks when it's run within a docker environment where
  the /proc information lies and describes the docker daemon's host
  machine instead of the docker container's own local restrictions.
  ...and CircleCI's docker environment is (sadly) one that lies.

  This means that, for Lerna to work correctly in a CircleCI docker executor,
  it will have to be told how many CPUs to use, and this command lets you do that.


  To use this, you'll need to call this command as a pre-step step for all your
  node build jobs in your workflow
  (so that the variable is set when you build things)
  AND
  edit your top-level package.json file's scripts section such that,
  where you'd normally just call "lerna",
  you call "lerna --concurrency ${CIRCLE_EXECUTOR_CPUS}" instead
  (so that lerna won't try to use its automatic decision making).


  For extra flexibility (so developers can run the same targets) consider using
  the package https://www.npmjs.com/package/@naholyr/cross-env to set a default
  value of 0 (which tells lerna to decide automatically) in case the target gets
  called outside of a CircleCI build (e.g. a developer testing the CI build).

  e.g.
  "scripts": {
    "ci:build": "cross-env-shell LERNA_CPUS=\\${CIRCLE_EXECUTOR_CPUS:0} lerna --concurrency \\${LERNA_CPUS} run ci:build --",
    "preci:test": "mkdir -p junit-reports",
    "ci:test": "cross-env-shell LERNA_CPUS=\\${CIRCLE_EXECUTOR_CPUS:0} lerna --concurrency \\${LERNA_CPUS} run ci:test --"
  },

parameters:
  env_var_name:
    type: string
    description: |
      Specifies the name of the environment variable used in the top-level package.json file.
      e.g. if set to FOO then the package.json file's scripts section should call "lerna --concurrency ${FOO}" in place of just "lerna".
    default: CIRCLE_EXECUTOR_CPUS

steps:
  - run:
      name: Determine available CPUs
      command: <<include(scripts/count_executor_cpus.sh)>>
      environment:
        ENV_VAR_NAME: << parameters.env_var_name >>

...where scripts/count_executor_cpus.sh is:

#!/bin/bash
#
# Script that calculates how many CPUs we have available,
# and exports that number into an environment variable for later CircleCI commands to use.
#
# For "normal" machines, we use nproc.
# For CircleCI docker containers, we have to get clever and interrogate cgroup to find out how many CPUs the container is permitted to use
# because if we allow auto-sizing tools like Lerna to work it out automatically they'll get the wrong answers (as the CircleCI containers
# lie about their size, showing the host hardware's size instead of the containers' share of that).
#
# Given:
#  ENV_VAR_NAME = the env var to set to the CPU count
#  BASH_ENV = file to write the 'export ...=...' instruction to
#

set -eu
set -o pipefail

function weAreOnDocker() {
  [ -e /proc/1/cgroup ] && grep -q docker /proc/1/cgroup 2>/dev/null && [ -f /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]
}

function calcCpusForDocker() {
  # This logic came from CircleCI Feature request CCI-I-578, aka
  # https://ideas.circleci.com/cloud-feature-requests/p/have-nproc-accurately-reporter-number-of-cpus-available-to-container
  # which had a suggestion from Sebastian Lerna at 2022-12-15T17:11 which provided Node code to divide cpu quota by period.
  #
  # Note: this doesn't always give you the answer you expect (for the resource_class you're using) because sometimes CircleCI docker
  # containers are given more (e.g. 2x) CPU than you'd expect from the official execution environment size chart. This is "expected
  # behaviour" because CircleCI are (randomly) generous when there's extra CPU going spare.
  local quota
  local period
  quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
  period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)
  echo $((quota / period))
}

function variableIsAlreadySet() {
  local variableName="$1"
  env | grep -q "^${variableName}="
}

# Insist that these are set
[ -n "${ENV_VAR_NAME}" ]
[ -n "${BASH_ENV}" ]
# Ensure we don't have any clashes
[ "${ENV_VAR_NAME}" != 'BASH_ENV' ]
[ "${ENV_VAR_NAME}" != 'local_cpu_count' ]

# Don't recalculate a value if one is already set
if variableIsAlreadySet "${ENV_VAR_NAME}"; then
  echo "INFO: Variable ${ENV_VAR_NAME} is already set"
  exit 0
fi

if weAreOnDocker; then
  local_cpu_count="$(calcCpusForDocker)"
  echo "INFO: we are on docker with ${local_cpu_count} CPUs"
else
  local_cpu_count="$(nproc)"
  echo "INFO: we have ${local_cpu_count} CPUs"
fi

# Make the assignment locally (under set -e, export also verifies the variable name is legal)
export "${ENV_VAR_NAME}=${local_cpu_count}"
# Record the answer so that all subsequent CircleCI steps have this env var.
echo "export '${ENV_VAR_NAME}=${local_cpu_count}'" >> "${BASH_ENV}"
# and log what we're doing too
echo "export '${ENV_VAR_NAME}=${local_cpu_count}'"
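The cgroup v1 quota/period arithmetic in calcCpusForDocker can be tried standalone; a sketch with hypothetical values for a 4-CPU container:

```shell
# Standalone demo of the quota/period division above.
# In the real script these values come from cgroup files, not literals.
quota=400000    # would come from /sys/fs/cgroup/cpu/cpu.cfs_quota_us
period=100000   # would come from /sys/fs/cgroup/cpu/cpu.cfs_period_us
echo $((quota / period))   # prints 4
```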

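As an aside, the shell idiom the package.json scripts depend on is bash's default-value expansion, `${VAR:-default}` (note the hyphen; `${VAR:0}` is substring expansion). A quick demo:

```shell
# ${VAR:-default}: falls back to 0 ("lerna decides automatically")
# when the variable is unset, otherwise uses the provided count.
unset CIRCLE_EXECUTOR_CPUS
echo "${CIRCLE_EXECUTOR_CPUS:-0}"   # prints 0
CIRCLE_EXECUTOR_CPUS=4
echo "${CIRCLE_EXECUTOR_CPUS:-0}"   # prints 4
```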
It would be really nice if the CircleCI node orb provided environment variables that CI processes could rely on to tell self-scaling tools the real number of CPUs, RAM, etc.
If you decide to do this, feel free to use the code above as a starting point 😉
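The `$BASH_ENV` handoff the script relies on can be sketched in isolation (names illustrative): CircleCI sources `$BASH_ENV` at the start of every subsequent step, which is how the exported variable reaches later build steps.

```shell
# Minimal sketch of the BASH_ENV mechanism.
BASH_ENV=$(mktemp)                           # stand-in for CircleCI's file
echo "export CIRCLE_EXECUTOR_CPUS=4" >> "$BASH_ENV"
. "$BASH_ENV"                                # what the next step effectively does
echo "$CIRCLE_EXECUTOR_CPUS"                 # prints 4
```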
