Test load balancing using jest's --shard option #65

Merged
merged 2 commits into staging on Jun 29, 2022

Conversation

emmacasolin
Contributor

@emmacasolin emmacasolin commented Jun 27, 2022

Description

Jest 28.0.0 introduced the --shard option, which we can use to implement test-load balancing https://jestjs.io/docs/cli#--shard. This option requires a custom test sequencer, which defines how to shard the tests. Ideally, we would like this to be based on test times, but test size can also be a good approximation of this.

Once the test sequencer is working, we can create multiple test runners in the CI/CD (one per shard per platform) in order to more efficiently run our tests across all platforms (especially for larger projects such as Polykey).
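
For reference, a minimal sketch of how a custom sequencer would be wired in (the config file and path below are assumptions), with each CI runner then invoking jest with its own --shard=<index>/<count>:

// jest.config.js (sketch): point jest at a custom test sequencer, which can
// override how tests are split into shards and how they are sorted within a shard.
module.exports = {
  testSequencer: './scripts/testSequencer.js', // hypothetical path
};

Each runner would then run something like npm test -- --ci --shard=1/4 with its own shard index.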

Issues Fixed

Tasks

  • 1. Create a custom test sequencer for sharding tests
  • [ ] 2. Determine the most efficient number of runners for test sharding (we want the fastest test runtime with the fewest runners) TBD later - this is more relevant to pk so can be done there. Will involve creating a bin packing algorithm to calculate the most efficient number of shards based on caches of previous test times.
  • 3. Increase the number of CI/CD runners (done using parallel keyword)
  • 4. Add --shard option to CI/CD runners to balance the test load
  • [ ] 5. Fix junit reporting to include logger output - pushed back to "Include stdout/stderr in Junit test reports on Gitlab" (#66) due to upstream issues.

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@emmacasolin emmacasolin self-assigned this Jun 27, 2022
@emmacasolin emmacasolin added the development (Standard development) label on Jun 27, 2022

@emmacasolin
Contributor Author

The example test sequencer here https://jestjs.io/docs/configuration#testsequencer-string shards the tests based on the path, which I believe is simply the name of the test file, and that isn't what we want to shard the tests by.

Taking a look at the other fields of the Test object https://github.com/facebook/jest/blob/6b8b1404a1d9254e7d5d90a8934087a9c9899dab/packages/jest-runner/src/types.ts#L17-L21 I wonder if duration might be how long the test took to run last time? In that case, sorting on duration would be the best option, and we could fall back on path or file size (if possible) when the duration doesn't exist.

@emmacasolin
Contributor Author

The parallel keyword seems to work really well for running the shards! Example usage:

parallel: 2
script:
  - >
      nix-shell --run '
      npm run build --verbose;
      npm test -- --ci --coverage --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL;
      '

$CI_NODE_INDEX is the job number (out of the parallel jobs) and $CI_NODE_TOTAL is the total number of parallel jobs the job was split into, so these map directly onto the shardIndex and shardCount for sharding. Since TS-Demo-Lib has only a few tests I don't think sharding is efficient here, but you can get a sense of how it would look when used for larger code bases.

This is how the jobs look in the pipeline (screenshot of the two parallel check:test jobs):

And then here is the test part of the output of check:test 1/2:

> @matrixai/[email protected] test
> jest "--ci" "--coverage" "--shard=1/2"
GLOBAL SETUP
PASS tests/bin/typescript-demo-lib.test.ts (23.598 s)
  main
    ✓ main takes synthetic parameters (2576 ms)
    ✓ no input (2373 ms)
    ✓ adds 0 + 0 (2388 ms)
    ✓ adds 0 + 1 (2370 ms)
    ✓ adds 1 + 0 (2390 ms)
    ✓ adds 7657 + 238947 (2368 ms)
    ✓ level should work (2368 ms)
    ✓ worker threads should work (2374 ms)
    ✓ utp-native should work (2369 ms)
PASS tests/index.test.ts
  index
    ✓ some arbitrary test (3 ms)
-------------------------|---------|----------|---------|---------|-------------------
File                     | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s 
-------------------------|---------|----------|---------|---------|-------------------
All files                |   98.95 |    54.54 |     100 |   98.88 |                   
 src                     |     100 |      100 |     100 |     100 |                   
  index.ts               |     100 |      100 |     100 |     100 |                   
  utils.ts               |     100 |      100 |     100 |     100 |                   
 src/bin                 |   96.66 |       75 |     100 |   96.66 |                   
  typescript-demo-lib.ts |   96.66 |       75 |     100 |   96.66 | 50                
 src/lib                 |     100 |        0 |     100 |     100 |                   
  Library.ts             |     100 |        0 |     100 |     100 | 4                 
  NumPair.ts             |     100 |        0 |     100 |     100 | 5                 
  test-fd-lock.ts        |     100 |      100 |     100 |     100 |                   
  test-level.ts          |     100 |      100 |     100 |     100 |                   
  test-utp-native.ts     |     100 |      100 |     100 |     100 |                   
 src/lib/workers         |     100 |      100 |     100 |     100 |                   
  test-workers.ts        |     100 |      100 |     100 |     100 |                   
-------------------------|---------|----------|---------|---------|-------------------
Test Suites: 2 passed, 2 total
Tests:       10 passed, 10 total
Snapshots:   0 total
Time:        24.08 s
Ran all test suites.
GLOBAL TEARDOWN
Saving cache for successful job

And for check:test 2/2:

> @matrixai/[email protected] test
> jest "--ci" "--coverage" "--shard=2/2"
GLOBAL SETUP
PASS tests/lib/Library.test.ts
  Library class
    ✓ some arbitrary test (2 ms)
------------|---------|----------|---------|---------|-------------------
File        | % Stmts | % Branch | % Funcs | % Lines | Uncovered Line #s 
------------|---------|----------|---------|---------|-------------------
All files   |     100 |        0 |     100 |     100 |                   
 Library.ts |     100 |        0 |     100 |     100 | 4                 
------------|---------|----------|---------|---------|-------------------
Test Suites: 1 passed, 1 total
Tests:       1 passed, 1 total
Snapshots:   0 total
Time:        0.832 s
Ran all test suites.
GLOBAL TEARDOWN
Saving cache for successful job

It only outputs coverage for the tests that run in the shard (which is better than the test-pipelines script, which doesn't do this), although it may be difficult to combine the coverage reports. For example, Library.ts is included in both coverage reports; if both were below 100% coverage and covered some of the same lines and some different lines, how would they be combined?

@emmacasolin
Contributor Author

I changed the .gitlab-ci.yml temporarily to observe the sharding in the build jobs and it appears to work well there as well. I don't think the sharding increases the total runtime, since all of the runners start up in parallel. https://gitlab.com/MatrixAI/open-source/typescript-demo-lib/-/pipelines/573446267

@emmacasolin
Contributor Author

Some related issues to work through here:

  1. If we don't override jest's default sharding behaviour, does it already do what we need it to do? Need to look at the source code and see if it checks the cache for previous runtimes when they exist. Just using file size is not preferable.
  2. How do we determine the most efficient number of shards to use? There are algorithms for this (e.g. https://en.wikipedia.org/wiki/Knapsack_problem)
  3. Does Junit capture test output correctly? There are upstream issues stating that jest (and therefore Junit) only captures from console.log/console.error, but our test loggers use a StreamHandler to output to process.stderr. If the test output is not being captured correctly then we may need to use the ConsoleHandler instead for tests on the CI/CD, or alternatively redirect and collect the output somewhere else where Junit can see and display it.

@CMCDragonkai
Member

Also, we want to continue using the child pipelines so that we at least have a way to dynamically compute the shard count when creating the child pipelines that use the parallel keyword.

@emmacasolin
Contributor Author

  1. If we don't override jest's default sharding behaviour, does it already do what we need it to do? Need to look at the source code and see if it checks the cache for previous runtimes when they exist. Just using file size is not preferable.

This is the code for jest's default TestSequencer implementation:

/**
 * The TestSequencer will ultimately decide which tests should run first.
 * It is responsible for storing and reading from a local cache
 * map that stores context information for a given test, such as how long it
 * took to run during the last run and if it has failed or not.
 * Such information is used on:
 * TestSequencer.sort(tests: Array<Test>)
 * to sort the order of the provided tests.
 *
 * After the results are collected,
 * TestSequencer.cacheResults(tests: Array<Test>, results: AggregatedResult)
 * is called to store/update this information on the cache map.
 */
class TestSequencer {
  _cache = new Map();

  _getCachePath(testContext) {
    const {config} = testContext;

    const HasteMapClass = _jestHasteMap().default.getStatic(config);

    return HasteMapClass.getCacheFilePath(
      config.cacheDirectory,
      `perf-cache-${config.id}`
    );
  }

  _getCache(test) {
    const {context} = test;

    if (!this._cache.has(context) && context.config.cache) {
      const cachePath = this._getCachePath(context);

      if (fs().existsSync(cachePath)) {
        try {
          this._cache.set(
            context,
            JSON.parse(fs().readFileSync(cachePath, 'utf8'))
          );
        } catch {}
      }
    }

    let cache = this._cache.get(context);

    if (!cache) {
      cache = {};

      this._cache.set(context, cache);
    }

    return cache;
  }
  /**
   * Select tests for shard requested via --shard=shardIndex/shardCount
   * Sharding is applied before sorting
   *
   * @param tests All tests
   * @param options shardIndex and shardCount to select
   *
   * @example
   * ```typescript
   * class CustomSequencer extends Sequencer {
   *  shard(tests, { shardIndex, shardCount }) {
   *    const shardSize = Math.ceil(tests.length / shardCount);
   *    const shardStart = shardSize * (shardIndex - 1);
   *    const shardEnd = shardSize * shardIndex;
   *    return [...tests]
   *     .sort((a, b) => (a.path > b.path ? 1 : -1))
   *     .slice(shardStart, shardEnd);
   *  }
   * }
   * ```
   */

  shard(tests, options) {
    const shardSize = Math.ceil(tests.length / options.shardCount);
    const shardStart = shardSize * (options.shardIndex - 1);
    const shardEnd = shardSize * options.shardIndex;
    return tests
      .map(test => {
        const relativeTestPath = path().posix.relative(
          (0, _slash().default)(test.context.config.rootDir),
          (0, _slash().default)(test.path)
        );
        return {
          hash: crypto()
            .createHash('sha1')
            .update(relativeTestPath)
            .digest('hex'),
          test
        };
      })
      .sort((a, b) => (a.hash < b.hash ? -1 : a.hash > b.hash ? 1 : 0))
      .slice(shardStart, shardEnd)
      .map(result => result.test);
  }
  /**
   * Sort test to determine order of execution
   * Sorting is applied after sharding
   * @param tests
   *
   * ```typescript
   * class CustomSequencer extends Sequencer {
   *   sort(tests) {
   *     const copyTests = Array.from(tests);
   *     return [...tests].sort((a, b) => (a.path > b.path ? 1 : -1));
   *   }
   * }
   * ```
   */

  sort(tests) {
    /**
     * Sorting tests is very important because it has a great impact on the
     * user-perceived responsiveness and speed of the test run.
     *
     * If such information is on cache, tests are sorted based on:
     * -> Has it failed during the last run ?
     * Since it's important to provide the most expected feedback as quickly
     * as possible.
     * -> How long it took to run ?
     * Because running long tests first is an effort to minimize worker idle
     * time at the end of a long test run.
     * And if that information is not available they are sorted based on file size
     * since big test files usually take longer to complete.
     *
     * Note that a possible improvement would be to analyse other information
     * from the file other than its size.
     *
     */
    const stats = {};

    const fileSize = ({path, context: {hasteFS}}) =>
      stats[path] || (stats[path] = hasteFS.getSize(path) || 0);

    const hasFailed = (cache, test) =>
      cache[test.path] && cache[test.path][0] === FAIL;

    const time = (cache, test) => cache[test.path] && cache[test.path][1];

    tests.forEach(test => (test.duration = time(this._getCache(test), test)));
    return tests.sort((testA, testB) => {
      const cacheA = this._getCache(testA);

      const cacheB = this._getCache(testB);

      const failedA = hasFailed(cacheA, testA);
      const failedB = hasFailed(cacheB, testB);
      const hasTimeA = testA.duration != null;

      if (failedA !== failedB) {
        return failedA ? -1 : 1;
      } else if (hasTimeA != (testB.duration != null)) {
        // If only one of two tests has timing information, run it last
        return hasTimeA ? 1 : -1;
      } else if (testA.duration != null && testB.duration != null) {
        return testA.duration < testB.duration ? 1 : -1;
      } else {
        return fileSize(testA) < fileSize(testB) ? 1 : -1;
      }
    });
  }

  allFailedTests(tests) {
    const hasFailed = (cache, test) => {
      var _cache$test$path;

      return (
        ((_cache$test$path = cache[test.path]) === null ||
        _cache$test$path === void 0
          ? void 0
          : _cache$test$path[0]) === FAIL
      );
    };

    return this.sort(
      tests.filter(test => hasFailed(this._getCache(test), test))
    );
  }

  cacheResults(tests, results) {
    const map = Object.create(null);
    tests.forEach(test => (map[test.path] = test));
    results.testResults.forEach(testResult => {
      if (testResult && map[testResult.testFilePath] && !testResult.skipped) {
        const cache = this._getCache(map[testResult.testFilePath]);

        const perf = testResult.perfStats;
        cache[testResult.testFilePath] = [
          testResult.numFailingTests ? FAIL : SUCCESS,
          perf.runtime || 0
        ];
      }
    });

    this._cache.forEach((cache, context) =>
      fs().writeFileSync(this._getCachePath(context), JSON.stringify(cache))
    );
  }
}

So it does cache results - both whether the test passed or failed and the time it took the test to run. However, this information is only used to sort the tests, not to shard them, and sharding is done before sorting (so the sorting happens within individual shards, or runners in our case). This makes sense though: the sharding has to be completely deterministic because it gets calculated separately by each shard. The test run time, and even whether a test fails, can change on every run - it's not reliable, and incorporating it into the sharding decision would risk tests being included in multiple shards or excluded entirely. The shards could even affect each other in this case if they're not run entirely in parallel.

I think the only improvement we could make here is to incorporate file size into the sharding decision, if we wanted that (I've run a few tests and it doesn't look like the default sharding takes it into account).

I don't think there's a simple way to incorporate knapsacking into the jest sharding system - I think sharding just isn't designed to work that way. Rather than increasing efficiency by knapsacking the tests, it increases efficiency by sorting the tests after they've already been sharded.
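
If we did decide to bring file size into the sharding decision, a custom sequencer could override shard() while keeping the default sort(). A rough, untested sketch (it assumes hasteFS.getSize() behaves as in the default sequencer quoted above, and the greedy assignment is just one possible heuristic):

const Sequencer = require('@jest/test-sequencer').default;

class SizeShardSequencer extends Sequencer {
  shard(tests, { shardIndex, shardCount }) {
    // Deterministic ordering: largest files first, ties broken by path
    const bySize = [...tests].sort((a, b) => {
      const sizeA = a.context.hasteFS.getSize(a.path) || 0;
      const sizeB = b.context.hasteFS.getSize(b.path) || 0;
      return sizeB - sizeA || (a.path > b.path ? 1 : -1);
    });
    // Greedily place each file into the shard with the smallest running total
    const totals = new Array(shardCount).fill(0);
    const shards = totals.map(() => []);
    for (const test of bySize) {
      const i = totals.indexOf(Math.min(...totals));
      shards[i].push(test);
      totals[i] += test.context.hasteFS.getSize(test.path) || 0;
    }
    return shards[shardIndex - 1];
  }
}

module.exports = SizeShardSequencer;

Every runner computes the same full assignment and returns only its own slice, so the result stays deterministic across the parallel jobs.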

@emmacasolin
Contributor Author

Also that we want to continue using the child pipelines so that we can at least have a way to dynamically compute the shard count when creating the child pipelines that uses the parallel keyword.

I'm not sure if the parallel keyword can be used for child pipelines, so some experimentation needs to be done here.

@emmacasolin
Contributor Author

emmacasolin commented Jun 27, 2022

Using the child pipelines works for splitting the tests and running them; however, we don't see the coverage or Junit test reports https://gitlab.com/MatrixAI/open-source/typescript-demo-lib/-/pipelines/573529050/builds. I hadn't noticed this before inside Polykey but the same is true there.

I think I remember reading somewhere that child pipelines can't store artifacts, so this is probably why.

@CMCDragonkai
Member

CMCDragonkai commented Jun 27, 2022

It is working, if you look at the child pipeline panel:

The child pipeline has a different panel compared to the parent pipeline. That's the main issue: the coverage and junit report don't appear for the parent pipeline.

There's an upstream issue about showing child pipeline results in the parent pipeline page, but it hasn't been resolved yet: https://gitlab.com/gitlab-org/gitlab/-/issues/212894.

So this is fine, and it is definitely working.

So the only issue here is that we need to stitch together the coverage reports from the shards for them to be useful.

Note that https://docs.gitlab.com/ee/ci/testing/test_coverage_visualization.html#coverage-report-from-child-pipeline says that you must use strategy: depend if coverage reports are generated by a child pipeline job. But it does say:

On GitLab.com, this feature is not available. The feature is not ready for production use.

There's an upstream issue to track here: https://gitlab.com/gitlab-org/gitlab/-/issues/280818 - it might be fixed soon. Have a look.

If we look at https://medium.com/@mfreundlich1/speed-up-your-jest-tests-with-shards-776e9f02f637, it says you can merge coverage reports, but it's done outside of jest using a different tool.

The merged coverage report can be uploaded as an artifact and we can use that information. At the end of the day, I expect to use the MatrixAIBot to do the heavy lifting and report that coverage to code climate or to our PR comment.
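
For reference, merging the per-shard JSON coverage files could look roughly like this (a sketch only; it assumes each shard uploads its coverage/coverage-final.json as an artifact, and that istanbul-lib-coverage, istanbul-lib-report and istanbul-reports are available):

const fs = require('fs');
const libCoverage = require('istanbul-lib-coverage');
const libReport = require('istanbul-lib-report');
const reports = require('istanbul-reports');

// Merge each shard's coverage-final.json (passed as CLI args) into one coverage map
const map = libCoverage.createCoverageMap({});
for (const file of process.argv.slice(2)) {
  map.merge(JSON.parse(fs.readFileSync(file, 'utf8')));
}

// Emit a combined report ('lcov', 'cobertura', etc. would also work here)
const context = libReport.createContext({ dir: './coverage-merged', coverageMap: map });
reports.create('text-summary').execute(context);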

@CMCDragonkai
Member

So for now, we can stick with manually setting the shard count.

We can figure out a bin-packing algorithm that makes use of the duration data (which you will need to acquire from each prior worker) and automatically calculates both the shard count and the shard allocation, optimising against worker count and time to completion.

Can you comment on what you found out regarding how jest does sharding, and what the default sharding and sorting code ends up doing?

Also, this means we don't need our own custom test sequencer, right?

Can you also compare our solution with what gitlab did for their own tests?

Also, what algorithm does this project use: https://github.com/kamilkisiela/split-tests?

@CMCDragonkai
Member

Note that the coverage percentage is extracted through a regular expression on the stdout/stderr log of the coverage report. So if it says 100%, that's because the regex matched 100%. So uncovered lines are not relevant here.

@emmacasolin
Contributor Author

Can you comment on what you found out regarding how jest does sharding, and what the default sharding and sorting code ends up doing?

Jest's default sharding algorithm doesn't optimise the shards in any way - it just randomly (and uniformly) distributes the test files into shards deterministically. It doesn't incorporate things like test time since these can change on every test run, so if shards are not run completely separately from each other they can conflict (this doesn't affect us though). Note that it only separates test files, so individual tests in the same file will never be in different shards (which makes sense, because otherwise any setup/teardown in beforeAll/afterAlls would need to be repeated).

Once the tests are sharded, jest then sorts the test files within each shard. This is designed to optimise test run times on machines with multiple cores, so again, their default algorithm isn't the most suitable for our purposes. By default, jest sorts the test files by:

  1. Whether the file had one or more failing tests on the previous run (files with previous failures run first)
  2. Whether a file has timing information from a previous run (files with no timing information run first)
  3. How long the file took on the previous run (files that took longer to run go first)
  4. File size (larger files run first)

This is in an effort to start long-running tests first so that they can run while shorter-running tests run in the remaining cores. Since the CI/CD only has one core we may want to override the default sorting and have shorter tests run first in order to get a better gauge of the test progress.
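
If we did want shortest-first ordering on a single-core runner, a minimal sketch would be to reverse the default order (note this also pushes previously failed files to the end, which is a trade-off):

const Sequencer = require('@jest/test-sequencer').default;

class ShortestFirstSequencer extends Sequencer {
  sort(tests) {
    // Default order: previously failed files first, then longest duration /
    // largest file first. Reversing it runs the quick files first.
    return super.sort(tests).reverse();
  }
}

module.exports = ShortestFirstSequencer;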

Also, this means we don't need our own custom test sequencer, right?

Correct - if we use jest's default sharding and sorting algorithms then we don't need our own custom test sequencer. But if we want to change the default behaviour then we do need a custom test sequencer to override the default implementation.

Can you also compare our solution with what gitlab did for their own tests?

It looks like in that MR they chose to use jest's default sharding behaviour but overrode the default sorting to sort alphabetically. It looks like they did this to make it easier to combine and cache metadata from the parallel runners (e.g. coverage info) https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25104#note_290010970

This is their script for merging the coverage reports: https://gitlab.com/gitlab-org/gitlab/-/blob/53ac64da21eee6f32cc4c9154155f1c06994cd54/scripts/frontend/merge_coverage_frontend.js

Also, what algorithm does this project use: https://github.com/kamilkisiela/split-tests?

It looks like it actually creates a "queue" of sorts - it sorts the test files by previous run time and then allocates each file to whichever shard currently has the shortest total runtime. E.g.

Sorted tests by previous run time = [ 1, 1, 2, 4, 6, 10, 11 ]
Shard 1: (total=0s)
Shard 2: (total=0s)
Shard 3: (total=0s)

Sorted tests by previous run time = [ 1, 2, 4, 6, 10, 11 ]
Shard 1: 1 (total=1s)
Shard 2: (total=0s)
Shard 3: (total=0s)

Sorted tests by previous run time = [ 2, 4, 6, 10, 11 ]
Shard 1: 1 (total=1s)
Shard 2: 1 (total=1s)
Shard 3: (total=0s)

Sorted tests by previous run time = [ 4, 6, 10, 11 ]
Shard 1: 1 (total=1s)
Shard 2: 1 (total=1s)
Shard 3: 2 (total=2s)

Sorted tests by previous run time = [ 6, 10, 11 ]
Shard 1: 1, 4 (total=5s)
Shard 2: 1 (total=1s)
Shard 3: 2 (total=2s)

Sorted tests by previous run time = [ 10, 11 ]
Shard 1: 1, 4 (total=5s)
Shard 2: 1, 6 (total=7s)
Shard 3: 2 (total=2s)

Sorted tests by previous run time = [ 11 ]
Shard 1: 1, 4 (total=5s)
Shard 2: 1, 6 (total=7s)
Shard 3: 2, 10 (total=12s)

Sorted tests by previous run time = [ ]
Shard 1: 1, 4, 11 (total=16s)
Shard 2: 1, 6 (total=7s)
Shard 3: 2, 10 (total=12s)

So it's not super efficient but it would likely be better than random assignment. Note that it doesn't have any algorithm for choosing the shard count - that still needs to be chosen manually.
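
For reference, the allocation it describes boils down to a small greedy loop; a sketch of that "shortest shard first" queue (assuming previous run times are already known for each file):

// files is an array of { path, time } sorted by previous run time
function allocate(files, shardCount) {
  const shards = Array.from({ length: shardCount }, () => ({ total: 0, files: [] }));
  for (const file of files) {
    // Pick whichever shard currently has the smallest total runtime
    const shard = shards.reduce((min, s) => (s.total < min.total ? s : min));
    shard.files.push(file.path);
    shard.total += file.time;
  }
  return shards;
}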

@CMCDragonkai
Member

I realised another issue we may have: some of our tests block other tests through global file locking, that is, the usage of the global PK agent.

I remember that, after the changes to our crypto mocking, we don't actually entirely need it for performance.

I'm wondering: can you check which tests get blocked on the global agent, whether that could cause problems, and whether we can potentially remove it in favour of individual agents that just have mocked keys?

@CMCDragonkai
Member

Also, when you're testing, can you see what happens when we switch to the console handler and run a failing test, and check whether we are able to see it in our junit report on the gitlab interface?

@emmacasolin
Contributor Author

I realised another issue we may have: some of our tests block other tests through global file locking, that is, the usage of the global PK agent.

I remember that, after the changes to our crypto mocking, we don't actually entirely need it for performance.

I'm wondering: can you check which tests get blocked on the global agent, whether that could cause problems, and whether we can potentially remove it in favour of individual agents that just have mocked keys?

Would this be a problem though? I think each child pipeline would have its own global agent. But pk already splits out the bin tests in the CI/CD into separate child pipelines, of which many use the global agent, and it hasn't been an issue there yet.

@emmacasolin
Contributor Author

Also, when you're testing, can you see what happens when we switch to the console handler and run a failing test, and check whether we are able to see it in our junit report on the gitlab interface?

TS demo lib doesn't actually import js-logger, but I can import it just to test quickly.

@emmacasolin
Contributor Author

Also, when you're testing, can you see what happens when we switch to the console handler and run a failing test, and check whether we are able to see it in our junit report on the gitlab interface?

TS demo lib doesn't actually import js-logger, but I can import it just to test quickly.

It doesn't seem to show up in the junit reports. What's more, if a test doesn't fail then the log won't even show up in the terminal output.

I modified these two tests

// In the beforeEach():
logger = new Logger('main', LogLevel.INFO, [new ConsoleHandler()]);

test('no input', async () => {
  logger.info('This test should fail');
  const mockLog = mockProcessStdout();
  await main([]);
  const tmpMockLog = mockLog.mock.calls.join('');
  expect(tmpMockLog).toContain('[]\n');
  expect(tmpMockLog).toContain('new library\n');
  expect(tmpMockLog).toMatch(uuidRegex);
  expect(tmpMockLog).toContain('0 + 0 = 1\n');
  mockLog.mockRestore();
});
test('adds 0 + 0', async () => {
  logger.info('This test should pass');
  const mockLog = mockProcessStdout();
  await main(['', '', '0', '0', dataDir]);
  const tmpMockLog = mockLog.mock.calls.join('');
  expect(tmpMockLog).toContain('[0,0]\n');
  expect(tmpMockLog).toContain('new library\n');
  expect(tmpMockLog).toMatch(uuidRegex);
  expect(tmpMockLog).toContain('0 + 0 = 0\n');
  mockLog.mockRestore();
});

And this is the terminal output (both locally and in the CI/CD)

  console.error
    INFO:main:This test should fail
      31 |   });
      32 |   test('no input', async () => {
    > 33 |     logger.info('This test should fail');
         |            ^
      34 |     const mockLog = mockProcessStdout();
      35 |     await main([]);
      36 |     const tmpMockLog = mockLog.mock.calls.join('');
      at ConsoleHandler.emit (node_modules/@matrixai/logger/src/handlers/ConsoleHandler.ts:6:13)
      at ConsoleHandler.handle (node_modules/@matrixai/logger/src/Handler.ts:18:10)
      at Logger.callHandlers (node_modules/@matrixai/logger/src/Logger.ts:112:15)
      at Logger.log (node_modules/@matrixai/logger/src/Logger.ts:96:12)
      at Logger.info (node_modules/@matrixai/logger/src/Logger.ts:82:10)
      at Object.<anonymous> (tests/bin/typescript-demo-lib.test.ts:33:12)
FAIL tests/bin/typescript-demo-lib.test.ts (23.463 s)
  main
    ✓ main takes synthetic parameters (2580 ms)
    ✕ no input (2426 ms)
    ✓ adds 0 + 0 (2595 ms)
    ✓ adds 0 + 1 (2375 ms)
    ✓ adds 1 + 0 (2369 ms)
    ✓ adds 7657 + 238947 (2368 ms)
    ✓ level should work (2367 ms)
    ✓ worker threads should work (2392 ms)
    ✓ utp-native should work (2373 ms)
  ● main › no input
    expect(received).toContain(expected) // indexOf
    Expected substring: "0 + 0 = 1
    "
    Received string:    "[]
    new library
    64ffee28-b052-4a00-9195-b07649984d7f
    0 + 0 = 0
    lets test some levelDB
    hello Level!·
    Lets test workers.
    Hello Worker!·
    Lets test utp-native.
    echo: hello UTP!!·
    ended
    2.0.2
    true
    Acquired lock!
    "
      38 |     expect(tmpMockLog).toContain('new library\n');
      39 |     expect(tmpMockLog).toMatch(uuidRegex);
    > 40 |     expect(tmpMockLog).toContain('0 + 0 = 1\n');
         |                        ^
      41 |     mockLog.mockRestore();
      42 |   });
      43 |   test('adds 0 + 0', async () => {
      at Object.<anonymous> (tests/bin/typescript-demo-lib.test.ts:40:24)

All that gets shown in the junit report is this

Error: expect(received).toContain(expected) // indexOf

Expected substring: "0 + 0 = 1
"
Received string:    "[]
new library
11f0d783-a04c-4db2-8cf7-3e8c2734008c
0 + 0 = 0
lets test some levelDB
hello Level!·
Lets test workers.
Hello Worker!·
Lets test utp-native.
echo: hello UTP!!·
ended
2.0.2
true
Acquired lock!
"
    at Object.<anonymous> (/builds/MatrixAI/open-source/typescript-demo-lib/tests/bin/typescript-demo-lib.test.ts:40:24)

@emmacasolin
Contributor Author

There's actually an option we can set for junit to include console output, but it only works for stdout. The option is includeConsoleOutput https://github.com/jest-community/jest-junit
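
For reference, the option would be passed through the reporter config, roughly like this (assuming jest-junit is set up as a reporter; note the value is the string 'true'):

// jest.config.js (sketch)
module.exports = {
  reporters: [
    'default',
    ['jest-junit', { includeConsoleOutput: 'true' }],
  ],
};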

@CMCDragonkai
Member

I've added an upstream issue: jest-community/jest-junit#218 (you should do this next time you find something that would be nice to have in an upstream project).

@CMCDragonkai
Member

CMCDragonkai commented Jun 28, 2022

That feature includeConsoleOutput appears to be implemented in a simple way, see: https://github.com/jest-community/jest-junit/pull/90/files#diff-98b33b9b4e6f46484fa6c752c587c61df6f8e15120262d850d62dfa5a55f2144.

It seems to use suite.console only, with no distinction between STDOUT and STDERR. Perhaps it does work on console.error?

    // Write stdout console output if available
    if (options.includeConsoleOutput === 'true' && suite.console && suite.console.length) {
      // Stringify the entire console object
      // Easier this way because formatting in a readable way is tough with XML
      // And this can be parsed more easily
      let testSuiteConsole = {
        'system-out': {
          _cdata: JSON.stringify(suite.console, null, 2)
        }
      };

      testSuite.testsuite.push(testSuiteConsole);
    }

Can you check this? And if it doesn't, then maybe we can make a PR addition to jest-junit to also capture console.error?

@CMCDragonkai
Member

And see this comment jestjs/jest#6718 (comment) for why jest doesn't capture process.stdout. It's just not implemented yet, compared to intercepting console.log/console.error.

@CMCDragonkai
Member

CMCDragonkai commented Jun 28, 2022

If it comes to it, we can always add another console handler, so one can choose to use stdout or stderr. ConsoleOutHandler vs ConsoleErrHandler.

@CMCDragonkai
Member

I realised another issue we may have: some of our tests block other tests through global file locking, that is, the usage of the global PK agent.
I remember that, after the changes to our crypto mocking, we don't actually entirely need it for performance.
I'm wondering: can you check which tests get blocked on the global agent, whether that could cause problems, and whether we can potentially remove it in favour of individual agents that just have mocked keys?

Would this be a problem though? I think each child pipeline would have its own global agent. But pk already splits out the bin tests in the CI/CD into separate child pipelines, of which many use the global agent, and it hasn't been an issue there yet.

It would be a global agent per child pipeline test job, not per child pipeline, and that global agent would be used across all test modules that are running concurrently.

Yea, I suppose it should still continue to work, but I remember from our test timeouts that the beforeAll computation time is added to the total test execution time, even though it has a separate timeout. This can cause issues in the future if there are loads of tests blocked on each other waiting for mutually exclusive time with the global agent. So maybe this can be addressed later: once we have faster agent startups (by fixing our crypto and mocking), we can eventually reduce the tests that make use of the global agent, or potentially eliminate it entirely. Or we could isolate the global agent within a single test module, so that each test function makes use of it; this is fine because each test module is executed serially anyway, so locking won't even be needed.

I think as soon as we have replaced our crypto with webcrypto (and also use our crypto mocking), we should be able to eliminate the cross-module global agent (by removing our file locking on it), and instead create a per-module global agent in the beforeAll where applicable.

@CMCDragonkai
Member

Jest's default sharding algorithm doesn't optimise the shards in any way - it just randomly (and uniformly) distributes the test files into shards deterministically. It doesn't incorporate things like test time since these can change on every test run, so if shards are not run completely separately from each other they can conflict (this doesn't affect us though). Note that it only separates test files, so individual tests in the same file will never be in different shards (which makes sense, because otherwise any setup/teardown in beforeAll/afterAlls would need to be repeated).

I see, I must have been confused about this: MatrixAI/Polykey#380 (comment).

Did you see that comment?

Basically, I originally thought jest was sharding on file size, but yes, it's actually done through a deterministic uniform distribution using a sha hash.

This means jest's sharding system doesn't do any kind of bin-packing. I would imagine that on first-time runs, bin packing should be based on file size, and on subsequent runs it should be based on timing data.

Ok, so we can create a future issue to add a bin-packing algo to the sharding method of our custom test sequencer. Yes, it's NP-complete, but there are heuristics that can be used. I'd like to see how long it would take to execute...

@CMCDragonkai
Member

CMCDragonkai commented Jun 28, 2022

When we get around to optimising the shard count and such: generating the YAML file is currently done with just a shell script, test-pipelines.sh, and shell languages are not always easy to program with. We could instead use node to run a JS script with https://github.com/eemeli/yaml to generate the yaml. That would give a bit more flexibility, although the shell script is pretty cool in using cat << EOF for output.
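
A sketch of what generating the child pipeline with the yaml package could look like (the job shape and values below are illustrative only, not our actual pipeline):

// generate-pipeline.js (sketch): emit a child pipeline YAML with a parallelised test job
const fs = require('fs');
const YAML = require('yaml');

const shardCount = 2; // could later be computed from cached test timings

const pipeline = {
  'check:test': {
    stage: 'check',
    parallel: shardCount,
    script: [
      'npm test -- --ci --coverage --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL',
    ],
  },
};

fs.writeFileSync('check-test-pipeline.yml', YAML.stringify(pipeline));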

@CMCDragonkai
Member

Oh, and if the child pipeline needs to pull in dependencies from the parent pipeline, refer to this: https://stackoverflow.com/questions/70663367/artifacts-are-not-pulled-in-a-child-pipeline.

And since our CI is now a DAG, our child pipeline jobs should also be updated to use DAG-like features. See the second answer. Any leaf job should have needs: [] to make that explicit.

@emmacasolin
Contributor Author

Jest's default sharding algorithm doesn't optimise the shards in any way - it just randomly (and uniformly) distributes the test files into shards deterministically. It doesn't incorporate things like test time since these can change on every test run, so if shards are not run completely separately from each other they can conflict (this doesn't affect us though). Note that it only separates test files, so individual tests in the same file will never be in different shards (which makes sense, because otherwise any setup/teardown in beforeAll/afterAlls would need to be repeated).

I see, I must have been confused about this: MatrixAI/js-polykey#380 (comment).

Did you see that comment?

Basically, I originally thought jest was sharding on file size, but yes, it's actually done through a deterministic uniform distribution using a sha hash.

This means jest's sharding system doesn't do any kind of bin-packing. I would imagine that on first-time runs, bin packing should be based on file size, and on subsequent runs it should be based on timing data.

Ok, so we can create a future issue to add a bin-packing algo to the sharding method of our custom test sequencer. Yes, it's NP-complete, but there are heuristics that can be used. I'd like to see how long it would take to execute...

Ahh yeah I think that article they wrote is talking about their sorting algorithm rather than their sharding algorithm.

@emmacasolin
Contributor Author

There are some upstream issues preventing us from properly including logger output in our junit reports on gitlab:

  1. Jest has these problems: "Running tests in verbose mode with console logging breaks output" jest-community/jest-react-reporter#4 (comment)
  • Console output is not properly captured unless verbose is set to false
  • We can override this for our tests when running on the CI/CD, but it's still not ideal
  2. Junit has these problems: "console support" jest-community/jest-junit#45 (comment)
  • The issue with needing to run at least two test files seems to be resolved; at least, I'm observing console output being reported for shards containing only one test file
  • The verbosity issue still remains though, and is subject to the above Jest bug being fixed
  3. Gitlab has these problems: https://gitlab.com/gitlab-org/gitlab/-/issues/287664
  • Even once the Jest and Junit issues above are resolved, Gitlab itself cannot display console output as it is currently formatted in the Junit output
  • Once Gitlab is able to display information from <system-out> tags, the other issues can be worked around

js-logger has also now been updated to have separate console handlers for stdout and stderr (ConsoleOutHandler and ConsoleErrHandler). Either of these will work with Junit reporting once the issues with that have been resolved. However, at the end of the day, we don't really want to use ConsoleErrHandler; we want to use StreamHandler, as that's what is correct, and it doesn't produce all that noise in our local testing logs. Also, verbose is what we do want in local testing, but it could be disabled in the CI/CD with --verbose=false if it's needed in the future. But for now, it's not feasible to show stdout/stderr logs in our junit reports until all those problems are fixed.

I'll create a new issue here for reporting the console output of tests once that's available (and to watch the issues blocking it), and will create issues in Polykey for test load balancing (including bin packing). We don't want to fully port over test load balancing to PK due to not being able to capture stderr/stdout, but we can use it for our MacOS and Windows tests (keeping the existing child pipeline setup for linux tests so we can do debugging there).

We can use jest's `shard` option to implement test load balancing. This required updating to jest v28, as sharding was introduced in this version.
For now the shard count is chosen manually and is statically set, however, this may be optimised in the future.
Test load balancing is now utilised for both `test` and `build` (platform) pipelines.
Changed the options for junit reporting to better organise the tests.
Tests will now be displayed in Gitlab with the top-level describe, test name, and file path.
@emmacasolin emmacasolin marked this pull request as ready for review June 29, 2022 04:34
@emmacasolin emmacasolin merged commit 716a6a0 into staging Jun 29, 2022
emmacasolin added a commit that referenced this pull request Jun 29, 2022
- Removed `rules` from child pipelines and other jobs with dependencies
- Separated value for `parallel` into a variable for easier editing
- Uploading of the dist separated into its own job so multiple parallel jobs aren't uploading it
- Removed `--runInBand` since we want to make use of multiple cores

Relates to #65
Labels: development (Standard development)
Development

Successfully merging this pull request may close these issues.

Test load balancing - parallelize jest tests