[rush] When using cobuilds, no-op operations should not be treated as uncacheable #4660

aramissennyeydd · 2024-04-18T20:42:22Z

Summary

This PR does two things:

It adds logging for the cobuild build plan. It's currently difficult to debug why cobuilds aren't performing as expected, and some visibility into the clustering logic would be a good step in that direction.
It fixes a bug where no-op operations, such as those produced by ignoreMissingScript or missingScriptBehavior: silent, are treated as having the build cache disabled. This forces every operation that depends on them into the same cluster, which causes an extreme slowdown for projects that use central dependencies without build/test steps.

Details

No-op operations no longer factor into build cache cluster calculations. I'm also adding some visibility to the cobuild output so users can understand when clustering happens. This solves the issue I originally hit, but other kinds of operations may still have problems. While this is a bug fix, some projects may be set up incorrectly for cobuilds; this change may expose that and create additional work for repo owners. However, I expect those owners believed they were cobuilding all along. The bug fix should improve performance, while the extra logging may eat into some of that improvement, so it may be best placed behind verbose logging.
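To make the clustering rule concrete, here is a minimal sketch under a simplified model: an operation whose cache is disabled is merged (via union-find) with every operation that consumes it, and no-op operations are exempt from that rule. All names (SketchOperation, DisjointSet, clusterOperations) are hypothetical; the real CacheableOperationPlugin logic may differ.

```typescript
// Sketch only: these names are hypothetical, not the rush-lib implementation.
interface SketchOperation {
  name: string;
  isNoOp: boolean; // e.g. a missing script with missingScriptBehavior: silent
  cacheDisabled: boolean; // e.g. disableBuildCacheForProject: true
  consumers: SketchOperation[]; // operations that depend on this one
}

// Minimal union-find (disjoint set) keyed by operation name.
class DisjointSet {
  private readonly _parent: Map<string, string> = new Map<string, string>();

  public add(item: string): void {
    if (!this._parent.has(item)) {
      this._parent.set(item, item);
    }
  }

  public find(item: string): string {
    let root: string = item;
    while (this._parent.get(root) !== root) {
      root = this._parent.get(root)!;
    }
    // Path compression: point every visited item directly at the root.
    let current: string = item;
    while (current !== root) {
      const next: string = this._parent.get(current)!;
      this._parent.set(current, root);
      current = next;
    }
    return root;
  }

  public union(a: string, b: string): void {
    this.add(a);
    this.add(b);
    const rootA: string = this.find(a);
    const rootB: string = this.find(b);
    if (rootA !== rootB) {
      this._parent.set(rootB, rootA);
    }
  }
}

// An operation whose output cannot be restored from the build cache must run
// on the same runner as its consumers, so it is merged into their cluster.
// No-op operations produce no output, so they never force a merge.
function clusterOperations(operations: SketchOperation[]): Map<string, string[]> {
  const sets: DisjointSet = new DisjointSet();
  for (const operation of operations) {
    sets.add(operation.name);
  }
  for (const operation of operations) {
    if (operation.cacheDisabled && !operation.isNoOp) {
      for (const consumer of operation.consumers) {
        sets.union(operation.name, consumer.name);
      }
    }
  }
  // Group operation names by the representative of their set.
  const clusters: Map<string, string[]> = new Map<string, string[]>();
  for (const operation of operations) {
    const root: string = sets.find(operation.name);
    const members: string[] = clusters.get(root) ?? [];
    members.push(operation.name);
    clusters.set(root, members);
  }
  return clusters;
}
```

With a central no-op dependency feeding two projects, this sketch keeps each project in its own cluster, while an uncacheable (non-no-op) operation still pulls its consumers into its cluster.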

How it was tested

I tested this a few different ways:

In the cobuild sandbox repo, by running rm -rf common/temp/build-cache && RUSH_COBUILD_CONTEXT_ID=foo REDIS_PASS=redis123 RUSH_COBUILD_RUNNER_ID=runner1 node ../../lib/runRush.js cobuild -p 10 and checking the output plan.
In the sharded-repo cobuild sandbox from #4652 ([rush] add support for sharding phases), by simulating 2 runners and viewing the output. There was significantly less resource contention, as the number of clusters went from 7 to 227.
In our internal repo, where the number of clusters went from 3 to 127.
I also verified that adding disableBuildCacheForProject: true to rush-project.json caused the expected drop in clusters: adding it to the e project in the sharded-repo sandbox dropped the number of clusters from 227 to 127.

libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts

aramissennyeydd · 2024-04-24T13:32:15Z

Example output with --debug:

Build Plan Depth (deepest dependency tree): 5
Build Plan Width (maximum parallelism): 3
Number of Nodes per Depth: 1, 1, 2, 2, 3
Plan @ Depth 4 has 3 nodes and 0 dependents:
- f (build)
- g (build)
- e (build)
Plan @ Depth 3 has 2 nodes and 3 dependents:
- f (pre-build)
- g (pre-build)
Plan @ Depth 2 has 2 nodes and 5 dependents:
- d (build)
- a (build)
Plan @ Depth 1 has 1 nodes and 7 dependents:
- c (build)
Plan @ Depth 0 has 1 nodes and 8 dependents:
- b (build)
##################################################
        a (build): (4)
        b (build): (3)
        c (build): -(6)
 f (pre-build): -(9)
 g (pre-build): -(10)
        d (build): --(8)
         f (build): --(9)
        g (build): --(10)
        e (build): ---(12)
##################################################
Cluster 0:
- Dependencies: none
- Clustered by: 
  - none
- Operations: a (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 1:
- Dependencies: none
- Clustered by: 
  - none
- Operations: b (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 2:
- Dependencies: b (_phase:build)
- Clustered by: 
  - none
- Operations: c (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 3:
- Dependencies: b (_phase:pre-build)
- Clustered by: 
  - none
- Operations: b (build)
--------------------------------------------------
Cluster 4:
- Dependencies: a (_phase:pre-build)
- Clustered by: 
  - none
- Operations: a (build)
--------------------------------------------------
Cluster 5:
- Dependencies: b (_phase:build), c (_phase:build)
- Clustered by: 
  - none
- Operations: d (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 6:
- Dependencies: c (_phase:pre-build), b (_phase:build)
- Clustered by: 
  - none
- Operations: c (build)
--------------------------------------------------
Cluster 7:
- Dependencies: b (_phase:build), d (_phase:build)
- Clustered by: 
  - none
- Operations: e (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 8:
- Dependencies: d (_phase:pre-build), b (_phase:build), c (_phase:build)
- Clustered by: 
  - none
- Operations: d (build)
--------------------------------------------------
Cluster 9:
- Dependencies: b (_phase:build)
- Clustered by: 
  - (f (pre-build)) "Caching has been disabled for this project."
- Operations: f (pre-build), f (build)
--------------------------------------------------
Cluster 10:
- Dependencies: b (_phase:build)
- Clustered by: 
  - (g (pre-build)) "Project does not have a rush-project.json configuration file, or one provided by a rig, so it does not support caching."
- Operations: g (pre-build), g (build)
--------------------------------------------------
Cluster 11:
- Dependencies: a (_phase:build)
- Clustered by: 
  - none
- Operations: h (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 12:
- Dependencies: e (_phase:pre-build), b (_phase:build), d (_phase:build)
- Clustered by: 
  - none
- Operations: e (build)
--------------------------------------------------
Cluster 13:
- Dependencies: h (_phase:pre-build), a (_phase:build)
- Clustered by: 
  - none
- Operations: h (build) [SKIPPED]

dmichon-msft

Mechanically this seems fine, though it will result in the Rush build cache engine doing considerably more work for noops than it used to, unless we still have logic to skip the actual cache reads/writes for noops.

libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts

aramissennyeydd · 2024-04-25T19:19:00Z

@dmichon-msft I think no-ops should still be skipped, because the null operation runner has cacheable = false, so https://github.com/aramissennyeydd/rushstack/blob/da87eea7b88dc81e28bafb5abecd0e375986dbd6/libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts#L243-L245 should exit early.
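A small sketch of why that early exit keeps no-ops from adding cache work, assuming (as stated above) that the null runner reports cacheable = false. SketchRunner, CountingCache and executeWithCache are hypothetical names; the real check is at the linked lines of CacheableOperationPlugin.ts.

```typescript
// Hypothetical, simplified runner shape; the real rush-lib types differ.
interface SketchRunner {
  readonly name: string;
  readonly isNoOp: boolean;
  readonly cacheable: boolean;
}

// Records every cache lookup so the early exit can be observed.
class CountingCache {
  public readonly requested: string[] = [];

  public tryRestore(name: string): boolean {
    this.requested.push(name);
    return false; // always a cache miss in this sketch
  }
}

// Returns the names of runners that had to execute (no-ops "execute" for free).
function executeWithCache(runners: SketchRunner[], cache: CountingCache): string[] {
  const executed: string[] = [];
  for (const runner of runners) {
    // Early exit: a non-cacheable runner (such as the null runner used for
    // no-ops) never reaches the cache, so no-ops add no cache reads or writes.
    if (runner.cacheable && cache.tryRestore(runner.name)) {
      continue; // restored from cache, nothing to execute
    }
    executed.push(runner.name);
  }
  return executed;
}
```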

iclanton · 2024-05-10T03:42:29Z

common/reviews/api/rush-lib.api.md

@@ -1384,7 +1384,7 @@ export class RushLifecycleHooks {
 // @alpha
 export class RushProjectConfiguration {
 readonly disableBuildCacheForProject: boolean;
- getCacheDisabledReason(trackedFileNames: Iterable<string>, phaseName: string): string | undefined;
+ getCacheDisabledReason(operation: Operation, trackedFileNames: Iterable<string>, phaseName: string): string | undefined;


This is technically a breaking change, even though the API is alpha.

If I flipped it to

getCacheDisabledReason(trackedFileNames: Iterable<string>, phaseName: string, operation?: Operation): string | undefined;

that would be less breaking, right?

@iclanton I'm not sure I agree with this recommendation. It seems like getCacheDisabledReason() will not produce correct results unless the Operation is included, so it is arguably a better experience for plugin authors to get a compile error about a missing function parameter than for their code to compile silently while producing a wrong result. (In general, we should be cautious about introducing optional parameters in public APIs, because they frequently produce error-prone contracts.)

Is there actually a realistic concern about third-party Rush plugins using this @alpha API? If so, as a compromise what we could do is make operation into a required parameter, but put it last and retain the operation?.runner test in case an old plugin calls the API incorrectly (because it was compiled using an older Rush .d.ts file).

(Although it's questionable how much risk there is for this contract, given that it is @alpha.)
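A minimal sketch of the compromise described above: the operation parameter is required and placed last, and the operation?.runner guard is kept for plugins compiled against older typings that omit it at runtime. SketchOperation and SketchProjectConfiguration are hypothetical simplifications; only the "Caching has been disabled for this project." message is taken from the example output in this thread.

```typescript
// Hypothetical stand-ins for the real rush-lib types.
interface SketchOperation {
  runner?: { isNoOp?: boolean };
}

class SketchProjectConfiguration {
  public constructor(private readonly _disableBuildCacheForProject: boolean) {}

  // `operation` is required and last: plugins rebuilt against the new typings
  // must pass it, while the optional chaining below still tolerates plugins
  // compiled against the old two-parameter signature, which pass undefined at
  // runtime. The first two parameters are unused in this sketch.
  public getCacheDisabledReason(
    _trackedFileNames: Iterable<string>,
    _phaseName: string,
    operation: SketchOperation
  ): string | undefined {
    if (operation?.runner?.isNoOp) {
      return undefined;
    }
    return this._disableBuildCacheForProject ? 'Caching has been disabled for this project.' : undefined;
  }
}
```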

libraries/rush-lib/src/cli/scriptActions/PhasedScriptAction.ts

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

Co-authored-by: David Michon <[email protected]>

Co-authored-by: Ian Clanton-Thuon <[email protected]>

octogonz · 2024-05-14T02:40:11Z

@iclanton @dmichon-msft Are we ready to merge this? This PR has been open for nearly a month.

dmichon-msft · 2024-05-23T00:28:02Z

libraries/rush-lib/src/api/RushProjectConfiguration.ts

+ // Skip no-op operations as they won't have any output/cacheable things.
+ if (operation?.runner?.isNoOp) {
+ return undefined;
+ }


We should actually check this first; if the operation is a noop it doesn't matter if it is marked as uncacheable or not.

dmichon-msft · 2024-05-23T00:29:14Z

libraries/rush-lib/src/api/RushProjectConfiguration.ts

+ public getCacheDisabledReason(
+ trackedFileNames: Iterable<string>,
+ phaseName: string,
+ operation?: Operation


Pass isNoOp directly? Or just have the caller check it? I'm not comfortable with a configuration file object (which RushProjectConfiguration is) taking a dependency on the execution logic layer, even only for types.

That'll also make your unit tests much simpler.
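The "have the caller check it" option could look roughly like this: the configuration object depends only on its own settings, and the execution layer checks for no-ops first before delegating. SketchRushProjectConfiguration and resolveCacheDisabledReason are hypothetical names; the two reason strings are copied from the example output in this thread.

```typescript
// Hypothetical configuration object: it depends only on its own settings,
// not on Operation or any other execution-layer type.
class SketchRushProjectConfiguration {
  public constructor(public readonly disableBuildCacheForProject: boolean) {}

  // The parameters mirror the real signature but are unused in this sketch.
  public getCacheDisabledReason(_trackedFileNames: Iterable<string>, _phaseName: string): string | undefined {
    return this.disableBuildCacheForProject ? 'Caching has been disabled for this project.' : undefined;
  }
}

const NO_PROJECT_CONFIGURATION_REASON: string =
  'Project does not have a rush-project.json configuration file, or one provided by a rig, so it does not support caching.';

// Caller-side helper living in the execution layer (hypothetical name).
function resolveCacheDisabledReason(
  isNoOp: boolean,
  projectConfiguration: SketchRushProjectConfiguration | undefined,
  trackedFileNames: Iterable<string>,
  phaseName: string
): string | undefined {
  // Check for no-ops first: a no-op has nothing to cache, so it does not
  // matter whether its project is marked uncacheable.
  if (isNoOp) {
    return undefined;
  }
  return projectConfiguration
    ? projectConfiguration.getCacheDisabledReason(trackedFileNames, phaseName)
    : NO_PROJECT_CONFIGURATION_REASON;
}
```

Because the configuration class never sees an Operation, a unit test only needs a boolean and a settings object.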

dmichon-msft · 2024-05-23T00:31:47Z

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

+ const fileHashes: Map<string, string> | undefined =
+ await projectChangeAnalyzer._tryGetProjectDependenciesAsync(associatedProject, terminal);
+ const cacheDisabledReason: string | undefined = projectConfiguration
+ ? projectConfiguration.getCacheDisabledReason(fileHashes!.keys(), associatedPhase.name, operation)


Suggested change

? projectConfiguration.getCacheDisabledReason(fileHashes!.keys(), associatedPhase.name, operation)

? projectConfiguration.getCacheDisabledReason(fileHashes!.keys(), associatedPhase.name, operation?.runner?.isNoOp)

That avoids a lot of coupling.

dmichon-msft · 2024-05-23T00:35:12Z

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

+}
+
+function generateCobuildPlanSummary(operations: Operation[], terminal: ITerminal): ICobuildPlan['summary'] {
+ const noOpOperations: Set<Operation> = new Set(operations.filter((e) => e.runner?.isNoOp));


It's probably cheaper to just define:

function isNoOp(operation: Operation): boolean { return operation?.runner?.isNoOp === true; }

And call that instead of creating the set. I expect the lookup cost of the property access is negligible and I think it's easier to read calling the function.

Heck, feel free to add a get isNoOp() { return !!this.runner?.isNoOp; } on Operation.
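Both suggested variants side by side, as a sketch with hypothetical stand-in types (SketchRunner, SketchOperation, countRealOperations are not rush-lib names):

```typescript
// Hypothetical stand-ins for rush-lib's Operation and runner types.
interface SketchRunner {
  readonly isNoOp?: boolean;
}

class SketchOperation {
  public constructor(public readonly name: string, public runner?: SketchRunner) {}

  // Getter variant: normalizes a missing runner or flag to false.
  public get isNoOp(): boolean {
    return !!this.runner?.isNoOp;
  }
}

// Free-function variant: a cheap property check, no intermediate Set.
function isNoOp(operation: SketchOperation | undefined): boolean {
  return operation?.runner?.isNoOp === true;
}

// Example call site: count only operations that do real work.
function countRealOperations(operations: SketchOperation[]): number {
  return operations.filter((operation) => !isNoOp(operation)).length;
}
```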

dmichon-msft · 2024-05-23T00:38:55Z

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

+ if (numberOfConsumers === 0) {
+ leafQueue.push(operation);
+ }


Just because your only consumer is a noop does not mean that you are a leaf of the build tree. You are only a leaf if all of your consumers' consumers are noops all the way down.

A -> noop -> B is completely valid. The graph construction deliberately leaves in such nodes, since more typically that looks like:

A -\    /- E
B --> D <-- F
C -/    \- G
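One way to express the "no-ops all the way down" rule is a memoized recursive check. This is a hypothetical helper (isEffectiveLeaf and SketchNode are not rush-lib names) and assumes an acyclic graph.

```typescript
// Hypothetical node shape; assumes the graph is acyclic.
interface SketchNode {
  readonly name: string;
  readonly isNoOp: boolean;
  readonly consumers: SketchNode[];
}

// A node is an effective leaf only if every consumer is a no-op whose own
// consumers are (recursively) all no-ops as well. Results are memoized so a
// shared no-op hub such as D is only walked once.
function isEffectiveLeaf(node: SketchNode, memo: Map<SketchNode, boolean> = new Map<SketchNode, boolean>()): boolean {
  const cached: boolean | undefined = memo.get(node);
  if (cached !== undefined) {
    return cached;
  }
  let result: boolean = true;
  for (const consumer of node.consumers) {
    if (!consumer.isNoOp || !isEffectiveLeaf(consumer, memo)) {
      result = false;
      break;
    }
  }
  memo.set(node, result);
  return result;
}
```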

dmichon-msft · 2024-05-23T00:41:08Z

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

+ let currentLeafNodes: Set<Operation> = new Set<Operation>();
+ const remainingOperations: Set<Operation> = new Set<Operation>(operations);
+ let depth: number = 0;
+ let maxWidth: number = leafQueue.filter((e) => !e.runner?.isNoOp).length;


On line 86 you already filtered out all noops from leafQueue, so this expression is exactly equal to leafQueue.length.

dmichon-msft · 2024-05-23T00:43:08Z

libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts

@@ -126,7 +126,7 @@ export class CacheableOperationPlugin implements IPhasedCommandPlugin {
 const operationSettings: IOperationSettings | undefined =
 projectConfiguration?.operationSettingsByOperationName.get(phaseName);
 const cacheDisabledReason: string | undefined = projectConfiguration
- ? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName)
+ ? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName, operation)


Suggested change

? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName, operation)

? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName, operation?.runner?.isNoOp)

dmichon-msft · 2024-05-23T00:43:59Z

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

+ if (remainingOperations.has(leaf)) {
+ remainingOperations.delete(leaf);


Suggested change

if (remainingOperations.has(leaf)) {

remainingOperations.delete(leaf);

if (remainingOperations.delete(leaf)) {

delete returns a boolean indicating if it did anything.
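The same idiom in a standalone loop, as a sketch (processOnce is a hypothetical helper): each candidate is handled at most once, with no separate has() check.

```typescript
// Processes each candidate at most once, using the boolean returned by
// Set.prototype.delete instead of a separate has() check.
function processOnce<T>(remaining: Set<T>, candidates: T[], visit: (item: T) => void): number {
  let processed: number = 0;
  for (const candidate of candidates) {
    // delete() returns true only if the item was still present.
    if (remaining.delete(candidate)) {
      visit(candidate);
      processed++;
    }
  }
  return processed;
}
```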

dmichon-msft · 2024-05-23T00:51:52Z

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts

+ const numberOfNodes: number[] = [maxWidth];
+ const depthToOperationsMap: Map<number, Set<Operation>> = new Map<number, Set<Operation>>();
+ depthToOperationsMap.set(depth, new Set(leafQueue));
+ do {


This is presumably a standard well-known algorithm; could you put a comment with the name of the algorithm? Or a brief description of what it is supposed to be doing?
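If it helps, the loop appears to be a layered topological sort, i.e. Kahn's algorithm run from the leaves: peel off nodes with no remaining consumers, then repeat. Below is a hypothetical sketch (PlanNode and summarizePlan are not rush-lib names) that computes the depth, maximum width, and nodes-per-depth figures the plan output reports. Whether the PR's loop matches it exactly is an assumption; following the example output earlier in this thread, depth 0 here holds the most-depended-upon layer.

```typescript
// Hypothetical plan node; dependencies and consumers must mirror each other.
interface PlanNode {
  readonly name: string;
  readonly dependencies: PlanNode[];
  readonly consumers: PlanNode[];
}

interface PlanSummary {
  depth: number; // number of layers
  maxWidth: number; // size of the largest layer
  nodesPerDepth: number[];
  layers: string[][];
}

// Layered topological sort (Kahn's algorithm run from the leaves).
// Assumes the graph is acyclic; nodes on a cycle would never be emitted.
function summarizePlan(nodes: PlanNode[]): PlanSummary {
  // How many consumers of each node have not been placed in a layer yet.
  const remainingConsumers: Map<PlanNode, number> = new Map<PlanNode, number>();
  for (const node of nodes) {
    remainingConsumers.set(node, node.consumers.length);
  }

  const layers: string[][] = [];
  let current: PlanNode[] = nodes.filter((node) => node.consumers.length === 0);
  while (current.length > 0) {
    layers.push(current.map((node) => node.name));
    const next: PlanNode[] = [];
    for (const node of current) {
      for (const dependency of node.dependencies) {
        const count: number = (remainingConsumers.get(dependency) ?? 0) - 1;
        remainingConsumers.set(dependency, count);
        // Once all of its consumers are placed, the dependency joins the next layer.
        if (count === 0) {
          next.push(dependency);
        }
      }
    }
    current = next;
  }

  // Depth 0 = most-depended-upon layer; this ordering is an assumption
  // based on the example output.
  layers.reverse();
  const nodesPerDepth: number[] = layers.map((layer) => layer.length);
  return {
    depth: layers.length,
    maxWidth: Math.max(0, ...nodesPerDepth),
    nodesPerDepth,
    layers
  };
}
```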

iclanton · 2024-05-23T02:54:42Z

libraries/rush-lib/src/cli/scriptActions/PhasedScriptAction.ts

@@ -189,6 +190,12 @@ export class PhasedScriptAction extends BaseScriptAction<IPhasedCommandConfig> {
 ' including an ASCII chart of the start and stop times for each operation.'
 });
 }
+ this._cobuildPlanParameter = this.defineFlagParameter({
+ parameterLongName: '--cobuild-plan',


Suggested change

parameterLongName: '--cobuild-plan',

parameterLongName: '--log-cobuild-plan',

iclanton · 2024-05-23T02:55:00Z

libraries/rush-lib/src/cli/scriptActions/PhasedScriptAction.ts

+ this._cobuildPlanParameter = this.defineFlagParameter({
+ parameterLongName: '--cobuild-plan',
+ description:
+ 'Before the build starts, log information about the cobuild state. This will include information about ' +


Suggested change

'Before the build starts, log information about the cobuild state. This will include information about ' +

'(EXPERIMENTAL) Before the build starts, log information about the cobuild state. This will include information about ' +

Kinda feels like this should just log the plan instead of logging it and then executing it.

iclanton · 2024-05-23T02:55:18Z

libraries/rush-lib/src/cli/scriptActions/PhasedScriptAction.ts

@@ -415,6 +423,16 @@ export class PhasedScriptAction extends BaseScriptAction<IPhasedCommandConfig> {
 terminal.writeVerboseLine(`Incremental strategy: none (full rebuild)`);
 }

+ const showBuildPlan: boolean = this._cobuildPlanParameter ? this._cobuildPlanParameter.value : false;


Suggested change

const showBuildPlan: boolean = this._cobuildPlanParameter ? this._cobuildPlanParameter.value : false;

const showBuildPlan: boolean = this._cobuildPlanParameter?.value ?? false;
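A tiny sketch of what the suggested expression does, using a hypothetical flag-parameter shape (SketchFlagParameter and isFlagSet are illustrative names only):

```typescript
// Hypothetical stand-in for a flag parameter that may never have been defined.
interface SketchFlagParameter {
  readonly value: boolean;
}

function isFlagSet(parameter: SketchFlagParameter | undefined): boolean {
  // `?.` short-circuits to undefined when the parameter is missing,
  // and `??` turns that undefined into false.
  return parameter?.value ?? false;
}
```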

iclanton · 2024-05-23T02:57:53Z

libraries/rush-lib/src/logic/operations/test/BuildPlanPlugin.test.ts

+ destination: mockStreamWritable
+});
+
+describe('BuildPlanPlugin', () => {


Suggested change

describe('BuildPlanPlugin', () => {

describe(BuildPlanPlugin.name, () => {

iclanton · 2024-05-23T02:58:57Z

libraries/rush-lib/src/logic/operations/test/BuildPlanPlugin.test.ts

+import { ProjectChangeAnalyzer } from '../../ProjectChangeAnalyzer';
+
+const mockWritable: MockWritable = new MockWritable();
+const mockTerminal: Terminal = new Terminal(new CollatedTerminalProvider(new CollatedTerminal(mockWritable)));


Why not just use StringBufferTerminalProvider? It's designed for unit testing.

iclanton · 2024-05-23T02:59:10Z

libraries/rush-lib/src/logic/operations/test/BuildPlanPlugin.test.ts

+jest.mock('@rushstack/terminal', () => {
+ const originalModule = jest.requireActual('@rushstack/terminal');
+ return {
+ ...originalModule,
+ ConsoleTerminalProvider: {
+ ...originalModule.ConsoleTerminalProvider,
+ supportsColor: true
+ }
+ };
+});


What's up with this mock?

iclanton · 2024-05-23T02:59:33Z

libraries/rush-lib/src/logic/operations/test/BuildPlanPlugin.test.ts

+const mockStreamWritable: MockWritable = new MockWritable();
+const streamCollator = new StreamCollator({
+ destination: mockStreamWritable
+});


Seems like this should get reinitialized before each test.

I know there's only one test in this file right now, but it'll be confusing if someone adds another one later and sees output from a previously-run test.

iclanton · 2024-05-23T03:01:36Z

libraries/rush-lib/src/logic/operations/test/__snapshots__/BuildPlanPlugin.test.ts.snap

+exports[`BuildPlanPlugin build plan debugging should generate a build plan 1`] = `
+"Build Plan Depth (deepest dependency tree): 5
+Build Plan Width (maximum parallelism): 40
+Number of Nodes per Depth: 1, 4, 11, 40, 34


What does this mean?

iclanton · 2024-05-23T03:02:57Z

libraries/rush-lib/src/logic/operations/test/__snapshots__/BuildPlanPlugin.test.ts.snap

@@ -0,0 +1,736 @@
+// Jest Snapshot v1, https://goo.gl/fbAQLP
+
+exports[`BuildPlanPlugin build plan debugging should generate a build plan 1`] = `


When reading this, I'm not really sure what I'm looking at. There should at least be some extensive documentation on how to read the output from this flag.

aramissennyeydd requested review from iclanton, octogonz, apostolisms, D4N14L and dmichon-msft as code owners April 18, 2024 20:42

MichaelSitter reviewed Apr 19, 2024


libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts (outdated)

dmichon-msft reviewed Apr 25, 2024


aramissennyeydd requested a review from dmichon-msft April 29, 2024 15:06

iclanton reviewed May 10, 2024


aramissennyeydd requested a review from patmill as a code owner May 10, 2024 13:58

aramissennyeydd and others added 18 commits May 10, 2024 09:59

fix cobuild plugin sandbox

84bcc6d

improve cache disabled check for skipped scripts

e5780f4

add changeset

f4b765a

use itsnoop and adjust tests

22fd9cf

small improvements in logging

d7eb323

fix tsc

9ca4353

better handling of noop and skipped elements

29cb7dc

update to debug logs

7e37a16

add logging about tree depth

f39d8f8

adjust maxWidth and depth calcs

e73fdba

update wording for parents vs dependents

b068fe8

operation depth wording

6dd1b75

fix initial width calc

9aa5a49

fix tsc errors

5397768

Apply suggestions from code review

1244ea1

Co-authored-by: David Michon <[email protected]>

address PR comments

d831692

move to a separate logging plugin and add tests

ef71ebc

fix linting errors

067b60b

aramissennyeydd and others added 6 commits May 10, 2024 09:59

fix typing issue

cdf753e

retry tests

138c9e1

more linting

81f3536

Apply suggestions from code review

98a2b91

Co-authored-by: Ian Clanton-Thuon <[email protected]>

make into a non-breaking change

d98fd22

fix snapshots

3ecc6cc

aramissennyeydd force-pushed the fix-caching-behavior-undefined branch 2 times, most recently from b9a70cf to 3ecc6cc, on May 10, 2024 14:00

octogonz changed the title from "fix(cobuilds): no-op operations should not be treated as uncacheable" to "[rush] When using cobuilds, no-op operations should not be treated as uncacheable" on May 14, 2024

aramissennyeydd added 2 commits May 22, 2024 11:45

fix clustering in the display plugin

6865a11

fix linting

21ee0d6

dmichon-msft reviewed May 23, 2024


iclanton reviewed May 23, 2024

