Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rush] When using cobuilds, no-op operations should not be treated as uncacheable #4660

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

aramissennyeydd
Copy link
Contributor

Summary

This PR does 2 things,

  1. it adds logging for the cobuild build plan. It's currently pretty difficult to debug why cobuilds aren't performing as expected and some visibility into the clustering logic would be a good step in that direction.
  2. it fixes a bug where operations that are no-ops, like ignoreMissingScript or missingScriptBehavior: silent are treated as disabling the build cache causing all operations that depend on that operation to cluster. This causes some pretty extreme slow down for projects that use central dependencies that don't have build/test steps.

Details

No-op operations now no longer play into build cache cluster calculations. I'm also adding some visibility to the output for cobuilds so users can understand when this is happening. This solves the initial issue I was having, but additional operations may still have problems. While this is a bug fix, there may be projects that are incorrectly set up to build with cobuilds. This change may expose that and cause additional work by repo owners. However, I expect that they thought they were cobuilding all along. The bug fix should improve performance, the extra logging may decrease improvement and may be best to set behind verbose logging.

How it was tested

Tested a few different ways,

  1. In the cobuild sandbox repo, using rm -rf common/temp/build-cache && RUSH_COBUILD_CONTEXT_ID=foo REDIS_PASS=redis123 RUSH_COBUILD_RUNNER_ID=runner1 node ../../lib/runRush.js cobuild -p 10 and checking the output plan.
  2. In [rush] add support for sharding phases #4652's sharded-repo cobuild sandbox, simulating 2 runners and viewing the output. There was significantly less resource contention as the number of clusters went from 7 to 227.
  3. In our internal repo, where number of clusters went from 3 to 127.
  4. I also verified that adding disableBuildCacheForProject: true to rush-project.json caused the expected drop in clusters, adding it to the e project in the sharded-repo project dropped the number of clusters from 227 to 127 as expected.

@aramissennyeydd
Copy link
Contributor Author

Example output with --debug:

Build Plan Depth (deepest dependency tree): 5
Build Plan Width (maximum parallelism): 3
Number of Nodes per Depth: 1, 1, 2, 2, 3
Plan @ Depth 4 has 3 nodes and 0 dependents:
- f (build)
- g (build)
- e (build)
Plan @ Depth 3 has 2 nodes and 3 dependents:
- f (pre-build)
- g (pre-build)
Plan @ Depth 2 has 2 nodes and 5 dependents:
- d (build)
- a (build)
Plan @ Depth 1 has 1 nodes and 7 dependents:
- c (build)
Plan @ Depth 0 has 1 nodes and 8 dependents:
- b (build)
##################################################
        a (build): (4)
        b (build): (3)
        c (build): -(6)
 f (pre-build): -(9)
 g (pre-build): -(10)
        d (build): --(8)
         f (build): --(9)
        g (build): --(10)
        e (build): ---(12)
##################################################
Cluster 0:
- Dependencies: none
- Clustered by: 
  - none
- Operations: a (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 1:
- Dependencies: none
- Clustered by: 
  - none
- Operations: b (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 2:
- Dependencies: b (_phase:build)
- Clustered by: 
  - none
- Operations: c (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 3:
- Dependencies: b (_phase:pre-build)
- Clustered by: 
  - none
- Operations: b (build)
--------------------------------------------------
Cluster 4:
- Dependencies: a (_phase:pre-build)
- Clustered by: 
  - none
- Operations: a (build)
--------------------------------------------------
Cluster 5:
- Dependencies: b (_phase:build), c (_phase:build)
- Clustered by: 
  - none
- Operations: d (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 6:
- Dependencies: c (_phase:pre-build), b (_phase:build)
- Clustered by: 
  - none
- Operations: c (build)
--------------------------------------------------
Cluster 7:
- Dependencies: b (_phase:build), d (_phase:build)
- Clustered by: 
  - none
- Operations: e (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 8:
- Dependencies: d (_phase:pre-build), b (_phase:build), c (_phase:build)
- Clustered by: 
  - none
- Operations: d (build)
--------------------------------------------------
Cluster 9:
- Dependencies: b (_phase:build)
- Clustered by: 
  - (f (pre-build)) "Caching has been disabled for this project."
- Operations: f (pre-build), f (build)
--------------------------------------------------
Cluster 10:
- Dependencies: b (_phase:build)
- Clustered by: 
  - (g (pre-build)) "Project does not have a rush-project.json configuration file, or one provided by a rig, so it does not support caching."
- Operations: g (pre-build), g (build)
--------------------------------------------------
Cluster 11:
- Dependencies: a (_phase:build)
- Clustered by: 
  - none
- Operations: h (pre-build) [SKIPPED]
--------------------------------------------------
Cluster 12:
- Dependencies: e (_phase:pre-build), b (_phase:build), d (_phase:build)
- Clustered by: 
  - none
- Operations: e (build)
--------------------------------------------------
Cluster 13:
- Dependencies: h (_phase:pre-build), a (_phase:build)
- Clustered by: 
  - none
- Operations: h (build) [SKIPPED]

Copy link
Contributor

@dmichon-msft dmichon-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mechanically this seems fine, though it will result in the Rush build cache engine doing considerably more work for noops than it used to, unless we still have logic to skip the actual cache reads/writes for noops.

@aramissennyeydd
Copy link
Contributor Author

@dmichon-msft I think no-ops should still be skipped bc the null operation runner has cacheable = false and https://github.com/aramissennyeydd/rushstack/blob/da87eea7b88dc81e28bafb5abecd0e375986dbd6/libraries/rush-lib/src/logic/operations/CacheableOperationPlugin.ts#L243-L245 should exit early.

@@ -1384,7 +1384,7 @@ export class RushLifecycleHooks {
// @alpha
export class RushProjectConfiguration {
readonly disableBuildCacheForProject: boolean;
getCacheDisabledReason(trackedFileNames: Iterable<string>, phaseName: string): string | undefined;
getCacheDisabledReason(operation: Operation, trackedFileNames: Iterable<string>, phaseName: string): string | undefined;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a breaking change, even though the API is alpha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I flipped it to

getCacheDisabledReason(trackedFileNames: Iterable<string>, phaseName: string, operation?: Operation): string | undefined;

that would be less breaking right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iclanton I'm not sure I agree with this recommendation. It seems like in the current the getCacheDisabledReason() is not going to produce correct results unless the Operation is included, so it is arguably a better experience for people's plugins to have a compile error about a missing function parameter, rather than silently appearing to compile while producing a wrong result. (In general, we should be cautious about introducing optional parameters in public APIs, because they frequently produce error-prone contracts.)

Is there actually a realistic concern about third-party Rush plugins using this @alpha API? If so, as a compromise what we could do is make operation into a required parameter, but put it last and retain the operation?.runner test in case an old plugin calls the API incorrectly (because it was compiled using an older Rush .d.ts file).

(Although it's questionable how much risk there is for this contract, given that it is @alpha.)

libraries/rush-lib/src/logic/operations/BuildPlanPlugin.ts Outdated Show resolved Hide resolved
@aramissennyeydd aramissennyeydd force-pushed the fix-caching-behavior-undefined branch 2 times, most recently from b9a70cf to 3ecc6cc Compare May 10, 2024 14:00
@octogonz
Copy link
Collaborator

@iclanton @dmichon-msft Are we ready to merge this? This PR has been open for nearly a month.

@octogonz octogonz changed the title fix(cobuilds): no-op operations should not be treated as uncacheable [rush] When using cobuilds, no-op operations should not be treated as uncacheable May 14, 2024
Comment on lines +312 to +315
// Skip no-op operations as they won't have any output/cacheable things.
if (operation?.runner?.isNoOp) {
return undefined;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should actually check this first; if the operation is a noop it doesn't matter if it is marked as uncacheable or not.

public getCacheDisabledReason(
trackedFileNames: Iterable<string>,
phaseName: string,
operation?: Operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass isNoOp directly? Or just have the caller check it? I'm not comfortable with a configuration file object (which RushProjectConfiguration is) taking a dependency on the execution logic layer, even only for types.

That'll also make your unit tests much simpler.

const fileHashes: Map<string, string> | undefined =
await projectChangeAnalyzer._tryGetProjectDependenciesAsync(associatedProject, terminal);
const cacheDisabledReason: string | undefined = projectConfiguration
? projectConfiguration.getCacheDisabledReason(fileHashes!.keys(), associatedPhase.name, operation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
? projectConfiguration.getCacheDisabledReason(fileHashes!.keys(), associatedPhase.name, operation)
? projectConfiguration.getCacheDisabledReason(fileHashes!.keys(), associatedPhase.name, operation?.runner?.isNoOp)

That avoids a lot of coupling.

}

function generateCobuildPlanSummary(operations: Operation[], terminal: ITerminal): ICobuildPlan['summary'] {
const noOpOperations: Set<Operation> = new Set(operations.filter((e) => e.runner?.isNoOp));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably cheaper to just define:

function isNoOp(operation: Operation): boolean {
  return operation?.runner?.isNoOp === true;
}

And call that instead of creating the set. I expect the lookup cost of the property access is negligible and I think it's easier to read calling the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heck, feel free to add a get isNoOp() { return !!this.runner?.isNoOp; } on Operation.

Comment on lines +95 to +97
if (numberOfConsumers === 0) {
leafQueue.push(operation);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because your only consumer is a noop does not mean that you are a leaf of the build tree. You are only a leaf if all of your consumers' consumers are noops all the way down.

A -> noop -> B is completely valid. The graph construction deliberately leaves in such nodes, since more typically that looks like:

A -\       /- E
B -- > D < -- F
C -/       \- G

let currentLeafNodes: Set<Operation> = new Set<Operation>();
const remainingOperations: Set<Operation> = new Set<Operation>(operations);
let depth: number = 0;
let maxWidth: number = leafQueue.filter((e) => !e.runner?.isNoOp).length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 86 you already filtered out all noops from leafQueue, so this expression is exactly equal to leafQueue.length.

@@ -126,7 +126,7 @@ export class CacheableOperationPlugin implements IPhasedCommandPlugin {
const operationSettings: IOperationSettings | undefined =
projectConfiguration?.operationSettingsByOperationName.get(phaseName);
const cacheDisabledReason: string | undefined = projectConfiguration
? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName)
? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName, operation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName, operation)
? projectConfiguration.getCacheDisabledReason(fileHashes.keys(), phaseName, operation?.runner?.isNoOp)

Comment on lines +125 to +126
if (remainingOperations.has(leaf)) {
remainingOperations.delete(leaf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (remainingOperations.has(leaf)) {
remainingOperations.delete(leaf);
if (remainingOperations.delete(leaf)) {

delete returns a boolean indicating if it did anything.

const numberOfNodes: number[] = [maxWidth];
const depthToOperationsMap: Map<number, Set<Operation>> = new Map<number, Set<Operation>>();
depthToOperationsMap.set(depth, new Set(leafQueue));
do {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is presumably a standard well-known algorithm; could you put a comment with the name of the algorithm? Or a brief description of what it is supposed to be doing?

@@ -189,6 +190,12 @@ export class PhasedScriptAction extends BaseScriptAction<IPhasedCommandConfig> {
' including an ASCII chart of the start and stop times for each operation.'
});
}
this._cobuildPlanParameter = this.defineFlagParameter({
parameterLongName: '--cobuild-plan',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
parameterLongName: '--cobuild-plan',
parameterLongName: '--log-cobuild-plan',

this._cobuildPlanParameter = this.defineFlagParameter({
parameterLongName: '--cobuild-plan',
description:
'Before the build starts, log information about the cobuild state. This will include information about ' +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'Before the build starts, log information about the cobuild state. This will include information about ' +
'(EXPERIMENTAL) Before the build starts, log information about the cobuild state. This will include information about ' +

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda feels like this should just log the plan instead of logging it and then executing it.

@@ -415,6 +423,16 @@ export class PhasedScriptAction extends BaseScriptAction<IPhasedCommandConfig> {
terminal.writeVerboseLine(`Incremental strategy: none (full rebuild)`);
}

const showBuildPlan: boolean = this._cobuildPlanParameter ? this._cobuildPlanParameter.value : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const showBuildPlan: boolean = this._cobuildPlanParameter ? this._cobuildPlanParameter.value : false;
const showBuildPlan: boolean = this._cobuildPlanParameter?.value ?? false;

destination: mockStreamWritable
});

describe('BuildPlanPlugin', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
describe('BuildPlanPlugin', () => {
describe(BuildPlanPlugin.name, () => {

import { ProjectChangeAnalyzer } from '../../ProjectChangeAnalyzer';

const mockWritable: MockWritable = new MockWritable();
const mockTerminal: Terminal = new Terminal(new CollatedTerminalProvider(new CollatedTerminal(mockWritable)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use StringBufferTerminalProvider? It's designed for unit testing.

Comment on lines +4 to +13
jest.mock('@rushstack/terminal', () => {
const originalModule = jest.requireActual('@rushstack/terminal');
return {
...originalModule,
ConsoleTerminalProvider: {
...originalModule.ConsoleTerminalProvider,
supportsColor: true
}
};
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this mock?

Comment on lines +43 to +46
const mockStreamWritable: MockWritable = new MockWritable();
const streamCollator = new StreamCollator({
destination: mockStreamWritable
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should get reinitialized before each test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there's only one test in this file right now, but it'll be confusing if someone adds another one later and sees output from a previously-run test.

exports[`BuildPlanPlugin build plan debugging should generate a build plan 1`] = `
"Build Plan Depth (deepest dependency tree): 5
Build Plan Width (maximum parallelism): 40
Number of Nodes per Depth: 1, 4, 11, 40, 34
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

@@ -0,0 +1,736 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`BuildPlanPlugin build plan debugging should generate a build plan 1`] = `
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading this, I'm not really sure what I'm looking at. There should at least be some extensive documentation on how to read the output from this flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

5 participants