WIP test-bot: parallelize dependent test with sharding #21871
GunniBusch wants to merge 1 commit into Homebrew:main
Conversation
MikeMcQuaid left a comment
Hey @GunniBusch! I think this misunderstands the goal of "parallelising" here; we're not wanting to parallelise within a single GitHub Actions worker/job but to fan out to many different GitHub Actions workers/jobs.
Let me know if that doesn't make sense!
Good to know. So like my previous one.
One question indeed. Do you have any preferences for how this should be configured? If anything is unclear or you want a different configuration approach, let me know.
@GunniBusch Look at dev-cmd/generate-cask-ci-matrix.rb; that's an example of an existing matrix fanout implementation. Just a high-level comment: I'm happy for you to try this, but managing the rollout of this and testing it on our production CI infrastructure is going to be very hard/time consuming given you're not a maintainer.
I already assumed that. I did my testing in a test repo: https://github.com/GunniBusch/brew-dependent-shard-e2e (the latest test run: https://github.com/GunniBusch/brew-dependent-shard-e2e/actions/runs/23776799656). I hope this might be at least a bit useful for the future. But thanks for letting me know.
@GunniBusch The repository is a good testing ground, but the main thing that's probably worth doing is figuring out a gradual rollout strategy.
This is exactly what I was thinking. Imagine pushing this and breaking homebrew-core workflows. I think the best solution would be to make this opt-in at first, so that by default there is no change in behaviour (this is currently the case). For the rollout itself, it might be useful to have a label in homebrew-core that maintainers could add to selected PRs to enable this fan-out (and remove if something broke) for testing. That change would not even have to happen in the brew repository, because I already added an ENV variable to enable or disable (opt in to) sharding.
Yeh, I just don't think we can merge it without enough tests to be convinced that this is gonna work. Look at the
Yes, labels to enable/disable are a good fit here. Would be good to open that (draft) PR too. Also, here or in the other PR, it'd be good to write out a proposal for how we do a stage-by-stage rollout. I can help provide feedback and input on that. |
Have you run `brew lgtm` (style, typechecking and tests) with your changes locally? AI was used to help iterate on the implementation and tests. The resulting changes were manually reviewed and verified locally with `./bin/brew lgtm --online`.

Info: This is still WIP and not the final version. I am opening this PR early so I can test the branch in CI and make progress visible.
This PR changes dependent test-bot execution to use sharding.
The goal is to reduce total dependent-test runtime by splitting the dependent workload across multiple shard instances while keeping the implementation small and reusing the existing runner-selection flow.
The current approach is summarized in the overall flow below.

The intent is to make dependent testing faster without introducing a breaking change for the current homebrew-core CI pipeline.

Overall flow
```mermaid
flowchart TB
  cf[changed formulae] --> tr[determine-test-runners]
  tr --> rm[runner matrix]
  rm --> rc[runner category]
  rc --> n[n shard jobs]
  note[each shard job recomputes the same sorted dependent universe locally]
  subgraph j0[job 0]
    d0[collect names]
    u0[uniq plus sort]
    p0[index 0 total 2]
    a0[owned dependents]
    t0[install and test]
    d0 --> u0 --> p0 --> a0 --> t0
  end
  subgraph j1[job 1]
    d1[collect names]
    u1[uniq plus sort]
    p1[index 1 total 2]
    a1[owned dependents]
    t1[install and test]
    d1 --> u1 --> p1 --> a1 --> t1
  end
  n --> d0
  n --> d1
  note -.-> d0
  note -.-> d1
```

PS: Currently this is opt-in.
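The per-shard ownership step in the diagram can be sketched in Ruby. This is a minimal illustration, not the PR's actual code: the method and parameter names (`owned_dependents`, `shard_index`, `shard_total`) are mine, and the modulo-based slicing is an assumed partition scheme; the diagram only specifies that every job de-duplicates and sorts the same name list and then keeps its own slice.

```ruby
# Hypothetical sketch: deterministic shard ownership over a shared,
# sorted dependent universe. Every shard job runs the same code and
# arrives at the same ordering, so the slices are disjoint and cover
# the whole universe. (Modulo partitioning is an assumption here.)
def owned_dependents(all_names, shard_index:, shard_total:)
  universe = all_names.uniq.sort # identical in every job
  universe.each_with_index
          .select { |_name, i| i % shard_total == shard_index }
          .map(&:first)
end

deps = %w[foo bar baz qux foo]
p owned_dependents(deps, shard_index: 0, shard_total: 2) # => ["bar", "foo"]
p owned_dependents(deps, shard_index: 1, shard_total: 2) # => ["baz", "qux"]
```

Because each job recomputes the universe locally, no cross-job coordination or shared state is needed; the shard index and total are the only inputs that differ between jobs.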
Configuration details
Shard Count From Dependents + Config
```mermaid
flowchart TD
  d[dependent count] --> m{runner penalty == 0?}
  t[base threshold] --> l[count / threshold]
  d --> l
  d --> q[quadratic solve]
  t --> q
  p[runner penalty] --> q
  m -- yes --> l
  m -- no --> q
  l --> c[ceil]
  q --> c
  c --> k[clamp to max runners]
  x[max runners] --> k
  k --> o[selected shard count]
```

Shard count selection
This is the actual sizing rule used by the implementation: choose the smallest `r` where `coverage(r) = r*t + p*r*(r-1)` reaches the compatible dependent count, then clamp to `max runners`.

What the knobs do:

- `t` moves all expansion cutoffs to the right, so runners expand later overall
- `p` leaves the first split unchanged, but pushes later splits out, so extra runners become harder to unlock
- `max runners` only raises the cap; it does not change the cutoff curve
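The sizing rule above can be sketched in a few lines of Ruby. This is an illustrative reimplementation, not the PR's code; the names (`shard_count`, `threshold`, `penalty`, `max_runners`) are mine. With `penalty: 0` the rule reduces to `ceil(count / threshold)`, matching the diagram's "count / threshold" branch.

```ruby
# Hypothetical sketch of the sizing rule: pick the smallest r whose
# coverage(r) = r*t + p*r*(r-1) reaches the dependent count, then
# clamp to max_runners.
def shard_count(dependent_count, threshold:, penalty:, max_runners:)
  coverage = ->(r) { r * threshold + penalty * r * (r - 1) }
  r = 1
  r += 1 while coverage.call(r) < dependent_count && r < max_runners
  r
end

p shard_count(30, threshold: 8, penalty: 0, max_runners: 4) # => 4 (hits the cap)
p shard_count(30, threshold: 8, penalty: 5, max_runners: 4) # => 3
```

The second call illustrates the knob behaviour described above: a nonzero penalty raises `coverage(r)` for every `r > 1`, so later splits are reached at higher dependent counts and fewer extra runners get unlocked for the same workload.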