ci: Enable opt-dist for dist-aarch64-linux builds #133807
Some changes occurred in src/tools/opt-dist cc @Kobzol
Hi! Could you please split the part that moves the job to the aarch64 runner and the PGO/LTO part? So that we can evaluate the CI cost of these two actions separately. Thanks!
ENV SCRIPT python3 ../x.py build --set rust.debug=true opt-dist && \
    ./build/$HOSTS/stage0-tools-bin/opt-dist linux-ci -- python3 ../x.py dist \
    --host $HOSTS --target $HOSTS --include-default-paths build-manifest bootstrap
Either way this is a completely new dockerfile, so do you mean just replace this with a simple `./x dist` call and then wrap it with opt-dist separately? Just in separate commits or separate PRs altogether?
I meant a separate PR, so that we can land these two changes (move to aarch64 host first, and then enable optimizations) separately :)
Makes sense, just wanted to make sure - no problem :)
What improvements are you seeing with this PR, over the current artifacts?
I haven't benchmarked the changes yet, and I'm not sure how they compare to the artifacts from cross-compilation because I was only doing aarch64 runs. Specifically adding opt-dist with LTO and PGO, however, seems to increase the binary sizes of the main artifacts as follows:
@bors try Let's also see how long it takes with the optimizations.
ci: Enable opt-dist for dist-aarch64-linux builds Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline. For the time being, disable bolt on aarch64 due to upstream bolt bugs. r? `@Kobzol` cc `@lqd`
That’s not going to be the good try job Jakub :3
💔 Test failed - checks-actions
Ah, crap. Thanks! @bors try
ci: Enable opt-dist for dist-aarch64-linux builds Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline. For the time being, disable bolt on aarch64 due to upstream bolt bugs. r? `@Kobzol` cc `@lqd` try-job: dist-aarch64-linux
☀️ Try build successful - checks-actions
So that's an extra hour for LTO+PGO without the cache. 2h22 vs 3h22.
@bors try
ci: Enable opt-dist for dist-aarch64-linux builds Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline. For the time being, disable bolt on aarch64 due to upstream bolt bugs. r? `@Kobzol` cc `@lqd` try-job: dist-aarch64-linux
☀️ Try build successful - checks-actions
1h54 cached, not so bad. Back to roughly the same time as the x86 cross build then.
I assume good benchmark results can also help with the cost discussion.
Indeed! You can download the CI artifacts e.g. using rustup-toolchain-install-master and benchmark it locally using rustc-perf. It would be nice to see the perf. diff. Let me know on Zulip if you want help with that.
@rustbot ready
@bors r+
Rollup of 7 pull requests
Successful merges:
- rust-lang#132397 (Make missing_abi lint warn-by-default.)
- rust-lang#133807 (ci: Enable opt-dist for dist-aarch64-linux builds)
- rust-lang#134143 (Convert `struct FromBytesWithNulError` into enum)
- rust-lang#134338 (Use a C-safe return type for `__rust_[ui]128_*` overflowing intrinsics)
- rust-lang#134678 (Update `ReadDir::next` in `std::sys::pal::unix::fs` to use `&raw const (*p).field` instead of `p.byte_offset().cast()`)
- rust-lang#135424 (Detect unstable lint docs that dont enable their feature)
- rust-lang#135520 (Make sure we actually use the right trivial lifetime substs when eagerly monomorphizing drop for ADTs)
r? `@ghost`
`@rustbot` modify labels: rollup
r+ but removed
Oops, looks like my manual issue modification has raced with bors :) Thanks!
Maybe we should rollup=never big CI changes like these in the future
Yes, I just thought of that after I noticed that it was already included in a rollup 😆 I guess that usually we run perf. for similar changes, so it's done by default, but since we don't have perf. monitoring for ARM (yet!), it wasn't done here, so I forgot about it, sorry. Just in case the currently running rollup fails, let's mark it as such. @bors rollup=never
Rollup merge of rust-lang#133807 - mrkajetanp:ci-aarch64-opt-dist, r=Kobzol ci: Enable opt-dist for dist-aarch64-linux builds Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline. For the time being, disable bolt on aarch64 due to upstream bolt bugs. r? `@Kobzol` cc `@lqd` try-job: dist-aarch64-linux
^ is hopefully just a visual bug.
let is_aarch64 = target_triple.starts_with("aarch64");

let mut skip_tests = vec![
    // Fails because of linker errors, as of June 2023.
    "tests/ui/process/nofile-limit.rs".to_string(),
];

if is_aarch64 {
    skip_tests.extend([
        // Those tests fail only inside of Docker on aarch64, as of December 2024
        "tests/ui/consts/promoted_running_out_of_memory_issue-130687.rs".to_string(),
        "tests/ui/consts/large_const_alloc.rs".to_string(),
    ]);
}
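As a reading aid, the hunk above can be sketched as a standalone function. The names `skipped_tests_for` and `skip_args` are hypothetical, and forwarding the list as `--skip <path>` pairs is an assumption about how the harness consumes it, not something shown in this diff.

```rust
/// Build the list of tests to skip for a given target triple.
/// Hypothetical helper mirroring the shape of the hunk under review.
fn skipped_tests_for(target_triple: &str) -> Vec<String> {
    let is_aarch64 = target_triple.starts_with("aarch64");

    let mut skip_tests = vec![
        // Skipped on every target (linker errors, per the original comment).
        "tests/ui/process/nofile-limit.rs".to_string(),
    ];

    if is_aarch64 {
        // Extra skips that apply only when the target triple is aarch64.
        skip_tests.extend([
            "tests/ui/consts/promoted_running_out_of_memory_issue-130687.rs".to_string(),
            "tests/ui/consts/large_const_alloc.rs".to_string(),
        ]);
    }

    skip_tests
}

/// Assumption: each skipped path is forwarded as a `--skip <path>` pair.
fn skip_args(skip_tests: &[String]) -> Vec<String> {
    skip_tests
        .iter()
        .flat_map(|path| ["--skip".to_string(), path.clone()])
        .collect()
}
```

Because the list lives in the harness rather than in the tests, someone running the same tests locally gets no hint that CI skips them, which is the concern raised in the review comment that follows.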
Please, I beg of you, do not do this. This hack is causing CI to pass and local developer workflows to fail, and it is hiding this regression #135952.
To explain, we already skip some tests in a similar manner on dist x64 (not just the explicitly skipped ones, but also whole test suites, e.g. run-make).
Some tests fail on dist but work locally; these are candidates for being skipped on CI (after all, running the test suite on an extracted dist archive is already a hack).
But if a test also fails outside of this extracted dist setup, then it shouldn't be skipped, ofc.
I have the same feedback about the two tests that were using this feature before. See #135961.
Okay, we could remove specific skipped tests, but do you also have an issue with skipping any tests in the dist tests? Because we currently don't run some parts of the test suite, so these are also effectively skipped, just without being explicitly enumerated.
Of the 4 tests that were individually skipped, 3 also did not work for me with `x test --stage 2` in an aarch64 dev environment, and the 4th was for a platform I don't have a dev environment on.
Ignoring or skipping tests in the harness is fine if there is some fundamental incompatibility with the harness. Or not running them because nobody has gotten to wiring them into opt-dist yet.
I would have preferred these tests be ignored in the tests themselves via annotations; that would have at least not had me chasing my tail wondering how CI could be passing when the tests fail locally. And it would have made the problem more visible.
Yeah I agree with explicit skipping via annotations being the better approach for these kinds of problems, you're right.
I'll write up a patch to change this in a moment, let's see what people say
Good point with the annotations. Recently we added a test annotation for only running a given test in dist jobs, the same should be usable also for ignoring a test in a dist job (#135164 - `ignore-dist`).
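To make the annotation idea concrete, here is a toy model of directive-based filtering. It is not compiletest's implementation: the `Env` struct, `condition_holds`, and `should_run` are hypothetical names, and it assumes `ignore-dist` behaves the way the comment above suggests it should.

```rust
/// Toy description of where a test is being run (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Env<'a> {
    /// Architecture part of the target, e.g. "aarch64" or "x86_64".
    arch: &'a str,
    /// Whether the test runs against an extracted dist archive.
    is_dist: bool,
}

/// Does a single condition name (the part after `ignore-` / `only-`) hold?
fn condition_holds(cond: &str, env: Env<'_>) -> bool {
    match cond {
        "dist" => env.is_dist,
        arch => arch == env.arch,
    }
}

/// Decide whether a test carrying the given directives should run.
/// `ignore-X` disables the test when X holds; `only-X` disables it otherwise.
fn should_run(directives: &[&str], env: Env<'_>) -> bool {
    for directive in directives {
        if let Some(cond) = directive.strip_prefix("ignore-") {
            if condition_holds(cond, env) {
                return false;
            }
        } else if let Some(cond) = directive.strip_prefix("only-") {
            if !condition_holds(cond, env) {
                return false;
            }
        }
    }
    true
}
```

The point of the design is that the skip condition travels with the test file, so a developer who opens the test sees why it does not run in a given environment, rather than having to find a list inside the CI tooling.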
I've posted a PR that I've linked above which fixes the two tests that aren't related to the aarch64 vs x86_64 allocation failure issue. I'm away from a keyboard for a few days, but it looks like it works, and I would recommend applying that change and ignoring the large const allocs failure on aarch64, pending further investigation.
We should ideally never simultaneously change the CI runner and also enable an optimized build.
FWIW the description there is a leftover from before we split the PRs, the runner change was done separately here.
Ah, thank you. In that case ideally PRs should be updated so that their messages reflect their actual content, but I am happy to know things happened in a more bisectable fashion.
Enable optimised AArch64 dist builds with the opt-dist pipeline.
For the time being, disable bolt on aarch64 due to upstream bolt bugs.
r? @Kobzol
cc @lqd
try-job: dist-aarch64-linux