Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Enable opt-dist for dist-aarch64-linux builds #133807

Merged
merged 1 commit into from
Jan 15, 2025

Conversation

mrkajetanp
Copy link
Contributor

@mrkajetanp mrkajetanp commented Dec 3, 2024

Enable optimised AArch64 dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? @Kobzol
cc @lqd

try-job: dist-aarch64-linux

@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Dec 3, 2024
@rustbot
Copy link
Collaborator

rustbot commented Dec 3, 2024

Some changes occurred in src/tools/opt-dist

cc @Kobzol

@Kobzol
Copy link
Contributor

Kobzol commented Dec 3, 2024

Hi! Could you please split the part that moves the job to the aarch64 runner and the PGO/LTO part? So that we can evaluate the CI cost of these two actions separately. Thanks!

Comment on lines +48 to +99
ENV SCRIPT python3 ../x.py build --set rust.debug=true opt-dist && \
./build/$HOSTS/stage0-tools-bin/opt-dist linux-ci -- python3 ../x.py dist \
--host $HOSTS --target $HOSTS --include-default-paths build-manifest bootstrap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way this is a completely new dockerfile, so do you mean just replace this with a simple ./x dist call and then wrap it with opt-dist separately? Just in separate commits or separate PRs altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a separate PR, so that we can land these two changes (move to aarch64 host first, and then enable optimizations) separately :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, just wanted to make sure - no problem :)

@lqd
Copy link
Member

lqd commented Dec 3, 2024

What improvements are you seeing with this PR, over the current artifacts?

@mrkajetanp
Copy link
Contributor Author

I've not yet benchmarked the changes, and I'm not sure how they compare to the artifacts from cross-compilation because I was only doing aarch64 runs but specifically adding opt-dist with LTO and PGO seems to increase the binary sizes of the main artifacts as follows:

librustc_driver-8c547d00f2ea16d5.so - 84M -> 94M
libLLVM.so.19.1-rust-1.85.0-nightly - 106M -> 108M
rustc -> 2.7M for both, +100 bytes

@Kobzol
Copy link
Contributor

Kobzol commented Dec 3, 2024

@bors try

Let's also see how long it takes with the optimizations.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 3, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`
@bors
Copy link
Contributor

bors commented Dec 3, 2024

⌛ Trying commit 02b958d with merge b828608...

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 3, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`
@bors
Copy link
Contributor

bors commented Dec 3, 2024

⌛ Trying commit 02b958d with merge 945b9be...

@lqd
Copy link
Member

lqd commented Dec 3, 2024

That’s not going to be the good try job Jakub :3

@bors
Copy link
Contributor

bors commented Dec 3, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 3, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Dec 3, 2024

Ah, crap. Thanks!

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 3, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`

try-job: dist-aarch64-linux
@bors
Copy link
Contributor

bors commented Dec 3, 2024

⌛ Trying commit 02b958d with merge d156b32...

@bors
Copy link
Contributor

bors commented Dec 4, 2024

☀️ Try build successful - checks-actions
Build commit: d156b32 (d156b32c73dd32916f0ca83fb0104e600fbad49d)

@mrkajetanp
Copy link
Contributor Author

So that's an extra hour for LTO+PGO without the cache. 2h22 vs 3h22.

@lqd
Copy link
Member

lqd commented Dec 4, 2024

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 4, 2024
ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`

try-job: dist-aarch64-linux
@bors
Copy link
Contributor

bors commented Dec 4, 2024

⌛ Trying commit 02b958d with merge ef33f24...

@bors
Copy link
Contributor

bors commented Dec 4, 2024

☀️ Try build successful - checks-actions
Build commit: ef33f24 (ef33f249830d94b7afdb529458aae4052f14ca98)

@mrkajetanp
Copy link
Contributor Author

1h54 cached, not so bad. Back to roughly the same time as the x86 cross build then.
Completely outside of the CI costs conversations, these changes not making the overall turnaround time for bors longer than it is already is probably a nice bonus.

@lqd
Copy link
Member

lqd commented Dec 4, 2024

I assume good benchmark results can also help with the cost discussion.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 4, 2024

Indeed! You can download the CI artifacts e.g. using rustup-toolchain-install-master and benchmark it locally using rustc-perf. It would be nice to see the perf. diff. Let me know on Zulip if you want help with that.

@mrkajetanp
Copy link
Contributor Author

@rustbot ready
Just to clean up the label

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 14, 2025
@Kobzol
Copy link
Contributor

Kobzol commented Jan 14, 2025

@bors r+

@bors
Copy link
Contributor

bors commented Jan 14, 2025

📌 Commit 3d54764 has been approved by Kobzol

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2025
@Kobzol Kobzol added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. relnotes-perf Performance improvements that should be mentioned in the release notes. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 14, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 15, 2025
Rollup of 7 pull requests

Successful merges:

 - rust-lang#132397 (Make missing_abi lint warn-by-default.)
 - rust-lang#133807 (ci: Enable opt-dist for dist-aarch64-linux builds)
 - rust-lang#134143 (Convert `struct FromBytesWithNulError` into enum)
 - rust-lang#134338 (Use a C-safe return type for `__rust_[ui]128_*` overflowing intrinsics)
 - rust-lang#134678 (Update `ReadDir::next` in `std::sys::pal::unix::fs` to use `&raw const (*p).field` instead of `p.byte_offset().cast()`)
 - rust-lang#135424 (Detect unstable lint docs that dont enable their feature)
 - rust-lang#135520 (Make sure we actually use the right trivial lifetime substs when eagerly monomorphizing drop for ADTs)

r? `@ghost`
`@rustbot` modify labels: rollup
@klensy
Copy link
Contributor

klensy commented Jan 15, 2025

r+ but removed S-waiting-on-bors?

@Kobzol Kobzol added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 15, 2025
@Kobzol
Copy link
Contributor

Kobzol commented Jan 15, 2025

Oops, looks like my manual issue modification has raced with bors :) Thanks!

@lqd
Copy link
Member

lqd commented Jan 15, 2025

Maybe we should rollup=never big CI changes like these in the future

@Kobzol
Copy link
Contributor

Kobzol commented Jan 15, 2025

Yes, I just thought that after I noticed that it was already included in a rollup 😆 I guess that usually we run perf. for similar changes, so it's done by default, but since we don't have perf. monitoring for ARM (yet!), it wasn't done here, so I forgot about it, sorry.

Just in case the currently running rollup fails, let's mark it as such.

@bors rollup=never

@bors bors merged commit f9d8b65 into rust-lang:master Jan 15, 2025
7 checks passed
@rustbot rustbot added this to the 1.86.0 milestone Jan 15, 2025
@bors
Copy link
Contributor

bors commented Jan 15, 2025

⌛ Testing commit 3d54764 with merge 2776bdf...

rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 15, 2025
Rollup merge of rust-lang#133807 - mrkajetanp:ci-aarch64-opt-dist, r=Kobzol

ci: Enable opt-dist for dist-aarch64-linux builds

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

For the time being, disable bolt on aarch64 due to upstream bolt bugs.

r? `@Kobzol`
cc `@lqd`

try-job: dist-aarch64-linux
@Kobzol
Copy link
Contributor

Kobzol commented Jan 15, 2025

^ is hopefully just a visual bug.

Comment on lines +149 to +163
let is_aarch64 = target_triple.starts_with("aarch64");

let mut skip_tests = vec![
// Fails because of linker errors, as of June 2023.
"tests/ui/process/nofile-limit.rs".to_string(),
];

if is_aarch64 {
skip_tests.extend([
// Those tests fail only inside of Docker on aarch64, as of December 2024
"tests/ui/consts/promoted_running_out_of_memory_issue-130687.rs".to_string(),
"tests/ui/consts/large_const_alloc.rs".to_string(),
]);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, I beg of you, do not do this. This hack is causing CI to pass and local developer workflows to fail, and it is hiding this regression #135952.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain, we already skip some tests in a similar manner on dist x64 (not just the explicitly skipped ones, but also whole test suites, e.g. run-make).

Some tests fail on dist, but work locally, these are candidates for being skipped on CI (after all, running the test suite on an extracted dist atchive is already a hack).

But if the test fails also outside of this extracted dist setup, then it shouldn't be skipped, ofc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same feedback about the two tests that were using this feature before. See #135961.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, we could remove specific skipped tests, but do you also have an issue with skipping any tests in the dist tests? Because we currently don't run some parts of the test suite, so these are also effectivelly skipped, just without being explicitly enumerated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the 4 tests that were individually skipped, 3 also did not work for me with x test --stage 2 in an aarch64 dev environment, and the 4th was for a platform I don't have a dev environment on.

Ignoring or skipping tests in the harness is fine if there is some fundamental incompatibility with the harness. Or not running them because nobody has gotten to wiring them into opt-dist yet.

I would have preferred these tests be ignored in the tests themselves via annotations; that would have at least not had me chasing my tail wondering how CI could be passing when the tests fail locally. And it would have made the problem more visible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with explicit skipping via annotations being the better approach for these kinds of problems, you're right.

Copy link
Contributor Author

@mrkajetanp mrkajetanp Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll write up a patch to change this in a moment, let's see what people say

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point with the annotations. Recently we added a test annotation for only running a given test in dist jobs, the same should be usable also for ignoring a test in a dist job (#135164 - ignore-dist).

Copy link
Member

@saethlin saethlin Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've posted a PR that I've linked above which fixes the two tests that aren't related to the aarch64 vs x86_64 allocation failure issue. I'm away from a keyboard for a few days, but it looks like it works, and I would recommend applying that change and ignoring the large const allocs failure on aarch64, pending further investigation.

@workingjubilee
Copy link
Member

Move the CI dist-aarch64-linux job to an aarch64 runner and enable optimised dist builds with the opt-dist pipeline.

We should ideally never simultaneously change the CI runner and also enable an optimized build.

@mrkajetanp
Copy link
Contributor Author

We should ideally never simultaneously change the CI runner and also enable an optimized build.

FWIW the description there is a leftover from before we split the PRs, the runner change was done separately here.

@workingjubilee
Copy link
Member

Ah, thank you. In that case ideally PRs should be updated so that their messages reflect their actual content, but I am happy to know things happened in a more bisectable fashion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc relnotes-perf Performance improvements that should be mentioned in the release notes. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants