ci: Enable opt-dist for dist-aarch64-linux builds #133807

Merged · 1 commit · Jan 15, 2025
src/ci/docker/host-aarch64/dist-aarch64-linux/Dockerfile (4 additions & 1 deletion)

@@ -91,9 +91,12 @@ ENV RUST_CONFIGURE_ARGS \
       --set rust.debug-assertions=false \
       --set rust.jemalloc \
       --set rust.use-lld=true \
+      --set rust.lto=thin \
       --set rust.codegen-units=1

-ENV SCRIPT python3 ../x.py dist --host $HOSTS --target $HOSTS
+ENV SCRIPT python3 ../x.py build --set rust.debug=true opt-dist && \
+           ./build/$HOSTS/stage0-tools-bin/opt-dist linux-ci -- python3 ../x.py dist \
+           --host $HOSTS --target $HOSTS --include-default-paths build-manifest bootstrap
Comment on lines +97 to +99

Contributor Author:

Either way this is a completely new Dockerfile, so do you mean just replacing this with a simple ./x dist call and then wrapping it with opt-dist separately? In separate commits, or in separate PRs altogether?

Contributor:

I meant a separate PR, so that we can land these two changes (move to aarch64 host first, and then enable optimizations) separately :)

Contributor Author:

Makes sense, just wanted to make sure - no problem :)


 ENV CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER=clang
 ENV LIBCURL_NO_PKG_CONFIG 1
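
For illustration (not part of the diff): the new SCRIPT first builds the opt-dist tool with the stage0 toolchain, then runs it, and everything after the `--` is the real dist command that opt-dist eventually re-invokes around its profile-gathering and optimization steps. A minimal sketch of that kind of `--` split, assuming nothing about the actual opt-dist CLI internals:

fn split_on_dashes(args: &[String]) -> (Vec<String>, Vec<String>) {
    // Everything before `--` selects the opt-dist mode (e.g. `linux-ci`);
    // everything after it is the wrapped dist command to re-run later.
    match args.iter().position(|a| a == "--") {
        Some(i) => (args[..i].to_vec(), args[i + 1..].to_vec()),
        None => (args.to_vec(), Vec::new()),
    }
}

fn main() {
    let args: Vec<String> =
        ["linux-ci", "--", "python3", "../x.py", "dist"].map(String::from).to_vec();
    let (mode, dist_cmd) = split_on_dashes(&args);
    assert_eq!(mode, ["linux-ci"]);
    assert_eq!(dist_cmd, ["python3", "../x.py", "dist"]);
}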
src/tools/opt-dist/src/bolt.rs (11 additions & 2 deletions)

@@ -1,6 +1,7 @@
 use anyhow::Context;
 use camino::{Utf8Path, Utf8PathBuf};

+use crate::environment::Environment;
 use crate::exec::cmd;
 use crate::training::BoltProfile;
 use crate::utils::io::copy_file;

@@ -45,13 +46,21 @@ pub fn with_bolt_instrumented<F: FnOnce(&Utf8Path) -> anyhow::Result<R>, R>(
 }

 /// Optimizes the file at `path` with BOLT in-place using the given `profile`.
-pub fn bolt_optimize(path: &Utf8Path, profile: &BoltProfile) -> anyhow::Result<()> {
+pub fn bolt_optimize(
+    path: &Utf8Path,
+    profile: &BoltProfile,
+    env: &Environment,
+) -> anyhow::Result<()> {
     // Copy the artifact to a new location, so that we do not use the same input and output file.
     // BOLT cannot handle optimizing when the input and output is the same file, because it performs
     // in-place patching.
     let temp_path = tempfile::NamedTempFile::new()?.into_temp_path();
     copy_file(path, &temp_path)?;

+    // FIXME: cdsplit in llvm-bolt is currently broken on AArch64, drop this once it's fixed upstream
+    let split_strategy =
+        if env.host_tuple().starts_with("aarch64") { "profile2" } else { "cdsplit" };
+
     cmd(&["llvm-bolt"])
         .arg(temp_path.display())
         .arg("-data")

@@ -65,7 +74,7 @@
         // Split function code into hot and cold regions
         .arg("-split-functions")
         // Split using best available strategy (three-way splitting, Cache-Directed Sort)
-        .arg("-split-strategy=cdsplit")
+        .arg(format!("-split-strategy={split_strategy}"))
         // Split as many basic blocks as possible
         .arg("-split-all-cold")
         // Move jump tables to a separate section
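
Aside: the strategy pick above is a pure function of the host triple. As a standalone illustration (the free-standing function below is for exposition, not how opt-dist itself structures this):

/// Pick the BOLT function-splitting strategy for a host triple.
/// `cdsplit` is the best available strategy, but it is currently broken
/// in llvm-bolt on AArch64, so `profile2` is used there as a fallback.
fn bolt_split_strategy(host_tuple: &str) -> &'static str {
    if host_tuple.starts_with("aarch64") { "profile2" } else { "cdsplit" }
}

fn main() {
    assert_eq!(bolt_split_strategy("aarch64-unknown-linux-gnu"), "profile2");
    assert_eq!(bolt_split_strategy("x86_64-unknown-linux-gnu"), "cdsplit");
}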
src/tools/opt-dist/src/main.rs (21 additions & 7 deletions)

@@ -146,6 +146,21 @@ fn create_environment(args: Args) -> anyhow::Result<(Environment, Vec<String>)>
     let target_triple =
         std::env::var("PGO_HOST").expect("PGO_HOST environment variable missing");

+    let is_aarch64 = target_triple.starts_with("aarch64");
+
+    let mut skip_tests = vec![
+        // Fails because of linker errors, as of June 2023.
+        "tests/ui/process/nofile-limit.rs".to_string(),
+    ];
+
+    if is_aarch64 {
+        skip_tests.extend([
+            // Those tests fail only inside of Docker on aarch64, as of December 2024
+            "tests/ui/consts/promoted_running_out_of_memory_issue-130687.rs".to_string(),
+            "tests/ui/consts/large_const_alloc.rs".to_string(),
+        ]);
+    }
+
Comment on lines +149 to +163

Member:

Please, I beg of you, do not do this. This hack causes CI to pass while local developer workflows fail, and it is hiding regression #135952.

Contributor:

To explain, we already skip some tests in a similar manner on dist x64 (not just the explicitly skipped ones, but also whole test suites, e.g. run-make).

Some tests fail on dist but work locally; these are candidates for being skipped on CI (after all, running the test suite on an extracted dist archive is already a hack).

But if a test also fails outside of this extracted dist setup, then it shouldn't be skipped, of course.
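
(For context, a hedged sketch of how such a skip list is typically forwarded: bootstrap's test runner accepts `--skip <path>` filters, so opt-dist can translate each entry into a pair of arguments along these lines. This is an illustration, not the actual opt-dist source.)

// Sketch: turn the skip list into `--skip` filters for the `x.py test`
// invocation that runs against the extracted dist archive.
fn skip_args(skipped_tests: &[String]) -> Vec<String> {
    skipped_tests
        .iter()
        .flat_map(|test| ["--skip".to_string(), test.clone()])
        .collect()
}

fn main() {
    let args = skip_args(&["tests/ui/process/nofile-limit.rs".to_string()]);
    assert_eq!(args, ["--skip", "tests/ui/process/nofile-limit.rs"]);
}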

Member:

I have the same feedback about the two tests that were using this feature before. See #135961.

Contributor:

Okay, we could remove specific skipped tests, but do you also have an issue with skipping any tests in the dist tests? Because we currently don't run some parts of the test suite, so those are also effectively skipped, just without being explicitly enumerated.

Member:

Of the 4 tests that were individually skipped, 3 also did not work for me with x test --stage 2 in an aarch64 dev environment, and the 4th was for a platform I don't have a dev environment on.

Ignoring or skipping tests in the harness is fine if there is some fundamental incompatibility with the harness. Or not running them because nobody has gotten to wiring them into opt-dist yet.

I would have preferred these tests be ignored in the tests themselves via annotations; that would have at least not had me chasing my tail wondering how CI could be passing when the tests fail locally. And it would have made the problem more visible.

Contributor Author:

Yeah, I agree that explicit skipping via annotations is the better approach for these kinds of problems; you're right.

Contributor Author (@mrkajetanp, Jan 24, 2025):

I'll write up a patch to change this in a moment; let's see what people say.

Contributor:

Good point about the annotations. We recently added a test annotation for only running a given test in dist jobs; the same mechanism should also be usable for ignoring a test in a dist job (#135164, ignore-dist).
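
(Hedged illustration: assuming `ignore-dist` behaves like other compiletest `//@` directives, the annotation-based approach would record the skip, and the reason for it, in the test file itself, e.g.:)

//@ ignore-dist
// Hypothetical placement: the directive and its reason live next to the test,
// so a local `./x test` run and CI agree on what is skipped and why.

fn main() {
    // body of the affected test would follow here
}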

Member (@saethlin, Jan 24, 2025):

I've posted a PR, linked above, which fixes the two tests that aren't related to the aarch64 vs x86_64 allocation-failure issue. I'm away from a keyboard for a few days, but it looks like it works, and I would recommend applying that change and ignoring the large_const_alloc failure on aarch64, pending further investigation.

     let checkout_dir = Utf8PathBuf::from("/checkout");
     let env = EnvironmentBuilder::default()
         .host_tuple(target_triple)

@@ -155,11 +170,9 @@ fn create_environment(args: Args) -> anyhow::Result<(Environment, Vec<String>)>
         .artifact_dir(Utf8PathBuf::from("/tmp/tmp-multistage/opt-artifacts"))
         .build_dir(checkout_dir.join("obj"))
         .shared_llvm(true)
-        .use_bolt(true)
-        .skipped_tests(vec![
-            // Fails because of linker errors, as of June 2023.
-            "tests/ui/process/nofile-limit.rs".to_string(),
-        ])
+        // FIXME: Enable bolt for aarch64 once it's fixed upstream. Broken as of December 2024.
+        .use_bolt(!is_aarch64)
+        .skipped_tests(skip_tests)
         .build()?;

     (env, shared.build_args)

@@ -304,7 +317,8 @@
         // the final dist build. However, when BOLT optimizes an artifact, it does so *in-place*,
         // therefore it will actually optimize all the hard links, which means that the final
         // packaged `libLLVM.so` file *will* be BOLT optimized.
-        bolt_optimize(&llvm_lib, &llvm_profile).context("Could not optimize LLVM with BOLT")?;
+        bolt_optimize(&llvm_lib, &llvm_profile, env)
+            .context("Could not optimize LLVM with BOLT")?;

         let rustc_lib = io::find_file_in_dir(&libdir, "librustc_driver", ".so")?;

@@ -319,7 +333,7 @@
         print_free_disk_space()?;

         // Now optimize the library with BOLT.
-        bolt_optimize(&rustc_lib, &rustc_profile)
+        bolt_optimize(&rustc_lib, &rustc_profile, env)
             .context("Could not optimize rustc with BOLT")?;

         // LLVM is not being cleared here, we want to use the BOLT-optimized LLVM
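
Aside on the hard-link comment above, with a self-contained demo of the underlying filesystem behavior: rewriting a file through one hard link keeps the inode, so the change is visible through every other link, which is why in-place BOLT optimization of the build-dir `libLLVM.so` also reaches the copy staged for dist. The scratch file names below are hypothetical:

use std::fs;

fn main() -> std::io::Result<()> {
    fs::write("scratch_a.bin", b"original")?;
    let _ = fs::remove_file("scratch_b.bin"); // ignore error if absent
    fs::hard_link("scratch_a.bin", "scratch_b.bin")?;

    // `fs::write` truncates and rewrites in place, keeping the shared inode,
    // so the second link observes the new bytes.
    fs::write("scratch_a.bin", b"patched")?;
    assert_eq!(fs::read("scratch_b.bin")?, b"patched");

    fs::remove_file("scratch_a.bin")?;
    fs::remove_file("scratch_b.bin")?;
    Ok(())
}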