Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt-level=0 does not do constant folding #136366

Open
allanwmacdonald opened this issue Jan 31, 2025 · 7 comments
Open

opt-level=0 does not do constant folding #136366

allanwmacdonald opened this issue Jan 31, 2025 · 7 comments
Labels
A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@allanwmacdonald
Copy link

Consider this code:

  let _blah = 1 << (5 + 2);

When inspecting the assembly (from gdb), I expected to see this happen:

   0x000055555555b740 <+0>:	movl   $0x80,-0x4(%rsp)

4	}
=> 0x000055555555b748 <+8>:	ret    

Instead, this happened:

3	  let _blah = 1 << (5 + 2);
=> 0x000055555555b7c1 <+1>:	mov    $0x5,%eax
   0x000055555555b7c6 <+6>:	add    $0x2,%eax
   0x000055555555b7c9 <+9>:	mov    %eax,(%rsp)
   0x000055555555b7cc <+12>:	seto   %al
   0x000055555555b7cf <+15>:	jo     0x55555555b7db <_ZN9debugrust4main17h1db24d82a646b36bE+27>
   0x000055555555b7d1 <+17>:	mov    (%rsp),%eax
   0x000055555555b7d4 <+20>:	cmp    $0x20,%eax
   0x000055555555b7d7 <+23>:	jb     0x55555555b7eb <_ZN9debugrust4main17h1db24d82a646b36bE+43>
   0x000055555555b7d9 <+25>:	jmp    0x55555555b7fe <_ZN9debugrust4main17h1db24d82a646b36bE+62>
   0x000055555555b7db <+27>:	lea    0x4c87e(%rip),%rdi        # 0x5555555a8060
   0x000055555555b7e2 <+34>:	lea    -0x2b9(%rip),%rax        # 0x55555555b530 <_ZN4core9panicking11panic_const24panic_const_add_overflow17h5a5f82b06563d133E>
   0x000055555555b7e9 <+41>:	call   *%rax
   0x000055555555b7eb <+43>:	mov    (%rsp),%ecx
--Type <RET> for more, q to quit, c to continue without paging--c
   0x000055555555b7ee <+46>:	and    $0x1f,%ecx
   0x000055555555b7f1 <+49>:	mov    $0x1,%eax
   0x000055555555b7f6 <+54>:	shl    %cl,%eax
   0x000055555555b7f8 <+56>:	mov    %eax,0x4(%rsp)
   0x000055555555b7fe <+62>:	lea    0x4c873(%rip),%rdi        # 0x5555555a8078
   0x000055555555b805 <+69>:	lea    -0x29c(%rip),%rax        # 0x55555555b570 <_ZN4core9panicking11panic_const24panic_const_shl_overflow17hfdbee9592709de18E>
   0x000055555555b80c <+76>:	call   *%rax

This is the compiler actually generating code that computes the value of 1 << (5 + 2) at run time, along with a whole bunch of stuff that appears to relate to the safety of performing such an operation on variable data at run time.

This is unexpected. The entire expression on the right side of the assignment statement, 1 << (5 + 2), involves no variables. This can be safely resolved at compile time.

Strangely, the following code,

  let _blah = 1 << 7;

actually produces the expected result. This isn't consistent behaviour. Why is this expression, 1 << 7, resolved at compile time while 1 << (5 + 2) is not?

If I set the optimization level to 1 in the Cargo.toml as follows:

[profile.dev]
opt-level = 1

The expected code is produced with either expression (as long as the variable is used later, of course). However, I do not see how this is thought of as an optimization; optimization is supposed to be applied to executed code that involves actual data in variables or references. Since both 1 << (5 + 2) and 1 << 7 are literal expressions that resolve to the same identical value, and they do not involve data, they do not need to be "optimized". Each expression ought to be treated as a single entity and treated the same.

Rationale:
I suppose that this will be viewed as trivial in the context of a PC or server environment. However, in an embedded environment, the additional unnecessary code may result in consumption of valuable resources, if optimization must be disabled in order to perform debugging activites on an embedded system. This might even cause the unoptimized code to not compile at all on a sytem with tight memory constraints. Therefore, it is desirable, and in my opinion, a show stopper, that evaluation of literal expressions ought to be resolved at compile time regardless of optimization level.

Meta

rustc --version --verbose:

rustc 1.84.0 (9fc6b4312 2025-01-07)
binary: rustc
commit-hash: 9fc6b43126469e3858e2fe86cafb4f0fd5068869
commit-date: 2025-01-07
host: x86_64-unknown-linux-gnu
release: 1.84.0
LLVM version: 19.1.5

Backtrace

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s

@allanwmacdonald allanwmacdonald added the C-bug Category: This is a bug. label Jan 31, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 31, 2025
@allanwmacdonald
Copy link
Author

Additional info: A google search of the term "constant folding" results in several articles that describe the process that calculates the literal expression and replaces the expression with a single value, the behavour I am advocating here. Some of these articles even state that this is a "compiler optimization". However, gcc, by default, applies this process even when the optimization level is set to none; therefore, it seems to me that the gcc developers considered this to be a desirable feature that doesn't need an option to invoke. gcc does have a means to inhibit constant folding for floating point numbers, in particular, with the option -frounding-math.

@workingjubilee
Copy link
Member

The semantics of these values is that they are indeed potentially evaluated at runtime.

Consider instead https://rust.godbolt.org/z/637djG7cd which uses the following code:

#[inline(never)]
pub fn whatever() -> i32 {
    let _blah = const { 1 << (5 + 2) };
    _blah
}

The use of the const {} forces the value to be evaluated at compile time.

I am not entirely sure why 1 << (5+2) is not folded but 1 << 7 is, the difference in the MIR is not so great, but the LLVMIR we emit has already had 1 << 7 computed in it.

@ShE3py
Copy link
Contributor

ShE3py commented Jan 31, 2025

The expression is correctly folded with overflow checks disabled;
https://rust.godbolt.org/z/6c1x3sae4

Looks like overflow checks require opt-level ⩾ 1 to be removed, so it prevents constant folding?
E.g. 1 + 2 isn't folded as 3, and n << m is folded as it can't overflow.

Seems like a regression from stable to stable;
https://rust.godbolt.org/z/cj16n7hEz

@rustbot label +A-codegen +C-optimization +I-heavy +T-compiler

@rustbot rustbot added A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 31, 2025
@workingjubilee
Copy link
Member

Oh, thank you for identifying this is a regression! I was wondering.

@workingjubilee workingjubilee added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Jan 31, 2025
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jan 31, 2025
@workingjubilee
Copy link
Member

searched toolchains nightly-2023-03-04 through nightly-2023-05-27


********************************************************************************
Regression in nightly-2023-04-16
********************************************************************************

fetching https://static.rust-lang.org/dist/2023-04-15/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2023-04-15: 40 B / 40 B [=============================================================================================================================================================================================================================================================] 100.00 % 2.29 MB/s converted 2023-04-15 to 84dd17b56a931a631a23dfd5ef2018fd3ef49108
fetching https://static.rust-lang.org/dist/2023-04-16/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2023-04-16: 40 B / 40 B [=============================================================================================================================================================================================================================================================] 100.00 % 3.09 MB/s converted 2023-04-16 to 5cdb7886a5ece816864fab177f0c266ad4dd5358
looking for regression commit between 2023-04-15 and 2023-04-16
fetching (via remote github) commits from max(84dd17b56a931a631a23dfd5ef2018fd3ef49108, 2023-04-13) to 5cdb7886a5ece816864fab177f0c266ad4dd5358
ending github query because we found starting sha: 84dd17b56a931a631a23dfd5ef2018fd3ef49108
get_commits_between returning commits, len: 10
  commit[0] 2023-04-14: Auto merge of #110331 - matthiaskrgr:rollup-9vldvow, r=matthiaskrgr
  commit[1] 2023-04-14: Auto merge of #110197 - cjgillot:codegen-discr, r=pnkfelix
  commit[2] 2023-04-15: Auto merge of #110142 - Mark-Simulacrum:reduce-core-counts, r=pietroalbini
  commit[3] 2023-04-15: Auto merge of #109802 - notriddle:notriddle/rustdoc-search-generics-nested, r=GuillaumeGomez
  commit[4] 2023-04-15: Auto merge of #110335 - asomers:rust-gdb-freebsd, r=jyn514
  commit[5] 2023-04-15: Auto merge of #109900 - cjgillot:disable-const-prop, r=oli-obk
  commit[6] 2023-04-15: Auto merge of #110323 - lcnr:dropck-uwu, r=compiler-errors
  commit[7] 2023-04-15: Auto merge of #110349 - rust-lang:pa-bump-1.71.0, r=pietroalbini
  commit[8] 2023-04-15: Auto merge of #110227 - klensy:bs-win, r=Mark-Simulacrum
  commit[9] 2023-04-15: Auto merge of #110361 - ehuss:disable-jobserver-error, r=Mark-Simulacrum

This was probably due to the following PR which changed const prop to happen at higher MIR opt levels: #109900

@workingjubilee workingjubilee removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 31, 2025
@workingjubilee
Copy link
Member

Hmm. Also noting #106285 which seems related.

cc @cjgillot

@Noratrieb
Copy link
Member

If this sort of regression is a problem for you, you should definitely be using opt-level = 1. This use case is not a consideration for opt level 0 (and I don't think it should be).
opt level 0 is mostly guided by compile times. If this optimization benefits compile times we could do it, if it doesn't we won't.
The PR that regressed this had a positive compile time impact, so I will classify this as intentional.

@Noratrieb Noratrieb removed regression-from-stable-to-stable Performance or correctness regression from one stable version to another. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Feb 1, 2025
@Noratrieb Noratrieb changed the title Literal Expressions are Not Resolved at Compile Time opt-level=0 does not do constant folding Feb 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants