
Truncate IR dump basenames to respect NAME_MAX #1649

Closed
npiesco wants to merge 1 commit into rust-lang:main from npiesco:truncate-ir-dump-basenames

npiesco commented Apr 19, 2026

Resolves the `// FIXME work around filename too long errors` marker at `src/pretty_clif.rs:287`, which has been in place since the CLIF-dump infrastructure was added.

Problem

Symbol-mangled Rust names can easily exceed the per-component filename length limit enforced by most filesystems (255 bytes on ext4/XFS/Btrfs/APFS, 255 UTF-16 code units on NTFS and HFS+, and as little as 143 bytes on eCryptfs with filename encryption). When `--emit=llvm-ir` is passed, cg_clif dumps one CLIF file (and, in `base.rs`, one `.vcode` file) per function to `<crate>.clif/<symbol>.opt.clif` et al. For large dependency graphs (notably core, alloc, hashbrown, and compiler-builtins), several functions mangle to 240–300 byte basenames, which makes `File::create` fail with `ENAMETOOLONG`. The current behaviour is to downgrade the error to a warning via `early_warn` and drop the dump, making `--emit=llvm-ir` unreliable for any non-trivial crate.

Fix

Introduce `truncate_ir_basename` inside `pretty_clif.rs`:

  • Names ≤ 200 bytes pass through unchanged (common case: zero overhead, borrowed Cow).
  • Longer names are rewritten to <first 160 bytes of stem>_h<16-hex-FNV-1a-64 of full stem>.<extensions>.
  • The hash is computed over the full original stem, so any two distinct inputs map to distinct outputs with overwhelming probability, even when sharing a long common prefix.
  • Extension suffix chains (.opt.clif, .unopt.clif, .vcode) are preserved verbatim, so downstream tooling keying off the extension continues to work.
  • Truncation is char-boundary-safe (falls back to a shorter prefix if the 160-byte cut lands mid-UTF-8).
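
The rules above can be sketched as follows. This is a minimal illustration written from the description, not the code from the patch. It assumes that the 200-byte threshold applies to the whole basename including extensions, and that the extension chain is detected by matching a fixed list of known suffixes (legacy-mangled symbols can contain dots, so splitting at the first `.` would not be safe). The constant names and the `KNOWN_EXTS` list are hypothetical.

```rust
use std::borrow::Cow;

// Assumed limits, taken from the PR description.
const MAX_BASENAME_LEN: usize = 200;
const PREFIX_LEN: usize = 160;
// Hypothetical list of extension chains to preserve verbatim.
const KNOWN_EXTS: [&str; 3] = [".unopt.clif", ".opt.clif", ".vcode"];

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Shortens an over-long dump basename to `<prefix>_h<hash><exts>`.
fn truncate_ir_basename(name: &str) -> Cow<'_, str> {
    // Common case: the name already fits, so borrow it unchanged.
    if name.len() <= MAX_BASENAME_LEN {
        return Cow::Borrowed(name);
    }
    // Separate the stem from a known extension chain, if one is present.
    let (stem, exts) = KNOWN_EXTS
        .iter()
        .find_map(|ext| name.strip_suffix(*ext).map(|stem| (stem, *ext)))
        .unwrap_or((name, ""));
    // Back off until the cut lands on a UTF-8 character boundary.
    let mut cut = PREFIX_LEN.min(stem.len());
    while !stem.is_char_boundary(cut) {
        cut -= 1;
    }
    // The hash covers the full stem, not just the retained prefix.
    let hash = fnv1a_64(stem.as_bytes());
    Cow::Owned(format!("{}_h{:016x}{}", &stem[..cut], hash, exts))
}
```

Under these assumptions, a 300-byte ASCII stem with `.opt.clif` comes out at 160 + 2 + 16 + 9 = 187 bytes, and two stems that differ only after byte 160 still get different `_h` suffixes unless their 64-bit hashes collide.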

The call sits inside `write_ir_file`, so both `write_clif_file` and the `.vcode` writer in `base.rs` benefit without changes.
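
To illustrate why placing the call in one shared writer covers every caller, here is a rough sketch of such a choke point. The function `write_ir_file_sketch`, its parameters, and the idea of passing the shortening step as a closure are all hypothetical; the real `write_ir_file` signature in cg_clif is not shown in this PR and may differ.

```rust
use std::borrow::Cow;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a shared dump writer: every dump goes through
/// here, so the basename is shortened in exactly one place.
fn write_ir_file_sketch(
    dump_dir: &Path,
    basename: &str,
    shorten: impl for<'a> Fn(&'a str) -> Cow<'a, str>,
    write: impl FnOnce(&mut dyn Write) -> io::Result<()>,
) -> io::Result<PathBuf> {
    fs::create_dir_all(dump_dir)?;
    // Apply the shortening step before the path is built, so callers
    // never see or handle over-long names themselves.
    let path = dump_dir.join(shorten(basename).as_ref());
    let mut file = File::create(&path)?;
    write(&mut file)?;
    Ok(path)
}
```

A CLIF writer and a vcode writer would both call this function with their own `write` closure, and neither would need to know that shortening happens.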

Verification

Verified locally against a `-Zbuild-std=core,alloc` build with `--emit=llvm-ir`: before the patch, multiple dumps fail silently with `ENAMETOOLONG`; after the patch, every expected file is written, the longest basename is exactly 200 bytes, and the truncated ones carry the `_h<hex>` marker at the expected offset.
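
A check along these lines could be automated roughly as below. This is a hypothetical helper, not part of the patch: `has_hash_marker`, `DumpReport`, and `check_dump_names` are illustrative names, and the marker test is a heuristic that only looks for `_h` plus 16 lowercase hex digits at a given byte offset.

```rust
/// True if `name` carries `_h` followed by 16 lowercase hex digits
/// starting at byte offset `prefix_len`.
fn has_hash_marker(name: &str, prefix_len: usize) -> bool {
    match name.as_bytes().get(prefix_len..prefix_len + 18) {
        Some(m) => {
            m.starts_with(b"_h")
                && m[2..].iter().all(|b| matches!(*b, b'0'..=b'9' | b'a'..=b'f'))
        }
        None => false, // too short to contain the marker at that offset
    }
}

/// Summary of a set of dump basenames.
struct DumpReport {
    longest: usize,        // length in bytes of the longest basename
    truncated: usize,      // how many names carry the hash marker
    too_long: Vec<String>, // names that still exceed the limit
}

/// Checks basenames against a length limit and counts truncated ones.
fn check_dump_names(names: &[&str], limit: usize, prefix_len: usize) -> DumpReport {
    let mut report = DumpReport { longest: 0, truncated: 0, too_long: Vec::new() };
    for name in names {
        report.longest = report.longest.max(name.len());
        if has_hash_marker(name, prefix_len) {
            report.truncated += 1;
        }
        if name.len() > limit {
            report.too_long.push((*name).to_string());
        }
    }
    report
}
```

The names could be collected from the dump directory with `std::fs::read_dir` and passed in with `limit = 200` and `prefix_len = 160`, the values quoted in the description.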

Backward compatibility

  • No effect on any name ≤ 200 bytes — bit-for-bit identical filenames.
  • Changes file names only for inputs that previously failed outright, so there is no regression path for existing consumers.
  • No change to the actual CLIF/vcode file contents.

Symbol-mangled Rust names can easily exceed the per-component filename
length limit enforced by most filesystems (255 bytes on ext4/XFS/Btrfs,
255 UTF-16 code units on NTFS and HFS+, as little as 143 bytes on
eCryptfs), causing `File::create` in `write_ir_file` to fail with
ENAMETOOLONG when `--emit=llvm-ir` is passed and cg_clif dumps
CLIF/vcode per function. The previous code carried a
`FIXME work around filename too long errors` marker.

This change introduces `truncate_ir_basename`: names ≤ 200
bytes pass through unchanged; longer names are rewritten to
`<first 160 bytes of stem>_h<16-hex-FNV-1a-64 of full stem>.<exts>`.
The hash is computed over the original stem, so the transformation is
deterministic, and two distinct inputs map to distinct outputs unless
their 64-bit hashes collide, which is very unlikely in practice
(FNV-1a is not a cryptographic hash, so this does not hold against
adversarially chosen names). Extension suffix chains such as
`.opt.clif`, `.unopt.clif`, and `.vcode` are preserved, so downstream
tooling keying off the extension is unaffected.

The function is invoked inside `write_ir_file` so that every caller
(`write_clif_file` for CLIF dumps and the vcode dump in `base.rs`)
benefits uniformly, and the FIXME at the top of `write_clif_file` is
removed.

bjorn3 (Member) commented Apr 23, 2026

This is not quite the right approach to getting rid of the fixme. Also that PR description is way too verbose. It also isn't that important of a fixme. So far I haven't had a case where the function I wanted to get the IR for wasn't written due to the filename length limit.

bjorn3 closed this Apr 23, 2026

npiesco (Author) commented Apr 23, 2026

Re-opening as #NEW with a trimmed description.

Responding to the substance: filesystem NAME_MAX is deterministic and portable, and --emit=llvm-ir with cg_clif is definitely reachable (e.g. compiling large generic-heavy crates like rustc_*, icu_*, regex-automata, hashbrown, or anything pulling deep monomorphizations through compiler-builtins) — I'm hitting it routinely in a downstream build that runs cg_clif across ~100 deps. The current early_warn path silently drops the dump the user explicitly asked for, which is a quiet correctness problem, not a cosmetic one.

If there's a preferred shape for the fix (e.g. hash-only basename, shorter prefix, different split point), I'm happy to take direction — but leaving the FIXME in place means --emit=llvm-ir stays unreliable for any non-trivial crate. Happy to rework on feedback.
