Truncate IR dump basenames to respect NAME_MAX #1649
npiesco wants to merge 1 commit into rust-lang:main
Symbol-mangled Rust names can easily exceed the per-component filename
length limit enforced by most filesystems (255 bytes on ext4/XFS/Btrfs,
143 on HFS+, 255 UTF-16 code units on NTFS), causing `File::create`
in `write_ir_file` to fail with ENAMETOOLONG when `--emit=llvm-ir`
is passed so cg_clif dumps CLIF/vcode per function. The previous code
carried a `FIXME work around filename too long errors` marker.
This change introduces `truncate_ir_basename`: names ≤ 200
bytes pass through unchanged; longer names are rewritten to
`<first 160 bytes of stem>_h<16-hex-FNV-1a-64 of full stem>.<exts>`.
The hash is computed over the original stem, so the transformation is
deterministic and collision-resistant (any two distinct inputs map to
distinct outputs with overwhelming probability). Extension suffix
chains such as `.opt.clif`, `.unopt.clif`, and `.vcode` are
preserved, so downstream tooling keying off the extension is unaffected.
The function is invoked inside `write_ir_file` so that every caller
(`write_clif_file` for CLIF dumps and the vcode dump in `base.rs`)
benefits uniformly, and the FIXME at the top of `write_clif_file` is
removed.
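
The scheme described above can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the patch: the constant names, the `fnv1a_64` helper, and the choice to treat the first `.` as the stem/extension split point are all assumptions made for the example, as is backing off to a UTF-8 character boundary when cutting the prefix.

```rust
use std::borrow::Cow;

// Thresholds taken from the description above; the constant names are hypothetical.
const MAX_BASENAME_LEN: usize = 200;
const STEM_PREFIX_LEN: usize = 160;

/// 64-bit FNV-1a over a byte slice (standard offset basis and prime).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Sketch of the truncation: short names are borrowed unchanged, long names
/// become `<stem prefix>_h<16 hex digits><extension chain>`.
fn truncate_ir_basename(name: &str) -> Cow<'_, str> {
    if name.len() <= MAX_BASENAME_LEN {
        return Cow::Borrowed(name);
    }

    // Assumption: the stem ends at the first '.', and everything from that
    // dot onward is the extension chain (e.g. ".opt.clif").
    let (stem, exts) = match name.find('.') {
        Some(idx) => name.split_at(idx),
        None => (name, ""),
    };

    // Keep at most STEM_PREFIX_LEN bytes, backing off so the cut never
    // lands inside a multi-byte UTF-8 sequence.
    let mut cut = STEM_PREFIX_LEN.min(stem.len());
    while !stem.is_char_boundary(cut) {
        cut -= 1;
    }

    // The hash covers the full original stem, not just the kept prefix.
    let hash = fnv1a_64(stem.as_bytes());
    Cow::Owned(format!("{}_h{:016x}{}", &stem[..cut], hash, exts))
}
```

With these numbers a truncated name is at most 160 + 2 + 16 bytes plus the extension chain, so `.unopt.clif` (11 bytes) yields 189 bytes. Keeping a readable prefix makes the dump easy to match to its function by eye, while the hash suffix distinguishes functions whose mangled names share their first 160 bytes.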
This is not quite the right approach to getting rid of the fixme. Also that PR description is way too verbose. It also isn't that important of a fixme. So far I haven't had a case where the function I wanted to get the IR for wasn't written due to the filename length limit.
Re-opening as #NEW with a trimmed description. Responding to the substance: filesystem [...] If there's a preferred shape for the fix (e.g. hash-only basename, shorter prefix, different split point), I'm happy to take direction, but leaving the [...]
Resolves the `// FIXME work around filename too long errors` marker at `src/pretty_clif.rs:287` that has been in place since the CLIF-dump infrastructure was added.

Problem
Symbol-mangled Rust names can easily exceed the per-component filename length limit enforced by most filesystems (255 bytes on ext4/XFS/Btrfs/APFS, 143 on HFS+, 255 UTF-16 code units on NTFS). When `--emit=llvm-ir` is passed, cg_clif dumps one CLIF file (and in `base.rs` one `.vcode` file) per function to `<crate>.clif/<symbol>.opt.clif` et al. For large dependency graphs (notably `core`, `alloc`, `hashbrown`, `compiler-builtins`) several functions mangle to 240–300 byte basenames, which trips `ENAMETOOLONG` from `File::create`. The current behaviour is to swallow the error via `early_warn` and silently drop the dump, making `--emit=llvm-ir` unreliable for any non-trivial crate.

Fix
Introduce `truncate_ir_basename` inside `pretty_clif.rs`:
- Names ≤ 200 bytes pass through unchanged (`Cow`).
- Longer names are rewritten to `<first 160 bytes of stem>_h<16-hex-FNV-1a-64 of full stem>.<extensions>`.
- Extension chains (`.opt.clif`, `.unopt.clif`, `.vcode`) are preserved verbatim, so downstream tooling keying off the extension continues to work.

The call sits inside `write_ir_file`, so both `write_clif_file` and the `.vcode` writer in `base.rs` benefit without changes.

Verification
Verified locally against a `-Zbuild-std=core,alloc` build with `--emit=llvm-ir`: before the patch, multiple dumps fail silently with `ENAMETOOLONG`; after the patch, every expected file is written, the longest basename is exactly 200 bytes, and the truncated ones carry the `_h<hex>` marker at the expected offset.

Backward compatibility