Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use WASM function names in compiled objects #8627

Merged
merged 7 commits into from
May 16, 2024

Conversation

lpereira
Copy link
Contributor

Instead of generating symbol names in the format
"wasm[$MODULE_ID]::function[$FUNCTION_INDEX]", generate (if possible) something more readable, such as "wasm[$MODULE_ID]::$FUNCTION_NAME". This helps when debugging or profiling the generated code.

Instead of generating symbol names in the format
"wasm[$MODULE_ID]::function[$FUNCTION_INDEX]", generate (if possible)
something more readable, such as "wasm[$MODULE_ID]::$FUNCTION_NAME".
This helps when debugging or profiling the generated code.

Co-authored-by: Jamey Sharp <[email protected]>
@lpereira lpereira requested a review from a team as a code owner May 15, 2024 17:16
@lpereira lpereira requested review from alexcrichton and removed request for a team May 15, 2024 17:16
@bjorn3
Copy link
Contributor

bjorn3 commented May 15, 2024

Is it guaranteed that the names section only contains unique names?

@alexcrichton
Copy link
Member

Thanks! This makes sense to me and I've often thought this would be nice as well. A hesitation I've had in the past though is that I'm a bit wary about exposing user-provided data to native tools like perf and objdump and such. The name section has completely arbitrary strings and I while I can't say for certainty that native tooling behaves well in the face of arbitrary strings I suspect that there might be a security issue waiting to happen along the lines of "if you run ldd over a binary it can run arbitrary code" or something like that.

(a good example being @bjorn3's suggestion where the name section has no guarantees of uniqueness and I'm not sure what tooling would do with same-named-symbols)

@cfallin
Copy link
Member

cfallin commented May 15, 2024

I wonder if we could define suitable "cleanup" for names: (i) rewrite duplicates with numeric suffixes, (ii) replace or remove any characters not in the usual set ([A-Za-z0-9-_] -- maybe spaces too? can tools handle those?), (iii) cap the length at something reasonable like 32 or 64 characters?

@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label May 15, 2024
@lpereira
Copy link
Contributor Author

Could also add the function_index for the cases where uniqueness isn't guaranteed. Alright, I'll cook up a patch with the suggestions.

@alexcrichton
Copy link
Member

Yeah I think that'd work well (albeit bit a bit more involved in the implementation). While I can't say for certainty that spaces are not supported my guess is that tools would likely eventually choke on non-C-looking symbols so I'd be tempted to leave them out.

@fitzgen
Copy link
Member

fitzgen commented May 15, 2024

How about

  • wasm[N]::func[M] if no name section entry
  • wasm[N]::func[M]: <name> where <name> is sanitized and has a capped length if there is an entry

?

This avoids needing to either add numbers into the sanitized name and then worrying about removing those numbers when capping the length, or needing to account for the length of the numbers that might be added after sanitizing and capping the name.

@lpereira
Copy link
Contributor Author

lpereira commented May 15, 2024

Alright, pushed some changes that:

  • Will filter symbols to only include alphanumeric characters and others that might be generated by name mangling and whatnot, and replace runs of them with a single '?'
  • Always include the function index in the symbol name
  • Truncate the symbol name to 96 characters

I think this should be sufficient. Can anyone please take a look again?

@lpereira lpereira force-pushed the wasm-func-names-in-objs branch 2 times, most recently from dc98f8f to 6cbaade Compare May 15, 2024 19:07
@lpereira lpereira force-pushed the wasm-func-names-in-objs branch 4 times, most recently from 07aaafc to 19b22f1 Compare May 15, 2024 20:30
.trim_matches('?')
.trim_end_matches('?')
.to_string();
name.truncate(Self::MAX_SYMBOL_LEN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.take(N) in the iterator chain should allow the truncation to happen inline (and stop iteration early if so, since the whole chain is lazy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no, I'm not sure how to do this; I want to truncate the string after trim_matches('?') removed repeated occurrences of '?' in the cleaned string, not before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention that you intended to turn any run of multiple ? into a single ?. Note that trim_matches doesn't do that. Instead, it deletes all copies of the given pattern from both ends of the string. That's why it can return a borrowed &str instead of a heap-allocated String: it doesn't change the middle of the string, so it can just adjust the start pointer and the length.

I think this should do more or less what you want (though I'm typing it in GitHub's web editor without testing it so who knows). It deletes all ? at the beginning of the string, then compresses any run of multiple ? into a single ?. It can leave a single ? at the end of the string but I couldn't think of a simple way to fix that and I don't think it matters. Also it uses take like Chris suggested.

            let mut last_char = '?';
            let name = name
                .chars()
                .map(|c| if bad_char(c) { '?' } else { c })
                .filter(|c| { let keep = last_char != c; last_char = c; keep })
                .take(Self::MAX_SYMBOL_LEN)
                .collect::<String>();

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, thanks for the iterations!

Filter symbol names to include only characters that are usually used
for function names, and that might be produced by name mangling.
Replace everything else with a question mark (and all repeated question
marks by a single one), and then truncate to a length of 96 characters.

This should be enough to not only avoid passing user-controlled strings
to tools such as "perf" and "objdump", and make it easier to
disambiguate symbols that might have the same name but different
indices.
@cfallin cfallin enabled auto-merge May 15, 2024 20:59
@cfallin cfallin disabled auto-merge May 15, 2024 21:00
@cfallin
Copy link
Member

cfallin commented May 15, 2024

Ah, force-pushed after my review; could you do subsequent updates as additional commits? Let me know when stable and I can re-review.

@lpereira
Copy link
Contributor Author

It's stable now. (I'm not using the take() method for the reason I stated above, so I reverted to the previous version.)

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be such a big usability improvement for people using tools like perf! I'm excited.

crates/wasmtime/src/compile.rs Show resolved Hide resolved
.trim_matches('?')
.trim_end_matches('?')
.to_string();
name.truncate(Self::MAX_SYMBOL_LEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention that you intended to turn any run of multiple ? into a single ?. Note that trim_matches doesn't do that. Instead, it deletes all copies of the given pattern from both ends of the string. That's why it can return a borrowed &str instead of a heap-allocated String: it doesn't change the middle of the string, so it can just adjust the start pointer and the length.

I think this should do more or less what you want (though I'm typing it in GitHub's web editor without testing it so who knows). It deletes all ? at the beginning of the string, then compresses any run of multiple ? into a single ?. It can leave a single ? at the end of the string but I couldn't think of a simple way to fix that and I don't think it matters. Also it uses take like Chris suggested.

            let mut last_char = '?';
            let name = name
                .chars()
                .map(|c| if bad_char(c) { '?' } else { c })
                .filter(|c| { let keep = last_char != c; last_char = c; keep })
                .take(Self::MAX_SYMBOL_LEN)
                .collect::<String>();

crates/wasmtime/src/compile.rs Outdated Show resolved Hide resolved
@jameysharp
Copy link
Contributor

Ah, the CI test failures are because tests/disas.rs is specifically looking for symbol names which contain both wasm and function in their names, and you've shortened the latter to func. This is near the beginning of disas_elf.

I don't like that this test is relying on this implementation detail of Wasmtime, but I suppose it's fine. So you can either change this PR to continue to use function, or change disas.rs to search for func.

Either way, you'll also need to run WASMTIME_TEST_BLESS=1 cargo test --test disas to update the tests to reflect this change. Some of them do have function names, and those now show up in the expected test output, which is a fantastic bonus.

@lpereira
Copy link
Contributor Author

Alright, thanks for the review! This should be the last round.

(@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)

@jameysharp
Copy link
Contributor

(@jameysharp, that filter wouldn't work -- words like "keep", which are fine, would be transformed to "kep" -- but I tweaked it to consider that case.)

Oh, you're absolutely right. 😅

Looks good to me, let's ship it!

@jameysharp jameysharp added this pull request to the merge queue May 16, 2024
Merged via the queue into bytecodealliance:main with commit cd8bd0d May 16, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants