Skip to content

Implement GC basics #2607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Implement GC basics #2607

wants to merge 4 commits into from

Conversation

zherczeg
Copy link
Contributor

Parsing / reading / writing of core structures (rec/sub) is mostly completed, validation is far from over

@zherczeg zherczeg force-pushed the gc_core branch 2 times, most recently from 243ec44 to 68fe37e Compare May 23, 2025 15:34
struct TypeMut {
Type type;
bool mutable_;
};
using TypeMutVector = std::vector<TypeMut>;

// Garbage Collector specific type information
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a core part of the patch. It contains the (sub ...) part of the type. It is declared as a structure, because it is not mandatory.

virtual Result OnFuncType(Index index,
GCTypeExtension* gc_ext,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure is passed as a second argument, because it is a header, but it could go to the last argument since it is an extra (and optional) information. Which one you prefer?

struct RecursiveRange {
Index start_index;
Index type_count;
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another core structure to encode (rec ...) constructs. It represents the range.

std::vector<FuncType> func_types;
std::vector<StructType> struct_types;
std::vector<ArrayType> array_types;
std::vector<RecursiveRange> recursive_ranges;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is stored in an ordered array. It cannot be stored as part of the types, because zero length (rec) range is allowed for whatever reason.

@zherczeg zherczeg force-pushed the gc_core branch 9 times, most recently from e0ce7f8 to adcbdf7 Compare May 31, 2025 02:14
@zherczeg
Copy link
Contributor Author

zherczeg commented Jun 3, 2025

I have reworked the type validation system of the patch. Now it is capable of detecting the first type index for all equal types. This first type index is called canonical index. If I have two types (t1/t2), and their canonical index is computed, then type comparison is t1.canonical_index == t2.canonical_index. Sub type indices can also be turned to canonical sub indices. This is not only useful for validation, but also very important for high speed execution, since it simplifies type comparison a lot. To compute these canonical indices, a hash code is computed for each type. When two types have different hash codes, they are never equal. My hash computation algorithm might not be good, I don't have much experience with these algorithms.

@zherczeg zherczeg force-pushed the gc_core branch 2 times, most recently from 75bdfe5 to 890a316 Compare June 3, 2025 13:06
@zherczeg
Copy link
Contributor Author

zherczeg commented Jun 3, 2025

The type-* gc tests are running except the runtime part of type-subtyping.wast
I think the typing system in the validator and interpreter are ok now. This is another huge change with 1500 lines of new code.

@rossberg
Copy link
Member

rossberg commented Jun 3, 2025

It sounds like you are canonicalising wrt type indices. But type indices are meaningless in any program that consists of more than one module. Type canonicalisation must happen globally, across module boundaries, based on the types' structure. I suspect that is the reason for the link/run-time tests failing.

@zherczeg
Copy link
Contributor Author

zherczeg commented Jun 3, 2025

The link tests are not failing, although the interpreter do a slow type comparison for import/exports. As far as I understand the interpreter here is just a demonstration, so this is probably ok. The runtime tests fail because the operations (such as ref.cast) is not implemented. I will do that in a follow-up patch.

The global type canonicalisation sounds like a very good idea! A high performance engine should do that!

@zherczeg zherczeg force-pushed the gc_core branch 3 times, most recently from cc8e21f to 097046d Compare June 4, 2025 12:09
@zherczeg
Copy link
Contributor Author

zherczeg commented Jun 5, 2025

@sbc100 there is a fuzzer issue in the code. The code is correct though.

https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772
As for the fuzzer generated test, it wants to allocate 16190847 entries, which is pretty large for a 38 byte input, but not an invalid value in general.

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
LLVM considers this as a large value, and reports it as an error. There is an -rss_limit_mb to modify this limit.

What shall I do?

@sbc100
Copy link
Member

sbc100 commented Jun 5, 2025

@sbc100 there is a fuzzer issue in the code. The code is correct though.

https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772 As for the fuzzer generated test, it wants to allocate 16190847 entries, which is pretty large for a 38 byte input, but not an invalid value in general.

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp LLVM considers this as a large value, and reports it as an error. There is an -rss_limit_mb to modify this limit.

What shall I do?

We don't tend to have time to worry about fixing all the fuzz tests issues, unless they could conceivable show up in real world programs. i.e. we tend to assume trusted and save inputs, since we don't have the resources the harden wabt against other things.

Having said that we obviously would be happy to accept fixes for such issues if folks come up with them.

@zherczeg
Copy link
Contributor Author

zherczeg commented Jun 5, 2025

There is nothing to fix here, the code is correct (and not related to this patch). It is simply a limitation of the fuzzer, it assumes too much memory allocation is likely a bug.

@zherczeg zherczeg marked this pull request as ready for review June 11, 2025 10:11
@zherczeg zherczeg force-pushed the gc_core branch 3 times, most recently from 68c749c to b7fc45c Compare June 20, 2025 08:11
@zherczeg zherczeg force-pushed the gc_core branch 8 times, most recently from 0af19a5 to ffdf723 Compare June 26, 2025 09:16
@zherczeg
Copy link
Contributor Author

Is there a compile time define (or can be defined during build in some way) to detect that the fuzzer is active? Then I could guard out the failing .reserve call. Generating huge numbers is perfectly valid for a fuzzer.

@zherczeg zherczeg force-pushed the gc_core branch 3 times, most recently from 47b24d4 to c060ce0 Compare June 26, 2025 13:19
Zoltan Herczeg added 3 commits June 26, 2025 14:32
Support named references for globals, locals, tables, elems
Support named references for call_ref, ref_null
Extend Var variables with an optional type field
@zherczeg zherczeg force-pushed the gc_core branch 2 times, most recently from f694b94 to 21a32fa Compare June 27, 2025 08:49
@zherczeg
Copy link
Contributor Author

zherczeg commented Jul 8, 2025

I have checked the type hashing system in the whole test system. It performs 45157 comparisons, and the hash/type_count is equal 513 times. From that, 467 times the types are really equal, and 46 times they are not. Overall the efficiency is 99.9%.

I could not improve the hash by using more complex hashes. However, (hash * 33) + value has the same efficiency.
I will check the fails, maybe it gives some ideas.

Edit: Now the efficiency is 100%. It does not mean it is perfect or anything. The missing cases were valid, worth improving the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants