Implement GC basics #2607

zherczeg · 2025-05-23T10:59:10Z

Parsing, reading, and writing of the core structures (rec/sub) are mostly complete; validation is far from finished.

zherczeg · 2025-05-23T16:01:04Z

include/wabt/binary-reader.h

 struct TypeMut {
  Type type;
  bool mutable_;
 };
 using TypeMutVector = std::vector<TypeMut>;

+// Garbage Collector specific type information


This is a core part of the patch. It holds the (sub ...) part of a type. It is declared as a separate structure because the (sub ...) part is optional.
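To make the idea concrete, here is a minimal sketch of what such an optional extension could look like. The field names and the `GCTypeExtensionSketch` name are hypothetical; the actual members of `GCTypeExtension` are not shown in this thread.

```cpp
#include <cstdint>
#include <vector>

using Index = uint32_t;  // stand-in for wabt's Index type

// Hypothetical layout, for illustration only.
struct GCTypeExtensionSketch {
  bool is_final = true;            // a type written without (sub ...) is final
  std::vector<Index> super_types;  // declared supertypes; empty when absent
};

// Describes a plain type definition that has no (sub ...) wrapper.
inline GCTypeExtensionSketch NoSubClause() {
  return {};
}

// True when the type names at least one supertype in its (sub ...) clause.
inline bool HasDeclaredSuperType(const GCTypeExtensionSketch& ext) {
  return !ext.super_types.empty();
}
```

A reader callback could then receive this structure next to the type index and ignore it when the module does not use GC features.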

zherczeg · 2025-05-23T16:02:27Z

include/wabt/binary-reader.h

  virtual Result OnFuncType(Index index,
+                            GCTypeExtension* gc_ext,


This structure is passed as the second argument because it acts as a header, but it could also be the last argument, since it is extra (and optional) information. Which one do you prefer?

zherczeg · 2025-05-23T16:03:41Z

include/wabt/type-checker.h

+  struct RecursiveRange {
+    Index start_index;
+    Index type_count;
+  };


This is another core structure. It encodes (rec ...) constructs and represents the range of type indices that one recursive group covers.

zherczeg · 2025-05-23T16:04:58Z

include/wabt/type-checker.h

+    std::vector<FuncType> func_types;
+    std::vector<StructType> struct_types;
+    std::vector<ArrayType> array_types;
+    std::vector<RecursiveRange> recursive_ranges;


The ranges are stored in an ordered array. They cannot be stored as part of the types, because a zero-length (rec) range is allowed for whatever reason.
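A small sketch of how an ordered array of ranges can be queried, including the zero-length case mentioned above. `FindRecursiveRange` is a hypothetical helper and not part of the patch; the struct mirrors the one in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = uint32_t;  // stand-in for wabt's Index type

struct RecursiveRange {
  Index start_index;
  Index type_count;
};

// Returns the position in `ranges` of the (rec ...) group that contains
// `type_index`, or -1 if no group does. `ranges` is assumed to be ordered
// by start_index. Empty groups own no type index, so they never match.
inline int FindRecursiveRange(const std::vector<RecursiveRange>& ranges,
                              Index type_index) {
  for (size_t i = 0; i < ranges.size(); ++i) {
    const RecursiveRange& r = ranges[i];
    if (type_index >= r.start_index &&
        type_index - r.start_index < r.type_count) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With ranges `{0, 2}`, `{2, 0}`, `{2, 1}`, type index 2 belongs to the third range. The empty second range still has its own entry, which would be lost if the range were stored on the types themselves.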

zherczeg · 2025-06-03T10:20:04Z

I have reworked the type validation system of the patch. It can now find, for each type, the first type index among all types equal to it. This first index is called the canonical index. Once the canonical indices of two types t1 and t2 are computed, comparing the types reduces to t1.canonical_index == t2.canonical_index. Subtype indices can be turned into canonical subtype indices in the same way. This is useful for validation and also very important for fast execution, since it simplifies type comparison a lot.

To compute the canonical indices, a hash code is computed for each type. Two types with different hash codes are never equal. My hash computation algorithm may not be good; I don't have much experience with these algorithms.
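The scheme can be sketched roughly as follows. This is a simplified model and not the code in the patch: types are flat field lists, a reference may only point to an earlier type index, recursive groups are ignored, and all names (`Field`, `TypeEntry`, `Canonicalize`) and the hash formula are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Index = uint32_t;  // stand-in for wabt's Index type

// A field is either a primitive code or a reference to an earlier type.
struct Field {
  bool is_ref;
  uint32_t value;  // primitive code, or the referenced type index
  bool operator==(const Field& other) const {
    return is_ref == other.is_ref && value == other.value;
  }
};

struct TypeEntry {
  std::vector<Field> fields;
  uint32_t hash = 0;
  Index canonical_index = 0;
};

// Assigns each type the index of the first structurally equal type.
inline void Canonicalize(std::vector<TypeEntry>& types) {
  for (Index i = 0; i < types.size(); ++i) {
    TypeEntry& t = types[i];
    uint32_t h = static_cast<uint32_t>(t.fields.size());
    for (Field& f : t.fields) {
      // Rewrite references to canonical form so equal types compare equal.
      if (f.is_ref) f.value = types[f.value].canonical_index;
      h = h * 33 + f.value * 2 + (f.is_ref ? 1 : 0);
    }
    t.hash = h;
    t.canonical_index = i;
    for (Index j = 0; j < i; ++j) {
      // Different hashes prove inequality; equal hashes need a full check.
      if (types[j].hash == h && types[j].fields == t.fields) {
        t.canonical_index = types[j].canonical_index;
        break;
      }
    }
  }
}
```

After `Canonicalize` runs, two types of the same module are equal exactly when their `canonical_index` values match, so later comparisons are a single integer compare.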

zherczeg · 2025-06-03T13:09:40Z

The type-* GC tests now run, except for the runtime part of type-subtyping.wast.
I think the type system in the validator and the interpreter is OK now. This is another huge change, with 1500 lines of new code.

rossberg · 2025-06-03T13:18:52Z

It sounds like you are canonicalising wrt type indices. But type indices are meaningless in any program that consists of more than one module. Type canonicalisation must happen globally, across module boundaries, based on the types' structure. I suspect that is the reason for the link/run-time tests failing.
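As a rough illustration of structure-based canonicalisation across modules, the sketch below interns type shapes in one shared registry. It is a simplified assumption: `TypeShape` and `GlobalTypeRegistry` are hypothetical names, and the sketch ignores recursion groups, which a real implementation would have to canonicalise as whole units.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Structural description of a type, with any references already rewritten
// to global ids (an assumed, simplified encoding).
using TypeShape = std::vector<uint32_t>;

class GlobalTypeRegistry {
 public:
  // Returns the same id for structurally equal shapes, regardless of which
  // module they come from or which local index they had there.
  uint32_t Intern(const TypeShape& shape) {
    auto it = ids_.find(shape);
    if (it != ids_.end()) return it->second;
    uint32_t id = static_cast<uint32_t>(ids_.size());
    ids_.emplace(shape, id);
    return id;
  }

 private:
  std::map<TypeShape, uint32_t> ids_;
};
```

Each module would map its local type indices to registry ids at instantiation time, so an import/export check compares two ids instead of walking both type definitions.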

zherczeg · 2025-06-03T19:15:24Z

The link tests are not failing, although the interpreter does a slow type comparison for imports/exports. As far as I understand, the interpreter here is just a demonstration, so this is probably OK. The runtime tests fail because the operations (such as ref.cast) are not implemented yet. I will do that in a follow-up patch.

The global type canonicalisation sounds like a very good idea! A high performance engine should do that!

zherczeg · 2025-06-05T04:44:14Z

@sbc100 there is a fuzzer issue in the code. The code is correct though.

https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772
As for the fuzzer generated test, it wants to allocate 16190847 entries, which is pretty large for a 38 byte input, but not an invalid value in general.

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
LLVM considers this a large value and reports it as an error. The -rss_limit_mb option modifies this limit.

What shall I do?

sbc100 · 2025-06-05T18:54:14Z

We don't tend to have time to fix every fuzz test issue unless it could conceivably show up in real-world programs, i.e. we tend to assume trusted and safe inputs, since we don't have the resources to harden wabt against other things.

Having said that, we obviously would be happy to accept fixes for such issues if folks come up with them.

zherczeg · 2025-06-05T19:11:32Z

There is nothing to fix here: the code is correct (and not related to this patch). It is simply a limitation of the fuzzer, which assumes that a very large memory allocation is likely a bug.

zherczeg · 2025-06-26T09:21:03Z

Is there a compile-time define (or one that can be set during the build in some way) to detect that the fuzzer is active? Then I could guard out the failing .reserve call. Generating huge numbers is perfectly valid for a fuzzer.
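One possible shape for such a guard, as a sketch only. libFuzzer's documentation suggests the macro FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION for fuzzer-friendly builds, but the build has to pass it explicitly, and whether wabt's fuzzing build does so is an assumption here. `ReserveCapped` and `kMaxEagerReserve` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumption: the fuzzing build passes
// -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION; the compiler does not define
// it on its own.
#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
constexpr size_t kMaxEagerReserve = 1 << 16;  // arbitrary illustrative cap
#else
constexpr size_t kMaxEagerReserve = static_cast<size_t>(-1);  // no cap
#endif

// Hypothetical helper: reserve eagerly only up to the cap. The vector still
// grows on demand if that many elements really arrive.
template <typename T>
void ReserveCapped(std::vector<T>& v, size_t count) {
  v.reserve(std::min(count, kMaxEagerReserve));
}
```

In a normal build this behaves like a plain `reserve(count)`. In a fuzzing build a huge declared count no longer triggers one large up-front allocation.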

zherczeg · 2025-07-08T10:06:03Z

I have checked the type hashing system across the whole test suite. It performs 45157 comparisons, and the hash/type_count pair is equal 513 times. Of those, the types are really equal 467 times and differ 46 times. Overall the efficiency is 99.9%.

I could not improve the hash by using more complex hashes. However, (hash * 33) + value has the same efficiency.
I will check the failures; maybe they will give some ideas.

Edit: the efficiency is now 100%. That does not mean it is perfect or anything. The missed cases were valid and worth improving the code for.
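For reference, a sketch of the (hash * 33) + value step and of one way to arrive at the 99.9% figure. Reading "efficiency" as the share of comparisons in which a hash match did not turn out to be a false match is an assumption; the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// The combining step quoted above: fold one value into the running hash.
inline uint32_t HashStep(uint32_t hash, uint32_t value) {
  return hash * 33 + value;
}

// Hashes a whole sequence of values, starting from zero.
inline uint32_t HashValues(const std::vector<uint32_t>& values) {
  uint32_t hash = 0;
  for (uint32_t v : values) hash = HashStep(hash, v);
  return hash;
}

// Fraction of comparisons not affected by a false hash match.
inline double Efficiency(uint32_t comparisons, uint32_t false_matches) {
  return 1.0 - static_cast<double>(false_matches) / comparisons;
}
```

With the numbers from the first measurement, `Efficiency(45157, 46)` is about 0.999, where 46 is 513 hash matches minus 467 real matches.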

zherczeg force-pushed the gc_core branch 2 times, most recently from 243ec44 to 68fe37e Compare May 23, 2025 15:34

zherczeg force-pushed the gc_core branch 9 times, most recently from e0ce7f8 to adcbdf7 Compare May 31, 2025 02:14

zherczeg force-pushed the gc_core branch from adcbdf7 to 3c92952 Compare June 3, 2025 09:34

zherczeg force-pushed the gc_core branch 2 times, most recently from 75bdfe5 to 890a316 Compare June 3, 2025 13:06

zherczeg force-pushed the gc_core branch 3 times, most recently from cc8e21f to 097046d Compare June 4, 2025 12:09

sbc100 mentioned this pull request Jun 5, 2025

wasm-decompile: unexpected opcode 0x12 #2613

Closed

zherczeg marked this pull request as ready for review June 11, 2025 10:11

zherczeg force-pushed the gc_core branch 3 times, most recently from 68c749c to b7fc45c Compare June 20, 2025 08:11

zherczeg force-pushed the gc_core branch 8 times, most recently from 0af19a5 to ffdf723 Compare June 26, 2025 09:16

zherczeg force-pushed the gc_core branch 3 times, most recently from 47b24d4 to c060ce0 Compare June 26, 2025 13:19

Zoltan Herczeg added 3 commits June 26, 2025 14:32

Add support for function references proposal

cacacc7

Support named references for globals, locals, tables, elems Support named references for call_ref, ref_null Extend Var variables with an optional type field

Implement return_call_ref, ref.as_non_null, br_on_[non_]null instructions

073a9d8

Support table initializers

a7701de

zherczeg force-pushed the gc_core branch 2 times, most recently from f694b94 to 21a32fa Compare June 27, 2025 08:49

zherczeg force-pushed the gc_core branch from 21a32fa to 8f7ddc4 Compare July 8, 2025 09:59

zherczeg force-pushed the gc_core branch from 8f7ddc4 to a4981a8 Compare July 9, 2025 07:29

Implement GC basics

ee17d2b

zherczeg force-pushed the gc_core branch from a4981a8 to ee17d2b Compare July 9, 2025 09:55
