Use `built_value` for `AssetNode` and related types. #3914

davidmorgan · 2025-03-07T11:55:12Z

For #3811, review and merge #3908 first.

built_value gives a way to represent immutable data and serialize it, which looks like a very good fit here.

The code is closely equivalent to what was there before; the biggest change is that because AssetNode is now immutable it's necessary to fetch updated nodes rather than getting mutations "for free".

This is still an intermediate state: the next PR can switch to using built_value for serialization and test (value comparison) code, then it becomes much easier to make changes to the data model.

As expected this is worse for performance

before this PR:
json_serializable
shape,libraries,clean/ms,no changes/ms,incremental/ms
loop,1,24385,4365,5483
loop,100,26646,4417,7557
loop,250,30159,4582,12666
loop,500,41991,5000,25656
loop,750,59208,6361,43598
loop,1000,78480,8256,69272

after:
json_serializable
shape,libraries,clean/ms,no changes/ms,incremental/ms
loop,1,24672,4336,5726
loop,100,27260,4521,8359
loop,250,31512,5230,13349
loop,500,55433,13681,33311
loop,750,108009,38464,71159
loop,1000,204181,80411,130737

which is fine: the AssetGraph updates that are expensive were already too expensive before this PR, it's exactly duplicate work in these updates that needs addressing.

github-actions · 2025-03-07T11:57:00Z

PR Health

Changelog Entry ✔️

Package	Changed Files

Changes to files need to be accounted for in their respective changelogs.

jakemac53 · 2025-03-07T16:11:27Z

I would advise against using any codegen for these classes, it is a bit annoying sure to hand write the serialization, but it is much more of a pain in general to have code generator deps in build_runner itself (or actually anything that build_runner depends on).

Basically, you consistently end up in situations where you cannot run build_runner at all because these files are either out of date or have been deleted, so you can't compile, but you need to compile to run the build, etc. We made this mistake in build_modules and build_config and it is really quite annoying sometimes.

On another note is our serialization is already quite slow, so please benchmark this with large asset graphs, in previous testing I saw built value being about 50% slower than normal JSON deserialization.

We also probably want to leave open the option for using a different format than JSON for the asset graph.

davidmorgan · 2025-03-07T16:28:09Z

Basically, you consistently end up in situations where you cannot run build_runner at all because these files are either out of date or have been deleted, so you can't compile, but you need to compile to run the build, etc.

That's definitely a concern.

I wrote a script that runs build_runner using deps from pub instead of the local versions; that seems to do the job.

On another note is our serialization is already quite slow, so please benchmark this with large asset graphs, in previous testing I saw built value being about 50% slower than normal JSON deserialization.

I am expecting that this will end up much, much smaller+faster.

asset_graph.json for the json_serializable_loop_1000 benchmark is current 20Mb because each node repeats the full transitive input set. Sharing the transitive input sets where possible for the runtime performance gain will also give a big size reduction for graph serialization. But, it needs a more complex data structure, and to figure out the details I need to be able to iterate on it while testing with serialization. It's already dauntingly complicated without having to correctly hard-code serialization for every variant I want to try :) so codegen seems like the only way forward.

Thanks.

jakemac53 · 2025-03-07T16:35:12Z

It's already dauntingly complicated without having to correctly hard-code serialization for every variant I want to try :) so codegen seems like the only way forward.

Do compare against json_serializable or a similar package. I definitely have a bias against anything builder pattern and I know you own built_value so it's a default for you, but we can easily resolve that by just measuring it. If it's not as much slower as I think then it's fine.

davidmorgan · 2025-03-07T17:00:35Z

It's already dauntingly complicated without having to correctly hard-code serialization for every variant I want to try :) so codegen seems like the only way forward.

Do compare against json_serializable or a similar package. I definitely have a bias against anything builder pattern and I know you own built_value so it's a default for you, but we can easily resolve that by just measuring it. If it's not as much slower as I think then it's fine.

I think this is the right way to go independent of serialization: immutable data is ideal for adding optimizations that reuse subgraphs.

That should shrink the 20Mb for the 1000 node benchmark to more like 20k. Real cases with less perfect shared subgraphs will gain less, of course, but they should still gain a lot.

Actual serialization performance should end up being much less important, I can look into it if it still shows up on the benchmarks.

jakemac53 · 2025-03-07T18:29:25Z

I don't have anything against using immutable data here, it's the doubling of allocations needed to create/copy/update any object when using the builder pattern that I am concerned about. This applies to not just serialization but all object creation.

davidmorgan · 2025-03-08T09:23:56Z

I don't have anything against using immutable data here, it's the doubling of allocations needed to create/copy/update any object when using the builder pattern that I am concerned about. This applies to not just serialization but all object creation.

Reducing copying is one of the big features of built_value, particularly when used for graphs/trees it guarantees to only copy the part of the graph/tree that changed. This is pretty hard to arrange to do any other way :)

jensjoha

I certainly don't like the runtime increasing by 160%, and I'm not 100% sold on generated code like this.

I'm willing to let this run though, but even after a future, hopefully big, decrease in runtime, we should probably be prepared to walk this back if it shows up in profiles (while I get that the overhead will likely be much less in the future scenario, performance is also avoiding doing stuff that takes longer than the alternative).

jensjoha · 2025-03-11T14:04:18Z

build_runner_core/lib/src/generate/build_impl.dart

              .toList();

-      for (final node in potentialNodes) {
+      for (final potentialId in potentialIds) {
+        final node = _assetGraph.get(potentialId)!;


will this node be different from the node we extracted the id from above? (aka I don't understand why we map it to the id, then get the node based on id?)

It was to avoid having locals around that would become stale after the _buildAsset in the next loop.

This whole method can be improved--added comments and some for loops instead of where / where / map, hopefully this looks better :) thanks.

jensjoha · 2025-03-11T14:05:26Z

build_runner_core/lib/src/generate/build_impl.dart

@@ -991,68 +1021,81 @@ class _SingleBuild {
    for (var output in outputs) {
      var wasOutput = readerWriter.assetsWritten.contains(output);
      var digest = wasOutput ? await _readerWriter.digest(output) : null;
-      var node = _assetGraph.get(output)!;
-
-      // **IMPORTANT**: All updates to `node` must be synchronous. With lazy


Is this no longer "IMPORTANT"? Or should the comment be kept?

Since the nodes are now immutable, new nodes arrive in the graph atomically, and old references to nodes don't see any update until they fetch again.

Added to that, we only run one generator at a time now.

So I think it would be very hard to hit any issue :)

davidmorgan

I certainly don't like the runtime increasing by 160%, and I'm not 100% sold on generated code like this.

I'm willing to let this run though, but even after a future, hopefully big, decrease in runtime, we should probably be prepared to walk this back if it shows up in profiles (while I get that the overhead will likely be much less in the future scenario, performance is also avoiding doing stuff that takes longer than the alternative).

Entirely reasonable--thanks.

davidmorgan · 2025-03-11T15:12:42Z

build_runner_core/lib/src/generate/build_impl.dart

@@ -991,68 +1021,81 @@ class _SingleBuild {
    for (var output in outputs) {
      var wasOutput = readerWriter.assetsWritten.contains(output);
      var digest = wasOutput ? await _readerWriter.digest(output) : null;
-      var node = _assetGraph.get(output)!;
-
-      // **IMPORTANT**: All updates to `node` must be synchronous. With lazy


Since the nodes are now immutable, new nodes arrive in the graph atomically, and old references to nodes don't see any update until they fetch again.

Added to that, we only run one generator at a time now.

So I think it would be very hard to hit any issue :)

davidmorgan · 2025-03-11T15:44:49Z

build_runner_core/lib/src/generate/build_impl.dart

              .toList();

-      for (final node in potentialNodes) {
+      for (final potentialId in potentialIds) {
+        final node = _assetGraph.get(potentialId)!;


It was to avoid having locals around that would become stale after the _buildAsset in the next loop.

This whole method can be improved--added comments and some for loops instead of where / where / map, hopefully this looks better :) thanks.

davidmorgan force-pushed the node-fully-built branch from 8c0dc7d to 129e99b Compare March 7, 2025 11:57

davidmorgan force-pushed the node-fully-built branch 2 times, most recently from 4bada91 to a2e7940 Compare March 10, 2025 12:40

davidmorgan changed the title ~~Use built_value for AssetNode~~ Use built_value for AssetNode and related types. Mar 10, 2025

davidmorgan force-pushed the node-fully-built branch 2 times, most recently from e17b0ec to e436172 Compare March 10, 2025 14:45

davidmorgan mentioned this pull request Mar 10, 2025

Use built_value for serialization. #3915

Merged

Use built_value for AssetNode and related types.

d2f36bc

davidmorgan force-pushed the node-fully-built branch from e436172 to d2f36bc Compare March 11, 2025 13:31

davidmorgan marked this pull request as ready for review March 11, 2025 13:43

davidmorgan requested a review from jensjoha March 11, 2025 13:44

jensjoha approved these changes Mar 11, 2025

View reviewed changes

Clearer _buildGlobNode.

28fdfc1

davidmorgan commented Mar 11, 2025

View reviewed changes

davidmorgan merged commit 3f16290 into dart-lang:master Mar 11, 2025
75 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `built_value` for `AssetNode` and related types. #3914

Use `built_value` for `AssetNode` and related types. #3914

davidmorgan commented Mar 7, 2025 •

edited

Loading

github-actions bot commented Mar 7, 2025 •

edited

Loading

jakemac53 commented Mar 7, 2025 •

edited

Loading

davidmorgan commented Mar 7, 2025

jakemac53 commented Mar 7, 2025

davidmorgan commented Mar 7, 2025

jakemac53 commented Mar 7, 2025

davidmorgan commented Mar 8, 2025

jensjoha left a comment

jensjoha Mar 11, 2025

davidmorgan Mar 11, 2025

jensjoha Mar 11, 2025

davidmorgan Mar 11, 2025

davidmorgan left a comment

davidmorgan Mar 11, 2025

davidmorgan Mar 11, 2025

Use built_value for AssetNode and related types. #3914

Use built_value for AssetNode and related types. #3914

Conversation

davidmorgan commented Mar 7, 2025 • edited Loading

github-actions bot commented Mar 7, 2025 • edited Loading

PR Health

jakemac53 commented Mar 7, 2025 • edited Loading

davidmorgan commented Mar 7, 2025

jakemac53 commented Mar 7, 2025

davidmorgan commented Mar 7, 2025

jakemac53 commented Mar 7, 2025

davidmorgan commented Mar 8, 2025

jensjoha left a comment

Choose a reason for hiding this comment

jensjoha Mar 11, 2025

Choose a reason for hiding this comment

davidmorgan Mar 11, 2025

Choose a reason for hiding this comment

jensjoha Mar 11, 2025

Choose a reason for hiding this comment

davidmorgan Mar 11, 2025

Choose a reason for hiding this comment

davidmorgan left a comment

Choose a reason for hiding this comment

davidmorgan Mar 11, 2025

Choose a reason for hiding this comment

davidmorgan Mar 11, 2025

Choose a reason for hiding this comment

Use `built_value` for `AssetNode` and related types. #3914

Use `built_value` for `AssetNode` and related types. #3914

davidmorgan commented Mar 7, 2025 •

edited

Loading

github-actions bot commented Mar 7, 2025 •

edited

Loading

jakemac53 commented Mar 7, 2025 •

edited

Loading