
Yogu commented Nov 13, 2025

No description provided.

Yogu force-pushed the refactor-traversal branch 2 times, most recently from c3e799f to 1d8ebc9, November 13, 2025 16:37
Yogu marked this pull request as draft November 13, 2025 16:42
Yogu force-pushed the memory-issue-root-and-sub-children branch 2 times, most recently from ec371a7 to 5502368, November 13, 2025 17:12
Yogu force-pushed the refactor-traversal branch from 1d8ebc9 to 6098d71, November 13, 2025 18:25
Yogu force-pushed the memory-issue-root-and-sub-children branch from 5502368 to 265b74c, November 13, 2025 18:27
Yogu force-pushed the refactor-traversal branch from 6098d71 to bc56d66, November 13, 2025 18:32
Yogu force-pushed the memory-issue-root-and-sub-children branch 2 times, most recently from 433ce56 to b60bc9f, November 13, 2025 18:58
Yogu force-pushed the refactor-traversal branch from bc56d66 to e2af2aa, November 14, 2025 08:24
Yogu force-pushed the memory-issue-root-and-sub-children branch 13 times, most recently from 3a0d319 to b338b0c, November 20, 2025 16:04
Yogu marked this pull request as ready for review November 20, 2025 16:05

Yogu commented Nov 20, 2025

This PR is now finalized, but it depends on #365, so we should probably wait until that is merged before reviewing this one.

(Also, the target branch of this PR is currently the branch of #365.)

Yogu force-pushed the refactor-traversal branch 3 times, most recently from 6910ac8 to c023b92, November 25, 2025 13:41
Yogu force-pushed the memory-issue-root-and-sub-children branch from b338b0c to bac48ec, November 25, 2025 15:44
Yogu force-pushed the refactor-traversal branch from c023b92 to 7f7ce45, November 25, 2025 15:55
Yogu added 3 commits November 25, 2025 16:56
If you access an outer variable (like a root variable) in a loop, and the
loop also contains another subquery, the outer variable is copied in memory
for each subquery instance, i.e., once per outer loop iteration. This is a
known limitation of ArangoDB.

These tests cover some of the cases where this happens. Future commits will
improve some of these cases.
We optimized the AQL generation of TraversalQueryNode so that it, for example, uses array expansions instead of subqueries. These optimizations are also relevant for regular list fields (child entities, value objects, scalars, enums).

This is especially important when these fields are queried alongside a root field, because subquery nodes sometimes hold a copy of all variables used in sibling nodes.

References show a memory usage regression because their variable is no longer pulled up. This will be fixed in the next commit.

Some tests in which IS_LIST(...) ? ... : [] was replaced by ...[*] twice for the same field show increased memory usage. This is because ArangoDB deduplicated the CalculationNode with the IS_LIST approach but no longer does so with the [*] approach. In other cases (when there is a CalculationNode anyway), the [*] approach reduces memory usage. Because the [*] approach also makes for clearer queries, and there is no clear winner (without making very strong assumptions about ArangoDB's optimizer and doing a lot of analysis), we don't introduce a special case in getFieldTraversalFragment() to use IS_LIST for now.
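To make the forms discussed above concrete, here is a minimal TypeScript sketch that builds the corresponding AQL fragments as strings. The helper names (viaIsList, viaExpansion, projectViaSubquery, projectViaExpansion) and the field names are hypothetical; this is not the actual getFieldTraversalFragment() implementation, only an illustration of the guarded, subquery, and [*] shapes.

```typescript
// Hypothetical AQL fragment builders for reading a list field `field` of the
// object bound to `objectExpr`. The names are illustrative assumptions, not
// cruddl's getFieldTraversalFragment() or any other real API.

/** Guarded form: the field's value if it is a list, otherwise []. */
function viaIsList(objectExpr: string, field: string): string {
  const access = `${objectExpr}.${field}`;
  return `(IS_LIST(${access}) ? ${access} : [])`;
}

/** Array expansion form, using AQL's [*] operator on the same access path. */
function viaExpansion(objectExpr: string, field: string): string {
  return `${objectExpr}.${field}[*]`;
}

/** Subquery form for projecting one sub-field of each list item. */
function projectViaSubquery(objectExpr: string, field: string, subField: string): string {
  return `(FOR item IN ${viaIsList(objectExpr, field)} RETURN item.${subField})`;
}

/** Array expansion form of the same projection, without a subquery. */
function projectViaExpansion(objectExpr: string, field: string, subField: string): string {
  return `${viaExpansion(objectExpr, field)}.${subField}`;
}

console.log(viaIsList("root", "items"));
console.log(viaExpansion("root", "items"));
console.log(projectViaSubquery("root", "items", "name"));
console.log(projectViaExpansion("root", "items", "name"));
```

The first two calls produce `(IS_LIST(root.items) ? root.items : [])` and `root.items[*]`. Per the message above, when the guarded fragment appeared twice for the same field, ArangoDB deduplicated its CalculationNode, which it no longer did once both occurrences used [*].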
…ry usage

ArangoDB also has logic to hoist variable assignments where it can, but it sometimes (in our cases, rather often) pushes them down again. This is problematic if the result of the assignment is much smaller than its dependencies. For root entities accessed via @root, this is often the case: usually only a subset of the root's fields is accessed within the loop, while many more fields are accessed at the root level.

We can prevent ArangoDB from pushing the variables down again by wrapping their assignment in NOEVAL(...).
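The following hypothetical sketch builds the affected query shape as a TypeScript string: the root is read extensively at root level, the inner loop only needs root.name, and a sibling subquery sits next to the loop. Wrapping the hoisted LET in NOEVAL(...) is the technique described above; the collection, field, and variable names are assumptions, and this is not the query cruddl actually generates.

```typescript
// Hypothetical sketch of the query shape described above: many root fields
// are read at root level, the inner loop needs only `root.name`, and there is
// a sibling subquery. Collection, field, and variable names are assumptions;
// this is not the query cruddl actually generates.

interface BuildOptions {
  /** Wrap the hoisted assignment in NOEVAL(...) to keep it from being pushed down. */
  wrapInNoEval: boolean;
}

function buildQuery(opts: BuildOptions): string {
  // Small projection of the root that the inner loop needs.
  const projection = "{ name: root.name }";
  const assignment = opts.wrapInNoEval ? `NOEVAL(${projection})` : projection;
  return [
    "FOR root IN @@collection",
    // Assigned once per root, outside the inner loop (hoisted).
    `  LET rootProjection = ${assignment}`,
    "  LET items = (",
    "    FOR item IN root.items[*]",
    "      RETURN { name: item.name, rootName: rootProjection.name }",
    "  )",
    // Sibling subquery in the same scope.
    "  LET tags = (FOR tag IN root.tags[*] RETURN tag)",
    // More root fields are accessed at root level.
    "  RETURN { key: root._key, description: root.description, items, tags }",
  ].join("\n");
}

console.log(buildQuery({ wrapInNoEval: true }));
```

Without the wrapper, the message above notes that ArangoDB often moves such an assignment back into the loop; the NOEVAL variant is meant to keep it where it was written.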

If a query requests many fields of the root object overall (the default threshold is 5), the reduce-extraction-to-projection optimization is no longer applied, so the full root object is held in memory. Without variable hoisting, it can then be duplicated for each result item of a traversal, leading to very high memory usage (root entity size × number of collect items).
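The formula above can be illustrated with a rough TypeScript calculation. The sizes are invented for illustration, and treating hoisting as reducing the per-item copy to a small projection is a simplifying assumption, not a measurement of ArangoDB's behavior.

```typescript
// Back-of-the-envelope model of the duplication described above:
// bytes held = (size of the variable copied per item) × (number of items).
// This is an illustrative assumption; it ignores all other memory and is not
// a measurement of how ArangoDB actually allocates.

function duplicatedBytes(copiedVariableBytes: number, itemCount: number): number {
  return copiedVariableBytes * itemCount;
}

// Hypothetical sizes: a 50 kB root entity, a 200-byte projection of the
// fields used inside the loop, and 10,000 collect items.
const rootEntityBytes = 50_000;
const projectionBytes = 200;
const collectItems = 10_000;

// Without hoisting, the full root object is copied for each item.
const withoutHoisting = duplicatedBytes(rootEntityBytes, collectItems);
// With hoisting (assumed here), only the small projection is copied.
const withHoisting = duplicatedBytes(projectionBytes, collectItems);

console.log(withoutHoisting); // 500000000 (about 500 MB)
console.log(withHoisting); // 2000000 (about 2 MB)
```

Under these made-up numbers the difference is a factor of 250, which is why keeping the small assignment hoisted matters most when the root entity is large and the traversal yields many items.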

In the regression test root-fields/root-with-collect, the memory usage of query q increased slightly. This is likely because the added variable assignments have an overhead that the hoisting does not offset, given the small number of items and the small objects in the regression tests.
Yogu force-pushed the memory-issue-root-and-sub-children branch from bac48ec to 1e15e56, November 25, 2025 15:59