Improve query memory usage #365

Yogu · 2025-10-30T15:07:07Z

regression tests store AQL in files
new regression tests with "memory pitfalls" that show us where a document is loaded into memory even if it should not
changes in AQL generation so it's simpler and fixes a few memory issues
- avoid patterns that disable the reduce-extraction-to-projection opitimizations
- avoid subqueries
- generally simplify queries

Copilot

Pull Request Overview

This PR improves query memory usage by simplifying AQL generation and storing regression test queries in files. The changes focus on optimizing query patterns to enable better database-level optimizations and avoid unnecessary memory consumption.

Key changes:

Regression tests now store AQL queries in separate .aql files for better tracking and comparison
Refactored query generation to eliminate patterns that prevent reduce-extraction-to-projection optimizations
Introduced new regression tests demonstrating memory-efficient query patterns

Reviewed Changes

Copilot reviewed 191 out of 1032 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
spec/regression/initialization.ts	Refactored to use new InitTestDataContext class for test data initialization
spec/regression/init-test-data-context.ts	New context class encapsulating GraphQL execution logic for test initialization
spec/regression/list-limits/tests//aql/.aql	New AQL files for list limits regression tests
spec/regression/keywords/tests//aql/.aql	New AQL files for keyword escaping tests
spec/regression/collect/tests//aql/.aql	New AQL files for collection operation tests
spec/regression/access-restrictions/tests//aql/.aql	New AQL files for access restriction tests
spec/regression/access-groups/tests//aql/.aql	New AQL files for access group tests
spec/regression/collect/tests/*/result.json	Updated result files showing work-in-progress error states
spec/regression/access-restrictions/tests/accounting/test.graphql	Added query name to GraphQL test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

spec/regression/collect/tests/relation-and-field-aggregation/result.json

spec/regression/collect/tests/field-aggregation/result.json

spec/regression/collect/tests/distinct-aggregation/result.json

The original code incorrectly assumed that a log4js logger had its own level. Instead, changing a logger's level changes the level of the logger's category system-wide. That leads to very confusing behavior that depends on the order in which you create projects or database adapters. We're still changing global state for the trace level, but in a direct and more understandable way.

This was not buggy, but it's also unnecessarily complicated in these situations.

The two ways introduce unnecessary complexity. We also will add a feature to output AQL strings for each operation, which requires operation names. Also, some of the anonymous operations were better split into multiple named operations.

There are quite a lot of files possible per test, so this is easier to scan. Also, we will add AQL files where multiple files will be generated per test (one per operation). Also, it's now easier to copy a test by just copying the directory. access-restrictions/logistics-reader was accidentally deleted in eed1127, restored.

This has mainly two use cases - Getting a general feeling of how queries are translated into AQL to discover potential for optimizations and simplifications - Double-checking that a code change does not change AQL in a bad way, even if you don't notice it in the result with the test data

ArangoDB 3.11 is end of life since May 30, 2025. This simplifies regression tests - we would otherwise make sure that the AQL memory usages are only compared when ArangoDB 3.12 is being tested.

This allows us to track if refactorings improve or worsen query behavior. To make this more meaningful, we need to add fields with large string values to entities in the test data so we see when they are duplicated.

These fields are not queried, but they make it easier to see when queries are inefficient and read fields they should not or hold them in memory multiple times

…s-expected The proper formatting is done by prettier via husky, but this part is easy to fix

…ields The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.

This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests

The null checks prevent arangodb from applying the reduce-extraction-to-projection optimization. See https://docs.arango.ai/arangodb/3.12/aql/execution-and-performance/query-optimization/#reduce-extraction-to-projection

… in ArangoDB through dangling-edge filter We currently do not trust edges - if the document they point to does not exist, we ignore the edge. In the past, this access was done via $node != null. This has the problem that ArangoDB considers it a full document access and no longer applies the reduce-extraction-to-projection optimization rule. This rule is very important to limit query memory limit usage. Removing the dangling edges filter completely would be breaking, but we can optimize them so the reduce-extraction-to-projection optimization still works.

The only reason flatMap was not used here is that it did not exist when the code was written.

These tests are useful to ensure that the next commit, which is a big refactoring, does not break any of the cases. It also makes it easier to see how exactly the AQL generation changes.

This should be treated like an empty array

Instead of collecting { root, obj } pairs in a list, we now generate the whole collect result including field selection in one subquery. Also, we try to avoid unnecessary subqueries because they pass down items in registers and sometimes copy the values for that. This increased complexity of the AQL generation of a TraversalQueryNode, but it generates much more natural AQL queries as a result that have better runtime performance in ArangoDB.

This was disallowed before because it triggered an ArangoDB bug that caused it to use excessive amounts of memory. That bug has been fixed since ArangoDB 3.10, so we can remove the guard again.

…Subquery

The previous names suggested a real difference between the regular and the "alias" variants, but the only difference was that for one, the AQLVariable was created by the method, and the other accepted a custom fragment to bind the variable to. Also harmonized the methods between aql and js.

Yogu force-pushed the refactor-traversal branch 5 times, most recently from 7e03cc3 to 47468d5 Compare October 30, 2025 16:46

Yogu requested a review from Copilot October 30, 2025 16:48

Copilot AI reviewed Oct 30, 2025

View reviewed changes

Yogu force-pushed the refactor-traversal branch 14 times, most recently from 7a49de3 to 81c67b6 Compare November 5, 2025 10:22

Yogu force-pushed the refactor-traversal branch 5 times, most recently from ff92e3c to c59308a Compare November 11, 2025 16:34

Yogu marked this pull request as ready for review November 11, 2025 16:35

Yogu force-pushed the refactor-traversal branch 3 times, most recently from 7fa2697 to eae1d24 Compare November 12, 2025 15:31

Yogu added 22 commits November 21, 2025 14:22

refactor: avoid using globalContext only to get the logger

186e7dd

This was not buggy, but it's also unnecessarily complicated in these situations.

test: make clearing the regression test database more explicit

527eff3

chore: remove support of ArangoDB 3.11

b5b35fc

ArangoDB 3.11 is end of life since May 30, 2025. This simplifies regression tests - we would otherwise make sure that the AQL memory usages are only compared when ArangoDB 3.12 is being tested.

chore: remove support for Node 18

7a736fb

chore: test with Node 24 and 25

c24fc68

test: include peak memory usage in .aql files of regression tests

231bac8

This allows us to track if refactorings improve or worsen query behavior. To make this more meaningful, we need to add fields with large string values to entities in the test data so we see when they are duplicated.

test: add "payload" fields to some regression tests

f57a21c

These fields are not queried, but they make it easier to see when queries are inefficient and read fields they should not or hold them in memory multiple times

chore: fix husky configuration

3126ba1

style: reformat regression test result files with prettier

069af72

test: insert trailing newline in files generated with --save-actual-a…

6ea20c0

…s-expected The proper formatting is done by prettier via husky, but this part is easy to fix

test: add tests for excessive memory usage in traversal and collect f…

420bd3c

…ields The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.

refactor: consistently use decapitalized type names for variables

2695e66

This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests

chore: update mocha

d5fb4e3

perf: skip null check for parent and root fields

d7aef24

The null checks prevent arangodb from applying the reduce-extraction-to-projection optimization. See https://docs.arango.ai/arangodb/3.12/aql/execution-and-performance/query-optimization/#reduce-extraction-to-projection

refactor: use flatMap instead of reduce/concat

1b2188c

The only reason flatMap was not used here is that it did not exist when the code was written.

test: add more test for traversals

d841d07

These tests are useful to ensure that the next commit, which is a big refactoring, does not break any of the cases. It also makes it easier to see how exactly the AQL generation changes.

test: add test data where a child entity collection is NULL

fff9e2c

This should be treated like an empty array

Yogu force-pushed the refactor-traversal branch 2 times, most recently from 6910ac8 to c023b92 Compare November 25, 2025 13:41

Yogu mentioned this pull request Nov 25, 2025

Memory issue when root and sub-children are selcted at the same time #369

Open

Yogu added 4 commits November 25, 2025 16:55

feat: allow flexSearch queries with sort and skip but no first

86f7fdd

This was disallowed before because it triggered an ArangoDB bug that caused it to use excessive amounts of memory. That bug has been fixed since ArangoDB 3.10, so we can remove the guard again.

refactor: rename aqlExt.parenthesize{List,Object} to aqlExt.[firstOf]…

708de1e

…Subquery

Yogu force-pushed the refactor-traversal branch from c023b92 to 7f7ce45 Compare November 25, 2025 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve query memory usage #365

Improve query memory usage #365

Uh oh!

Yogu commented Oct 30, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Improve query memory usage #365

Are you sure you want to change the base?

Improve query memory usage #365

Uh oh!

Conversation

Yogu commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Yogu commented Oct 30, 2025 •

edited

Loading