Skip to content

Conversation

@Yogu
Copy link
Member

@Yogu Yogu commented Oct 30, 2025

  • regression tests store AQL in files
  • new regression tests with "memory pitfalls" that show us where a document is loaded into memory even if it should not
  • changes in AQL generation so it's simpler and fixes a few memory issues
    • avoid patterns that disable the reduce-extraction-to-projection opitimizations
    • avoid subqueries
    • generally simplify queries

@Yogu Yogu force-pushed the refactor-traversal branch 5 times, most recently from 7e03cc3 to 47468d5 Compare October 30, 2025 16:46
@Yogu Yogu requested a review from Copilot October 30, 2025 16:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves query memory usage by simplifying AQL generation and storing regression test queries in files. The changes focus on optimizing query patterns to enable better database-level optimizations and avoid unnecessary memory consumption.

Key changes:

  • Regression tests now store AQL queries in separate .aql files for better tracking and comparison
  • Refactored query generation to eliminate patterns that prevent reduce-extraction-to-projection optimizations
  • Introduced new regression tests demonstrating memory-efficient query patterns

Reviewed Changes

Copilot reviewed 191 out of 1032 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
spec/regression/initialization.ts Refactored to use new InitTestDataContext class for test data initialization
spec/regression/init-test-data-context.ts New context class encapsulating GraphQL execution logic for test initialization
spec/regression/list-limits/tests//aql/.aql New AQL files for list limits regression tests
spec/regression/keywords/tests//aql/.aql New AQL files for keyword escaping tests
spec/regression/collect/tests//aql/.aql New AQL files for collection operation tests
spec/regression/access-restrictions/tests//aql/.aql New AQL files for access restriction tests
spec/regression/access-groups/tests//aql/.aql New AQL files for access group tests
spec/regression/collect/tests/*/result.json Updated result files showing work-in-progress error states
spec/regression/access-restrictions/tests/accounting/test.graphql Added query name to GraphQL test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Yogu Yogu force-pushed the refactor-traversal branch 14 times, most recently from 7a49de3 to 81c67b6 Compare November 5, 2025 10:22
@Yogu Yogu force-pushed the refactor-traversal branch 5 times, most recently from ff92e3c to c59308a Compare November 11, 2025 16:34
@Yogu Yogu marked this pull request as ready for review November 11, 2025 16:35
@Yogu Yogu force-pushed the refactor-traversal branch 3 times, most recently from 7fa2697 to eae1d24 Compare November 12, 2025 15:31
Yogu added 22 commits November 21, 2025 14:22
The original code incorrectly assumed that a log4js logger had its own level. Instead, changing a logger's level changes the level of the logger's category system-wide. That leads to very confusing behavior that depends on the order in which you create projects or database adapters.

We're still changing global state for the trace level, but in a direct and more understandable way.
This was not buggy, but it's also unnecessarily complicated in these situations.
The two ways introduce unnecessary complexity. We also will add a feature to output AQL strings for each operation, which requires operation names. Also, some of the anonymous operations were better split into multiple named operations.
There are quite a lot of files possible per test, so this is easier to scan.

Also, we will add AQL files where multiple files will be generated per test (one per operation).

Also, it's now easier to copy a test by just copying the directory.

access-restrictions/logistics-reader was accidentally deleted in eed1127, restored.
This has mainly two use cases

- Getting a general feeling of how queries are translated into AQL to discover potential for optimizations and simplifications
- Double-checking that a code change does not change AQL in a bad way, even if you don't notice it in the result with the test data
ArangoDB 3.11 is end of life since May 30, 2025.

This simplifies regression tests - we would otherwise make
sure that the AQL memory usages are only compared when
ArangoDB 3.12 is being tested.
This allows us to track if refactorings improve or worsen query behavior.

To make this more meaningful, we need to add fields with large string values to entities in the test data so we see when they are duplicated.
These fields are not queried, but they make it easier to see when queries are inefficient and read fields they should not or hold them in memory multiple times
…s-expected

The proper formatting is done by prettier via husky, but this part is easy to fix
…ields

The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.
This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests
… in ArangoDB through dangling-edge filter

We currently do not trust edges - if the document they point to does not exist, we ignore the edge.

In the past, this access was done via $node != null. This has the problem that ArangoDB considers it a full document access and no longer applies the reduce-extraction-to-projection optimization rule. This rule is very important to limit query memory limit usage.

Removing the dangling edges filter completely would be breaking, but we can optimize them so the reduce-extraction-to-projection optimization still works.
The only reason flatMap was not used here is that it did not exist when the code was written.
These tests are useful to ensure that the next commit, which is a big refactoring, does not break any of the cases. It also makes it easier to see how exactly the AQL generation changes.
This should be treated like an empty array
Yogu added 4 commits November 25, 2025 16:55
Instead of collecting { root, obj } pairs in a list, we now generate the whole collect result including field selection in one  subquery.

Also, we try to avoid unnecessary subqueries because they pass down items in registers and sometimes copy the values for that. This increased complexity of the AQL generation of a TraversalQueryNode, but it generates much more natural AQL queries as a result that have better runtime performance in ArangoDB.
This was disallowed before because it triggered an ArangoDB bug that caused it to use excessive amounts of memory. That bug has been fixed since ArangoDB 3.10, so we can remove the guard again.
The previous names suggested a real difference between the regular and the "alias" variants, but the only difference was that for one, the AQLVariable was created by the method, and the other accepted a custom fragment to bind the variable to.

Also harmonized the methods between aql and js.
@Yogu Yogu force-pushed the refactor-traversal branch from c023b92 to 7f7ce45 Compare November 25, 2025 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants