-
Notifications
You must be signed in to change notification settings - Fork 16
Improve query memory usage #365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7e03cc3 to
47468d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves query memory usage by simplifying AQL generation and storing regression test queries in files. The changes focus on optimizing query patterns to enable better database-level optimizations and avoid unnecessary memory consumption.
Key changes:
- Regression tests now store AQL queries in separate
.aqlfiles for better tracking and comparison - Refactored query generation to eliminate patterns that prevent reduce-extraction-to-projection optimizations
- Introduced new regression tests demonstrating memory-efficient query patterns
Reviewed Changes
Copilot reviewed 191 out of 1032 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/regression/initialization.ts | Refactored to use new InitTestDataContext class for test data initialization |
| spec/regression/init-test-data-context.ts | New context class encapsulating GraphQL execution logic for test initialization |
| spec/regression/list-limits/tests//aql/.aql | New AQL files for list limits regression tests |
| spec/regression/keywords/tests//aql/.aql | New AQL files for keyword escaping tests |
| spec/regression/collect/tests//aql/.aql | New AQL files for collection operation tests |
| spec/regression/access-restrictions/tests//aql/.aql | New AQL files for access restriction tests |
| spec/regression/access-groups/tests//aql/.aql | New AQL files for access group tests |
| spec/regression/collect/tests/*/result.json | Updated result files showing work-in-progress error states |
| spec/regression/access-restrictions/tests/accounting/test.graphql | Added query name to GraphQL test |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
spec/regression/collect/tests/relation-and-field-aggregation/result.json
Outdated
Show resolved
Hide resolved
7a49de3 to
81c67b6
Compare
ff92e3c to
c59308a
Compare
7fa2697 to
eae1d24
Compare
The original code incorrectly assumed that a log4js logger had its own level. Instead, changing a logger's level changes the level of the logger's category system-wide. That leads to very confusing behavior that depends on the order in which you create projects or database adapters. We're still changing global state for the trace level, but in a direct and more understandable way.
This was not buggy, but it's also unnecessarily complicated in these situations.
The two ways introduce unnecessary complexity. We also will add a feature to output AQL strings for each operation, which requires operation names. Also, some of the anonymous operations were better split into multiple named operations.
There are quite a lot of files possible per test, so this is easier to scan. Also, we will add AQL files where multiple files will be generated per test (one per operation). Also, it's now easier to copy a test by just copying the directory. access-restrictions/logistics-reader was accidentally deleted in eed1127, restored.
This has mainly two use cases - Getting a general feeling of how queries are translated into AQL to discover potential for optimizations and simplifications - Double-checking that a code change does not change AQL in a bad way, even if you don't notice it in the result with the test data
ArangoDB 3.11 is end of life since May 30, 2025. This simplifies regression tests - we would otherwise make sure that the AQL memory usages are only compared when ArangoDB 3.12 is being tested.
This allows us to track if refactorings improve or worsen query behavior. To make this more meaningful, we need to add fields with large string values to entities in the test data so we see when they are duplicated.
These fields are not queried, but they make it easier to see when queries are inefficient and read fields they should not or hold them in memory multiple times
…s-expected The proper formatting is done by prettier via husky, but this part is easy to fix
…ields The tests currently assert that the memory limit is exceeded. This is expected and will be fixed in an upcoming commit by a performance improvement.
This avoids variable names changing when features like sorting and filtering are added or removed, causing changes in .aql files of the regression tests
The null checks prevent arangodb from applying the reduce-extraction-to-projection optimization. See https://docs.arango.ai/arangodb/3.12/aql/execution-and-performance/query-optimization/#reduce-extraction-to-projection
… in ArangoDB through dangling-edge filter We currently do not trust edges - if the document they point to does not exist, we ignore the edge. In the past, this access was done via $node != null. This has the problem that ArangoDB considers it a full document access and no longer applies the reduce-extraction-to-projection optimization rule. This rule is very important to limit query memory limit usage. Removing the dangling edges filter completely would be breaking, but we can optimize them so the reduce-extraction-to-projection optimization still works.
The only reason flatMap was not used here is that it did not exist when the code was written.
These tests are useful to ensure that the next commit, which is a big refactoring, does not break any of the cases. It also makes it easier to see how exactly the AQL generation changes.
This should be treated like an empty array
6910ac8 to
c023b92
Compare
Instead of collecting { root, obj } pairs in a list, we now generate the whole collect result including field selection in one subquery.
Also, we try to avoid unnecessary subqueries because they pass down items in registers and sometimes copy the values for that. This increased complexity of the AQL generation of a TraversalQueryNode, but it generates much more natural AQL queries as a result that have better runtime performance in ArangoDB.
This was disallowed before because it triggered an ArangoDB bug that caused it to use excessive amounts of memory. That bug has been fixed since ArangoDB 3.10, so we can remove the guard again.
The previous names suggested a real difference between the regular and the "alias" variants, but the only difference was that for one, the AQLVariable was created by the method, and the other accepted a custom fragment to bind the variable to. Also harmonized the methods between aql and js.
c023b92 to
7f7ce45
Compare
Uh oh!
There was an error while loading. Please reload this page.