Optimize database performance for OpenTDF platform #2300
Conversation
Added comprehensive performance enhancements, including 40+ new indexes, PostgreSQL tuning configuration, query-specific optimizations, application-level caching, and monitoring scripts. Introduced dedicated guides for index implementation, query analysis, and performance testing. Also included a performance-optimized Docker Compose file and PostgreSQL config adjustments to reduce sequential scans, lower CPU usage, and improve query execution times by 10-100x.
Hello @pflynn-virtru, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request focused on significantly improving the database performance for the OpenTDF platform. The author, @pflynn-virtru, has introduced a comprehensive set of changes aimed at tackling high CPU usage, sequential scans, and slow query response times identified under load. The core strategy involves adding a large number of targeted indexes, providing detailed guides for database tuning and query optimization, adding scripts for testing and applying changes, and including a performance-optimized Docker Compose configuration.
Highlights
- Extensive Indexing: Over 40 new indexes have been added via a dedicated migration script (`service/policy/db/migrations/20250122000000_add_performance_indexes.sql`). These include foreign key indexes, composite indexes for common query patterns (like FQN lookups), partial indexes for active records, and GIN indexes for JSONB fields, specifically targeting the elimination of sequential scans and speeding up JOINs and lookups (an illustrative sketch of these index categories follows this list).
- PostgreSQL Configuration Tuning: A new PostgreSQL configuration file (`postgres-performance.conf`) is introduced with settings optimized for memory usage, query planning (especially for SSDs), parallel query execution, logging, statistics collection (`pg_stat_statements`), and autovacuum tuning. This config is integrated into a new performance-optimized Docker Compose file.
- Comprehensive Documentation: Several new markdown documents (`docs/DATABASE_PERFORMANCE_README.md`, `docs/database-performance-tuning.md`, `docs/query-specific-optimizations.md`) provide detailed context on the performance problems, root causes, the optimization strategy, specific query analysis, recommended indexes, PostgreSQL tuning parameters, application-level optimizations (caching, prepared statements), monitoring techniques, and a rollback plan.
- Performance Testing and Application Scripts: A new SQL script (`scripts/db-performance-test.sql`) is included to run baseline and post-optimization performance tests on key query patterns. A bash script (`scripts/apply-performance-indexes.sh`) is provided for safely applying the index migration, including backups, stats checks, and rollback script generation.
- Docker Compose for Performance Testing: A new Docker Compose file (`docker-compose-performance.yaml`) is added to easily spin up the OpenTDF platform with the performance-tuned PostgreSQL configuration, allowing for easier testing and validation of the improvements.
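To make these index categories concrete, here is an illustrative SQL sketch. The index names are taken from the migration's index list, but the table and column details (especially for the GIN example) are assumptions; the migration file remains the authoritative DDL.

```sql
-- Foreign-key index: speeds up JOINs from values to their definitions.
CREATE INDEX idx_attribute_values_attribute_definition_id
    ON attribute_values (attribute_definition_id);

-- Partial index: indexes only active rows, keeping the index small.
CREATE INDEX idx_attribute_values_active
    ON attribute_values (active) WHERE active = true;

-- GIN index for JSONB containment queries (table/column names assumed).
CREATE INDEX idx_subject_condition_set_condition_gin
    ON subject_condition_set USING GIN (condition);
```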
Changelog
- `docker-compose-performance.yaml`
- Added a new Docker Compose file for performance testing/deployment.
- Configured the `opentdfdb` service with resource limits (CPU, memory), tmpfs for temporary data, sysctls for network tuning, `shm_size`, and a mounted `postgres-performance.conf`.
- Configured the `opentdf_pgdata` volume to use a local bind mount for potentially better I/O.
- `docs/DATABASE_PERFORMANCE_README.md`
- Added a new README summarizing the database performance problem, root cause, solution overview, expected improvements (10-100x faster queries, reduced CPU/sequential scans), files created, next steps, quick win index examples, and monitoring commands.
- `docs/database-performance-tuning.md`
- Added a new detailed guide covering PostgreSQL configuration tuning (memory, planner, connections, parallel queries, logging, stats, autovacuum, etc.) and application-level optimizations (connection pooling, query patterns, caching, monitoring).
- `docs/query-specific-optimizations.md`
- Added a new guide analyzing specific slow query patterns (complex attribute queries, subject mappings, FQN lookups, active filtering, JSONB).
- Provided specific index recommendations for each query pattern.
- Suggested query rewriting techniques (CTEs, Materialized Views) and application-level optimizations (caching, prepared statements, batching).
- `postgres-performance.conf`
- Added a new PostgreSQL configuration file with tuned parameters for memory (`shared_buffers`, `effective_cache_size`, `work_mem`, `maintenance_work_mem`), planner (`random_page_cost`, `effective_io_concurrency`), parallel queries, logging (`log_min_duration_statement`), statistics (`pg_stat_statements`), autovacuum, and more (a hypothetical SQL rendering of this tuning appears after the changelog).
- `scripts/apply-performance-indexes.sh`
- Added a new bash script to automate the process of applying performance indexes.
- Includes steps for running baseline performance tests, backing up existing indexes, checking database stats, applying the migration script, verifying index creation, running post-optimization tests, and generating a rollback script.
- `scripts/db-performance-test.sql`
- Added a new SQL script containing `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)` statements for key slow queries to measure performance.
- Includes tests for complex attribute queries, subject mappings, FQN lookups, active record filtering, complex JOINs with GROUP BY, and JSONB queries.
- Also includes commands to show database statistics, index usage, and table sizes.
- `service/policy/db/migrations/20250122000000_add_performance_indexes.sql`
- Added a new database migration file.
- Created over 40 new indexes, including foreign key indexes, composite indexes, partial indexes for active records, GIN indexes for JSONB, covering indexes, and indexes for timestamp fields.
- Added `ANALYZE` commands at the end to update table statistics after index creation.
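For orientation, the kind of tuning `postgres-performance.conf` performs could be expressed in SQL roughly as follows. This is a hypothetical sketch with illustrative values, not the PR's actual settings.

```sql
-- Illustrative values only; the real settings live in postgres-performance.conf.
ALTER SYSTEM SET shared_buffers = '1GB';               -- requires a full restart
ALTER SYSTEM SET effective_cache_size = '3GB';
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET maintenance_work_mem = '256MB';
ALTER SYSTEM SET random_page_cost = 1.1;               -- SSD-friendly planner costing
ALTER SYSTEM SET effective_io_concurrency = 200;
ALTER SYSTEM SET log_min_duration_statement = '250ms';
SELECT pg_reload_conf();  -- applies reloadable settings; restart for the rest
```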
Indexes added, scans now few,
Queries run fast, like morning dew.
CPU sighs, takes a rest,
Performance tuning, put to the test.
Code Review
This pull request introduces a comprehensive set of database performance optimizations, including new indexes, PostgreSQL configuration tuning, query optimization guides, and a performance-focused Docker Compose setup. The effort to improve database performance is commendable, and the documentation provided is extensive.
However, there are several critical and high-severity issues that need to be addressed before this PR can be merged, particularly concerning security and the robustness of the deployment scripts.
Summary of Findings
- Security: Hardcoded Credentials: Multiple services in `docker-compose-performance.yaml` use hardcoded 'changeme' passwords. This is a critical security risk and must be addressed by using environment variables, `.env` files, or a secrets management solution.
- Script Robustness: Rollback Mechanism: The `apply-performance-indexes.sh` script generates a rollback script from a hardcoded list of index names. This is fragile and prone to errors if the main migration changes; a dynamic approach to generating the rollback script is needed.
- Configuration: Migration File Naming: The SQL migration file and its references in documentation use a future date, which is unconventional and potentially confusing.
- Database: Indexing Strategy: There is a potential discrepancy in the FQN indexing strategy between the migration file and the documentation. Also, some partial indexes on the `active` column might need re-evaluation for optimal query coverage if other columns are frequently part of the `WHERE` clause.
- Configuration: PostgreSQL Parallelism: In `postgres-performance.conf`, `max_parallel_workers` is set higher than the CPU limit defined in `docker-compose-performance.yaml`, which might not be optimal.
- Testing: Unused Table in Performance Test Script: The `db-performance-test.sql` script defines a temporary table `performance_results` that is not subsequently used.
Merge Readiness
This pull request makes substantial strides in optimizing database performance. The documentation and new configurations are very thorough. However, due to the critical security vulnerabilities from hardcoded credentials and the high risk associated with the current index rollback mechanism, I must recommend that these changes not be merged until these issues are resolved. Addressing the medium-severity items related to configuration and script robustness would also significantly improve the quality of this contribution. I am unable to approve pull requests, but I advise that these changes receive further review and approval after the critical and high severity issues are fixed.
```yaml
  - config_file=/etc/postgresql/postgresql.conf
environment:
  POSTGRES_USER: postgres
  POSTGRES_PASSWORD: changeme
```
```yaml
KC_HTTP_PORT: "8888"
KC_HTTPS_PORT: "8443"
KEYCLOAK_ADMIN: admin
KEYCLOAK_ADMIN_PASSWORD: changeme
```
```yaml
KC_DB_URL_PORT: 5432
KC_DB_URL_DATABASE: keycloak
KC_DB_USERNAME: keycloak
KC_DB_PASSWORD: changeme
```
Using default/placeholder passwords like 'changeme' in configuration files is a significant security risk, even in development-focused setups. These can easily be overlooked and deployed to staging or even production environments. Could these be configured using environment variables sourced from a `.env` file (which is gitignored, and which Docker Compose reads automatically for variable substitution) or a secrets management system, rather than being hardcoded here? For example:

```yaml
KC_DB_PASSWORD: ${KEYCLOAK_DB_PASSWORD}
```
```yaml
restart: always
user: postgres
environment:
  POSTGRES_PASSWORD: changeme
```
```bash
PGPASSWORD="${PGPASSWORD:-}" psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -t <<EOF >> "$BACKUP_DIR/rollback_indexes.sql"
SELECT 'DROP INDEX IF EXISTS ' || schemaname || '.' || indexname || ';'
FROM pg_indexes
WHERE indexname LIKE 'idx_%'
  AND schemaname = 'opentdf'
  AND indexname IN (
    'idx_attribute_definitions_namespace_id',
    'idx_attribute_definitions_namespace_id_active',
    'idx_attribute_values_attribute_definition_id',
    'idx_attribute_values_attribute_definition_id_active',
    'idx_attribute_value_members_value_id',
    'idx_attribute_value_members_member_id',
    'idx_resource_mappings_attribute_value_id',
    'idx_resource_mappings_attribute_value_id_active',
    'idx_subject_mappings_attribute_value_id',
    'idx_subject_mappings_condition_set_id',
    'idx_subject_mappings_attribute_value_id_active',
    'idx_key_access_grants_namespace_id',
    'idx_key_access_grants_attribute_value_id',
    'idx_attribute_fqn_composite',
    'idx_attribute_namespaces_active',
    'idx_attribute_definitions_active',
    'idx_attribute_values_active',
    'idx_resource_mappings_active',
    'idx_subject_mappings_active',
    'idx_subject_condition_set_condition_gin',
    'idx_attributes_namespace_lookup',
    'idx_values_definition_lookup',
    'idx_fqn_resolution',
    'idx_attribute_values_groupby',
    'idx_subject_mappings_aggregation',
    'idx_attribute_definitions_covering',
    'idx_attribute_values_covering',
    'idx_resource_mappings_created_at',
    'idx_subject_mappings_created_at',
    'idx_resource_mappings_updated_at',
    'idx_subject_mappings_updated_at',
    'idx_key_access_grants_composite'
  );
EOF
```
The rollback script (`rollback_indexes.sql`) is generated by hardcoding a list of index names. This is highly problematic and error-prone:

- If the main migration file (`20250122000000_add_performance_indexes.sql`) is updated (e.g., an index name changes, or an index is added or removed), this hardcoded list will become outdated, leading to incorrect rollbacks.
- It relies on the assumption that all of these indexes were created by this specific migration run, which might not hold if the script is run multiple times or against different database states.

Could this be made more robust? For example:

- During the migration, log the exact names of the indexes successfully created.
- Then generate the `DROP INDEX` statements from that dynamic list, as sketched below.

Alternatively, if your migration framework supports 'down' migrations, that would be the standard way to handle rollbacks.
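One possible shape for that dynamic approach, sketched under the assumption that the script can issue two `psql` invocations against the same database (the snapshot table name is hypothetical):

```sql
-- Sketch only: snapshot the existing index names before the migration runs.
CREATE TABLE IF NOT EXISTS _pre_migration_indexes AS
SELECT indexname FROM pg_indexes WHERE schemaname = 'opentdf';

-- After the migration, emit DROP statements only for genuinely new indexes.
SELECT 'DROP INDEX IF EXISTS opentdf.' || quote_ident(indexname) || ';'
FROM pg_indexes
WHERE schemaname = 'opentdf'
  AND indexname NOT IN (SELECT indexname FROM _pre_migration_indexes);

-- Clean up the snapshot once the rollback file has been written.
DROP TABLE IF EXISTS _pre_migration_indexes;
```

Because this diffs the catalog rather than naming indexes, it cannot drift out of sync with the migration's index list.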
- "8888:8888" | ||
- "8443:8443" | ||
healthcheck: | ||
test: ['CMD-SHELL', '[ -f /tmp/HealthCheck.java ] || echo "public class HealthCheck { public static void main(String[] args) throws java.lang.Throwable { System.exit(java.net.HttpURLConnection.HTTP_OK == ((java.net.HttpURLConnection)new java.net.URL(args[0]).openConnection()).getResponseCode() ? 0 : 1); } }" > /tmp/HealthCheck.java && java /tmp/HealthCheck.java http://localhost:8888/auth/health/live'] |
The Keycloak healthcheck command is quite complex: it echoes a Java class to a file and then compiles and runs it, which is brittle and depends on a JDK being present in the Keycloak image for the `java` command. Could a simpler HTTP GET against Keycloak's native health endpoint (e.g., `http://localhost:8888/auth/health/live`) using `curl` or `wget` achieve the same result more reliably and with fewer dependencies?

```yaml
test: ["CMD", "curl", "-f", "http://localhost:8888/auth/health/live"]
```
```sql
CREATE TEMP TABLE IF NOT EXISTS performance_results (
    test_name TEXT,
    execution_time INTERVAL,
    query_plan TEXT,
    run_timestamp TIMESTAMP DEFAULT NOW()
);
```
The temporary table `performance_results` is created but doesn't seem to be used later in the script to store the outcomes of the `EXPLAIN` commands; the `EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)` statements will simply write to `psql`'s standard output.

Was the intention to parse these JSON outputs and store specific metrics (like execution time) in this table for easier comparison, or is the table a leftover from a previous approach? If it's not used, perhaps it could be removed to avoid confusion.
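For reference, one way the script could populate the table (a sketch, not the script's current behavior; the sample query and test name are stand-ins) is to run each `EXPLAIN` inside a `DO` block:

```sql
-- Sketch: run one test query via EXPLAIN and record its plan and timing.
-- Assumes the performance_results temp table was created earlier in the session.
DO $$
DECLARE
    plan json;
BEGIN
    EXECUTE 'EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) '
            'SELECT * FROM attribute_fqns LIMIT 10'
        INTO plan;
    -- "Execution Time" is reported in milliseconds; convert to an interval.
    INSERT INTO performance_results (test_name, execution_time, query_plan)
    VALUES ('fqn_sample',
            make_interval(secs => (plan -> 0 ->> 'Execution Time')::float8 / 1000),
            plan::text);
END
$$;
```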
```sql
-- Performance optimization indexes for OpenTDF Platform
```
The migration filename `20250122000000_add_performance_indexes.sql` (and its references in documentation) uses a future date (January 22, 2025). Migration filenames typically use the timestamp of their creation to ensure correct ordering and reflect history. Is there a specific reason for this future dating? It might be clearer to use the actual creation date.
```sql
-- 2. Indexes for Common Query Patterns
-- FQN lookups (composite index for namespace + attribute + value pattern)
CREATE INDEX idx_attribute_fqn_composite ON attribute_fqns(namespace, attribute, value, fqn);
```
There seems to be a discrepancy in the FQN indexing strategy. This file creates `idx_attribute_fqn_composite ON attribute_fqns(namespace, attribute, value, fqn)`.

However, `docs/query-specific-optimizations.md` (line 75) suggests a potentially more optimal index for FQN lookups: `CREATE UNIQUE INDEX idx_fqn_lookup_unique ON attribute_fqns(fqn) INCLUDE (namespace_id, attribute_id, value_id);`.

If `fqn` is unique and the primary way these records are looked up, an index starting with `fqn` (and possibly `UNIQUE`) is generally more efficient, and the `INCLUDE` clause makes it a covering index. Could you clarify the intended primary lookup pattern for FQNs and ensure the most effective index is implemented? If `fqn` itself is unique, the `UNIQUE INDEX ON (fqn)` approach is often preferred.
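To make the comparison concrete, here is the documented alternative together with the kind of lookup it would cover; the index DDL is quoted from the docs, while the sample query and FQN literal are made up:

```sql
-- Unique covering index suggested in docs/query-specific-optimizations.md.
CREATE UNIQUE INDEX idx_fqn_lookup_unique
    ON attribute_fqns (fqn)
    INCLUDE (namespace_id, attribute_id, value_id);

-- A lookup like this can then often be answered by an index-only scan
-- (the FQN value below is purely illustrative):
SELECT namespace_id, attribute_id, value_id
FROM attribute_fqns
WHERE fqn = 'https://example.com/attr/classification/value/secret';
```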
```sql
CREATE INDEX idx_attribute_namespaces_active ON attribute_namespaces(active) WHERE active = true;
CREATE INDEX idx_attribute_definitions_active ON attribute_definitions(active) WHERE active = true;
CREATE INDEX idx_attribute_values_active ON attribute_values(active) WHERE active = true;
CREATE INDEX idx_resource_mappings_active ON resource_mappings(active) WHERE active = true;
CREATE INDEX idx_subject_mappings_active ON subject_mappings(active) WHERE active = true;
```
The partial indexes on `(active) WHERE active = true` (e.g., `idx_attribute_namespaces_active ON attribute_namespaces(active) WHERE active = true`) are most effective when `active = true` is highly selective (i.e., filters out a large portion of the table) and queries filter primarily or only on the `active` column.

If queries often include other conditions, like `SELECT ... FROM attribute_namespaces WHERE active = true AND namespace_id = ?`, a composite partial index such as `CREATE INDEX ON attribute_namespaces(namespace_id) WHERE active = true;`, or `CREATE INDEX ON attribute_namespaces(namespace_id, active);` if `active` isn't always true in the `WHERE` clause, might be more beneficial.

Have these partial indexes been evaluated against the common query patterns to ensure they provide the best coverage? For example, if `attribute_namespaces` is often queried by `id` or `name` along with `active = true`, the current index on `(active)` alone might not be fully utilized by the planner for those other columns. A sketch of the composite alternative follows.
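As a concrete illustration, here is a sketch using column names drawn from the migration's other indexes; note that the migration's own `idx_attribute_values_attribute_definition_id_active` may already cover this exact pattern:

```sql
-- Composite partial index for the "FK filter + active = true" pattern.
CREATE INDEX idx_attribute_values_definition_active
    ON attribute_values (attribute_definition_id)
    WHERE active = true;

-- Verify the planner actually chooses it for the common query shape
-- (the UUID literal is a placeholder):
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM attribute_values
WHERE attribute_definition_id = '00000000-0000-0000-0000-000000000000'::uuid
  AND active = true;
```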
```sql
-- attribute_value_members
CREATE INDEX idx_attribute_value_members_value_id ON attribute_value_members(value_id);
CREATE INDEX idx_attribute_value_members_member_id ON attribute_value_members(member_id);
```
I don't think this table exists anymore; it should have been dropped by this migration: https://github.com/opentdf/platform/blob/main/service/policy/db/migrations/20240813000000_drop_attribute_value_members.sql
```sql
-- 7. Index for timestamp-based queries (if needed for audit/history)
CREATE INDEX idx_resource_mappings_created_at ON resource_mappings(created_at);
CREATE INDEX idx_subject_mappings_created_at ON subject_mappings(created_at);
CREATE INDEX idx_resource_mappings_updated_at ON resource_mappings(updated_at);
CREATE INDEX idx_subject_mappings_updated_at ON subject_mappings(updated_at);
```
These should not be necessary as we never filter queries by these columns at the moment.