Optimize database performance for OpenTDF platform #2300


Draft · wants to merge 1 commit into main

Conversation

pflynn-virtru (Member) commented:
Proposed Changes

Added comprehensive performance enhancements: 40+ new indexes, a PostgreSQL tuning configuration, query-specific optimizations, application-level caching, and monitoring scripts. Introduced dedicated guides for index implementation, query analysis, and performance testing. Also included a performance-optimized Docker Compose file and PostgreSQL config adjustments to reduce sequential scans, lower CPU usage, and improve query execution times by 10-100x.

Checklist

  • I have added or updated unit tests
  • I have added or updated integration tests (if appropriate)
  • I have added or updated documentation

Testing Instructions

Start the platform with docker-compose-performance.yaml, run scripts/db-performance-test.sql to capture a baseline, apply the indexes with scripts/apply-performance-indexes.sh, then re-run the test script and compare query plans and timings.
@github-actions bot added labels comp:db (DB component), comp:policy (Policy Configuration: attributes, subject mappings, resource mappings, kas registry), docs (Documentation), and size/xl on May 23, 2025
@gemini-code-assist (bot) commented:

Hello @pflynn-virtru, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request focused on significantly improving the database performance for the OpenTDF platform. The author, @pflynn-virtru, has introduced a comprehensive set of changes aimed at tackling high CPU usage, sequential scans, and slow query response times identified under load. The core strategy involves adding a large number of targeted indexes, providing detailed guides for database tuning and query optimization, adding scripts for testing and applying changes, and including a performance-optimized Docker Compose configuration.

Highlights

  • Extensive Indexing: Over 40 new indexes have been added via a dedicated migration script (service/policy/db/migrations/20250122000000_add_performance_indexes.sql). These include foreign key indexes, composite indexes for common query patterns (like FQN lookups), partial indexes for active records, and GIN indexes for JSONB fields, specifically targeting the elimination of sequential scans and speeding up JOINs and lookups. Representative index shapes are sketched just after this list.
  • PostgreSQL Configuration Tuning: A new PostgreSQL configuration file (postgres-performance.conf) is introduced with settings optimized for memory usage, query planning (especially for SSDs), parallel query execution, logging, statistics collection (pg_stat_statements), and autovacuum tuning. This config is integrated into a new performance-optimized Docker Compose file.
  • Comprehensive Documentation: Several new markdown documents (docs/DATABASE_PERFORMANCE_README.md, docs/database-performance-tuning.md, docs/query-specific-optimizations.md) provide detailed context on the performance problems, root causes, the optimization strategy, specific query analysis, recommended indexes, PostgreSQL tuning parameters, application-level optimizations (caching, prepared statements), monitoring techniques, and a rollback plan.
  • Performance Testing and Application Scripts: A new SQL script (scripts/db-performance-test.sql) is included to run baseline and post-optimization performance tests on key query patterns. A bash script (scripts/apply-performance-indexes.sh) is provided for safely applying the index migration, including backups, stats checks, and rollback script generation.
  • Docker Compose for Performance Testing: A new Docker Compose file (docker-compose-performance.yaml) is added to easily spin up the OpenTDF platform with the performance-tuned PostgreSQL configuration, allowing for easier testing and validation of the improvements.
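
To make the indexing highlight concrete, here are representative shapes for each index category. The composite FQN index and the partial active index are quoted verbatim from the migration (they also appear in review comments below); the foreign-key and GIN examples reuse index names from the migration's rollback list, but their table and column targets are inferred, so treat them as a sketch rather than the exact DDL.

-- Foreign-key index (name from the migration; target inferred from the name):
CREATE INDEX idx_attribute_definitions_namespace_id
    ON attribute_definitions (namespace_id);

-- Composite index for FQN lookups (verbatim from the migration):
CREATE INDEX idx_attribute_fqn_composite
    ON attribute_fqns (namespace, attribute, value, fqn);

-- Partial index limited to active records (verbatim from the migration):
CREATE INDEX idx_attribute_values_active
    ON attribute_values (active) WHERE active = true;

-- GIN index over a JSONB condition column (name from the migration;
-- table and column are assumptions for illustration):
CREATE INDEX idx_subject_condition_set_condition_gin
    ON subject_condition_set USING GIN (condition);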

Changelog

  • docker-compose-performance.yaml
    • Added a new Docker Compose file for performance testing/deployment.
    • Configured opentdfdb service with resource limits (CPU, memory), tmpfs for temporary data, sysctls for network tuning, shm_size, and mounted postgres-performance.conf.
    • Configured opentdf_pgdata volume to use local bind mount for potentially better I/O.
  • docs/DATABASE_PERFORMANCE_README.md
    • Added a new README summarizing the database performance problem, root cause, solution overview, expected improvements (10-100x faster queries, reduced CPU/sequential scans), files created, next steps, quick win index examples, and monitoring commands.
  • docs/database-performance-tuning.md
    • Added a new detailed guide covering PostgreSQL configuration tuning (memory, planner, connections, parallel queries, logging, stats, autovacuum, etc.) and application-level optimizations (connection pooling, query patterns, caching, monitoring).
  • docs/query-specific-optimizations.md
    • Added a new guide analyzing specific slow query patterns (complex attribute queries, subject mappings, FQN lookups, active filtering, JSONB).
    • Provided specific index recommendations for each query pattern.
    • Suggested query rewriting techniques (CTEs, Materialized Views) and application-level optimizations (caching, prepared statements, batching).
  • postgres-performance.conf
    • Added a new PostgreSQL configuration file with tuned parameters for memory (shared_buffers, effective_cache_size, work_mem, maintenance_work_mem), planner (random_page_cost, effective_io_concurrency), parallel queries, logging (log_min_duration_statement), statistics (pg_stat_statements), autovacuum, and more.
  • scripts/apply-performance-indexes.sh
    • Added a new bash script to automate the process of applying performance indexes.
    • Includes steps for running baseline performance tests, backing up existing indexes, checking database stats, applying the migration script, verifying index creation, running post-optimization tests, and generating a rollback script.
  • scripts/db-performance-test.sql
    • Added a new SQL script containing EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) statements for key slow queries to measure performance (a sketch of this pattern follows the changelog).
    • Includes tests for complex attribute queries, subject mappings, FQN lookups, active record filtering, complex JOINs with GROUP BY, and JSONB queries.
    • Also includes commands to show database statistics, index usage, and table sizes.
  • service/policy/db/migrations/20250122000000_add_performance_indexes.sql
    • Added a new database migration file.
    • Created over 40 new indexes, including foreign key indexes, composite indexes, partial indexes for active records, GIN indexes for JSONB, covering indexes, and indexes for timestamp fields.
    • Added ANALYZE commands at the end to update table statistics after index creation.
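
As a concrete illustration of the test pattern the script uses, a baseline measurement might look like the following. The FQN literal and selected columns are placeholders, not lines from the actual script:

-- Measure one FQN lookup (run before and after applying the indexes):
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT namespace_id, attribute_id, value_id
FROM attribute_fqns
WHERE fqn = 'https://example.com/attr/classification/value/secret';

-- Table sizes, to see where index gains matter most:
SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;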

Indexes added, scans now few,
Queries run fast, like morning dew.
CPU sighs, takes a rest,
Performance tuning, put to the test.


@gemini-code-assist (bot) commented:

Code Review

This pull request introduces a comprehensive set of database performance optimizations, including new indexes, PostgreSQL configuration tuning, query optimization guides, and a performance-focused Docker Compose setup. The effort to improve database performance is commendable, and the documentation provided is extensive.

However, there are several critical and high-severity issues that need to be addressed before this PR can be merged, particularly concerning security and the robustness of the deployment scripts.

Summary of Findings

  • Security: Hardcoded Credentials: Multiple services in docker-compose-performance.yaml use hardcoded 'changeme' passwords. This is a critical security risk and must be addressed by using environment variables, .env files, or a secrets management solution.
  • Script Robustness: Rollback Mechanism: The apply-performance-indexes.sh script generates a rollback script with a hardcoded list of index names. This is fragile and prone to errors if the main migration changes. A dynamic approach to generating the rollback script is needed.
  • Configuration: Migration File Naming: The SQL migration file and its references in documentation use a timestamp that does not reflect when the migration was actually created, which is unconventional and potentially confusing.
  • Database: Indexing Strategy: There's a potential discrepancy in the FQN indexing strategy between the migration file and documentation. Also, some partial indexes on the active column might need re-evaluation for optimal query coverage if other columns are frequently part of the WHERE clause.
  • Configuration: PostgreSQL Parallelism: In postgres-performance.conf, max_parallel_workers is set higher than the CPU limit defined in docker-compose-performance.yaml, which might not be optimal. A verification query is sketched just after this list.
  • Testing: Unused Table in Performance Test Script: The db-performance-test.sql script defines a temporary table performance_results that is not subsequently used.
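
A quick way to check the parallelism finding against a running instance (a sketch; it assumes psql access to the tuned container):

-- Compare the configured parallel-worker limits with the container's CPU limit:
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather');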

Merge Readiness

This pull request makes substantial strides in optimizing database performance, and the documentation and new configurations are very thorough. However, due to the critical security vulnerabilities from hardcoded credentials and the high risk associated with the current index rollback mechanism, I must recommend that these changes not be merged until those issues are resolved. Addressing the medium-severity items related to configuration and script robustness would also significantly improve the quality of this contribution. I am unable to approve pull requests, but I advise that these changes receive further review and approval after the critical and high-severity issues are fixed.

  - config_file=/etc/postgresql/postgresql.conf
environment:
  POSTGRES_USER: postgres
  POSTGRES_PASSWORD: changeme
@gemini-code-assist (bot) commented (critical):

The opentdfdb service password is also hardcoded to 'changeme'. This is a critical security vulnerability. Please consider using environment variables loaded from a .env file or a proper secrets management solution.

Suggested change:

  POSTGRES_PASSWORD: ${OPENTDF_POSTGRES_PASSWORD}


KC_HTTP_PORT: "8888"
KC_HTTPS_PORT: "8443"
KEYCLOAK_ADMIN: admin
KEYCLOAK_ADMIN_PASSWORD: changeme
@gemini-code-assist (bot) commented (critical):

Similar to the database password, the Keycloak admin password should not be hardcoded as 'changeme'. What's the plan for managing this secret more securely, perhaps through environment variables or a secrets management tool?

Suggested change:

  KEYCLOAK_ADMIN_PASSWORD: ${KEYCLOAK_ADMIN_PASSWORD}


KC_DB_URL_PORT: 5432
KC_DB_URL_DATABASE: keycloak
KC_DB_USERNAME: keycloak
KC_DB_PASSWORD: changeme
@gemini-code-assist (bot) commented (critical):

Using default/placeholder passwords like 'changeme' in configuration files is a significant security risk, even in development-focused setups. These can easily be overlooked and deployed to staging or even production environments. Could these be configured using environment variables that are sourced from a .env file (which is gitignored) or a secrets management system, rather than being hardcoded here?

Suggested change:

  KC_DB_PASSWORD: ${KEYCLOAK_DB_PASSWORD}


restart: always
user: postgres
environment:
  POSTGRES_PASSWORD: changeme
@gemini-code-assist (bot) commented (critical):

The keycloakdb service also uses a hardcoded 'changeme' password. This poses the same security risks as mentioned for other services. How can this be made more secure?

Suggested change:

  POSTGRES_PASSWORD: ${KEYCLOAK_POSTGRES_PASSWORD}


Comment on lines +155 to +194
PGPASSWORD="${PGPASSWORD:-}" psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -t <<EOF >> "$BACKUP_DIR/rollback_indexes.sql"
SELECT 'DROP INDEX IF EXISTS ' || schemaname || '.' || indexname || ';'
FROM pg_indexes
WHERE indexname LIKE 'idx_%'
AND schemaname = 'opentdf'
AND indexname IN (
'idx_attribute_definitions_namespace_id',
'idx_attribute_definitions_namespace_id_active',
'idx_attribute_values_attribute_definition_id',
'idx_attribute_values_attribute_definition_id_active',
'idx_attribute_value_members_value_id',
'idx_attribute_value_members_member_id',
'idx_resource_mappings_attribute_value_id',
'idx_resource_mappings_attribute_value_id_active',
'idx_subject_mappings_attribute_value_id',
'idx_subject_mappings_condition_set_id',
'idx_subject_mappings_attribute_value_id_active',
'idx_key_access_grants_namespace_id',
'idx_key_access_grants_attribute_value_id',
'idx_attribute_fqn_composite',
'idx_attribute_namespaces_active',
'idx_attribute_definitions_active',
'idx_attribute_values_active',
'idx_resource_mappings_active',
'idx_subject_mappings_active',
'idx_subject_condition_set_condition_gin',
'idx_attributes_namespace_lookup',
'idx_values_definition_lookup',
'idx_fqn_resolution',
'idx_attribute_values_groupby',
'idx_subject_mappings_aggregation',
'idx_attribute_definitions_covering',
'idx_attribute_values_covering',
'idx_resource_mappings_created_at',
'idx_subject_mappings_created_at',
'idx_resource_mappings_updated_at',
'idx_subject_mappings_updated_at',
'idx_key_access_grants_composite'
);
EOF
@gemini-code-assist (bot) commented (high):

The rollback script (rollback_indexes.sql) is generated by hardcoding a list of index names. This is highly problematic and error-prone:

  1. If the main migration file (20250122000000_add_performance_indexes.sql) is updated (e.g., an index name changes, an index is added/removed), this hardcoded list will become outdated, leading to incorrect rollbacks.
  2. It relies on the assumption that all these indexes were indeed created by this specific migration run, which might not always be true if the script is run multiple times or in different states.

Could this be made more robust? For example:

  • During the migration, log the exact names of indexes successfully created.
  • Then, generate the DROP INDEX statements based on that dynamic list.
    Alternatively, if your migration framework supports 'down' migrations, that would be the standard way to handle rollbacks.
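
A minimal sketch of the dynamic approach, assuming the script can snapshot pg_indexes in a scratch table immediately before running the migration (the table name here is hypothetical):

-- Before the migration: record the indexes that already exist.
CREATE TABLE IF NOT EXISTS opentdf.pre_migration_indexes AS
SELECT indexname FROM pg_indexes WHERE schemaname = 'opentdf';

-- After the migration: emit DROP statements only for indexes the migration
-- actually created, whatever their names turn out to be.
SELECT 'DROP INDEX IF EXISTS opentdf.' || indexname || ';'
FROM pg_indexes
WHERE schemaname = 'opentdf'
  AND indexname NOT IN (SELECT indexname FROM opentdf.pre_migration_indexes);

This stays correct if indexes are renamed, added, or removed in the migration, and it never drops pre-existing indexes.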


- "8888:8888"
- "8443:8443"
healthcheck:
test: ['CMD-SHELL', '[ -f /tmp/HealthCheck.java ] || echo "public class HealthCheck { public static void main(String[] args) throws java.lang.Throwable { System.exit(java.net.HttpURLConnection.HTTP_OK == ((java.net.HttpURLConnection)new java.net.URL(args[0]).openConnection()).getResponseCode() ? 0 : 1); } }" > /tmp/HealthCheck.java && java /tmp/HealthCheck.java http://localhost:8888/auth/health/live']
@gemini-code-assist (bot) commented (medium):

The Keycloak healthcheck command is quite complex: it echoes a Java class into a file, then compiles and runs it. This seems brittle and depends on a JDK (for the java command) being available in the Keycloak image. Could a simpler HTTP GET against Keycloak's native health endpoint (e.g., http://localhost:8888/auth/health/live) using curl or wget achieve the same result more reliably and with fewer dependencies?

test: ["CMD", "curl", "-f", "http://localhost:8888/auth/health/live"]


Comment on lines +12 to +17
CREATE TEMP TABLE IF NOT EXISTS performance_results (
    test_name TEXT,
    execution_time INTERVAL,
    query_plan TEXT,
    run_timestamp TIMESTAMP DEFAULT NOW()
);
@gemini-code-assist (bot) commented (medium):

The temporary table performance_results is created but doesn't seem to be used later in the script to store the outcomes of the EXPLAIN commands. The EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) commands will output to psql's standard output.

Was the intention to parse these JSON outputs and store specific metrics (like execution time) in this table for easier comparison, or is this table a leftover from a previous approach? If it's not used, perhaps it could be removed to avoid confusion?
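
If capturing metrics was the intent, one way the table could be populated is via PL/pgSQL, parsing the JSON plan that EXPLAIN returns. This is a sketch under that assumption (the test query is a placeholder), and it must run in the same psql session that created the temp table:

-- Run a test query under EXPLAIN ANALYZE and record its execution time.
DO $$
DECLARE
    plan jsonb;
BEGIN
    EXECUTE 'EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT count(*) FROM attribute_fqns'
        INTO plan;
    -- "Execution Time" is reported in milliseconds at the top level of the JSON plan.
    INSERT INTO performance_results (test_name, execution_time, query_plan)
    VALUES ('fqn_count_baseline',
            make_interval(secs => (plan -> 0 ->> 'Execution Time')::numeric / 1000),
            plan::text);
END
$$;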


@@ -0,0 +1,91 @@
-- Performance optimization indexes for OpenTDF Platform
@gemini-code-assist (bot) commented (medium):

The migration filename 20250122000000_add_performance_indexes.sql (and its references in documentation) carries a January 22, 2025 timestamp, which does not appear to reflect when this migration was actually authored. Migration filenames typically use the creation timestamp to ensure correct ordering and reflect history. Is there a specific reason for this date? It might be clearer to use the actual creation date.



-- 2. Indexes for Common Query Patterns
-- FQN lookups (composite index for namespace + attribute + value pattern)
CREATE INDEX idx_attribute_fqn_composite ON attribute_fqns(namespace, attribute, value, fqn);
@gemini-code-assist (bot) commented (medium):

There seems to be a discrepancy in the FQN indexing strategy. This file creates idx_attribute_fqn_composite ON attribute_fqns(namespace, attribute, value, fqn).
However, docs/query-specific-optimizations.md (line 75) suggests a potentially more optimal index for FQN lookups: CREATE UNIQUE INDEX idx_fqn_lookup_unique ON attribute_fqns(fqn) INCLUDE (namespace_id, attribute_id, value_id);.

If fqn is unique and the primary way to look up these records, an index starting with fqn (and possibly UNIQUE) is generally more efficient. The INCLUDE clause is excellent for covering queries.

Could you clarify the intended primary lookup pattern for FQNs and ensure the most effective index is implemented? If fqn itself is unique, the UNIQUE INDEX ON (fqn) approach is often preferred.
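
For side-by-side comparison, the two competing definitions (both quoted verbatim from this PR's files):

-- As implemented in the migration:
CREATE INDEX idx_attribute_fqn_composite
    ON attribute_fqns (namespace, attribute, value, fqn);

-- As recommended in docs/query-specific-optimizations.md (line 75),
-- assuming fqn values are unique:
CREATE UNIQUE INDEX idx_fqn_lookup_unique
    ON attribute_fqns (fqn)
    INCLUDE (namespace_id, attribute_id, value_id);

With the unique variant, an equality lookup on fqn can typically be satisfied by an index-only scan, since the INCLUDE columns travel with the index entry.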


Comment on lines +35 to +39
CREATE INDEX idx_attribute_namespaces_active ON attribute_namespaces(active) WHERE active = true;
CREATE INDEX idx_attribute_definitions_active ON attribute_definitions(active) WHERE active = true;
CREATE INDEX idx_attribute_values_active ON attribute_values(active) WHERE active = true;
CREATE INDEX idx_resource_mappings_active ON resource_mappings(active) WHERE active = true;
CREATE INDEX idx_subject_mappings_active ON subject_mappings(active) WHERE active = true;
@gemini-code-assist (bot) commented (medium):

The partial indexes on (active) WHERE active = true (e.g., idx_attribute_namespaces_active ON attribute_namespaces(active) WHERE active = true;) are most effective when active = true is highly selective (i.e., filters out a large portion of the table) AND queries filter primarily or only by the active column.

If queries often include other conditions, like SELECT ... FROM attribute_namespaces WHERE active = true AND namespace_id = ?;, a composite partial index such as CREATE INDEX ON attribute_namespaces(namespace_id) WHERE active = true; or CREATE INDEX ON attribute_namespaces(namespace_id, active); (if active isn't always true in the WHERE clause) might be more beneficial.

Have these partial indexes been evaluated against the common query patterns to ensure they provide the best coverage? For example, if attribute_namespaces is often queried by id or name along with active = true, the current index on (active) alone might not be fully utilized by the planner for those other columns.
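
One empirical way to answer that question: run a representative workload, then check the statistics collector to see whether the planner ever chose these five indexes. A sketch (index names taken from the snippet above):

-- Indexes whose idx_scan stays at 0 under load are candidates for
-- replacement with the composite partial form suggested above.
SELECT relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_attribute_namespaces_active',
                       'idx_attribute_definitions_active',
                       'idx_attribute_values_active',
                       'idx_resource_mappings_active',
                       'idx_subject_mappings_active')
ORDER BY idx_scan ASC;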


Comment on lines +13 to +15
-- attribute_value_members
CREATE INDEX idx_attribute_value_members_value_id ON attribute_value_members(value_id);
CREATE INDEX idx_attribute_value_members_member_id ON attribute_value_members(member_id);
A contributor commented:

I don't think this table exists anymore? It should have been deleted in this migration: https://github.com/opentdf/platform/blob/main/service/policy/db/migrations/20240813000000_drop_attribute_value_members.sql

Comment on lines +73 to +77
-- 7. Index for timestamp-based queries (if needed for audit/history)
CREATE INDEX idx_resource_mappings_created_at ON resource_mappings(created_at);
CREATE INDEX idx_subject_mappings_created_at ON subject_mappings(created_at);
CREATE INDEX idx_resource_mappings_updated_at ON resource_mappings(updated_at);
CREATE INDEX idx_subject_mappings_updated_at ON subject_mappings(updated_at);
A contributor commented:

These should not be necessary as we never filter queries by these columns at the moment.
