Fixes #38086 - Slow product last sync audit lookup #11294

Draft
wants to merge 1 commit into base: master

Conversation


@sjha4 sjha4 commented Jan 29, 2025

What are the changes introduced in this pull request?

  1. Added an index on action and created-at datetime for faster filtering and sorting. (Fixes #38171 - Add index on audit to speed up action queries and sorting for latest records theforeman/foreman#10426)
  2. Use limit(1) to reduce the memory footprint.
  3. The repository details page currently doesn't use the audit record to display the last sync time if the tasks have been cleaned up.
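
The intent of the `limit(1)` change can be illustrated outside Rails. Below is a minimal plain-Ruby sketch with hypothetical in-memory rows (not the actual ActiveRecord query, whose real shape is `order(created_at: :desc).limit(1)` over the filtered audits): keep only "sync" audits and take the single newest record instead of loading every match.

```ruby
require "time"

# Hypothetical stand-ins for audit rows; the real code queries Audited::Audit.
audits = [
  { action: "sync",   created_at: Time.parse("2025-01-28 10:00") },
  { action: "sync",   created_at: Time.parse("2025-01-29 09:30") },
  { action: "create", created_at: Time.parse("2025-01-30 08:00") },
]

# Filter to sync actions, then take only the newest record -- the in-memory
# analogue of ORDER BY created_at DESC LIMIT 1.
last_sync = audits.select { |a| a[:action] == "sync" }
                  .max_by { |a| a[:created_at] }

puts last_sync[:created_at].strftime("%Y-%m-%d %H:%M")
```

With `limit(1)` the database returns a single row, so Rails never materializes the full result set in memory.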

Considerations taken when implementing this change?

What are the testing steps for this pull request?

  1. Have a few products with repositories synced.
  2. Clean up tasks: `bundle exec rails foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::Repository::Sync' STATES='stopped'`
  3. Go to the products index page and confirm the last sync date is displayed using the audit record.
  4. Go to a repository and confirm you see the same date, and not "Not Synced", if an audit record exists.

For performance testing, run this in a rails console to create a million audit records for the sync action:

```ruby
repo = Katello::Repository.first
audit = repo.audits.last
audits_data = []
million = 1_000_000

audit_data_template = audit.attributes.slice(
  "auditable_id", "auditable_type", "user_id", "username", "action", "audited_changes",
  "version", "comment", "associated_id", "associated_type", "request_uuid", "auditable_name", "associated_name"
)

million.times do
  audit_data = audit_data_template.dup
  audit_data["created_at"] = Time.current
  audits_data << audit_data
end

audits_data.each_slice(10_000) do |batch|
  Audited::Audit.insert_all(batch)
end
```

[Inline review comment on the new audit lookup in the diff; only the tail of the quoted code is preserved:]

```ruby
  auditable_type: Katello::Repository.name,
  action: "sync"
)
.order(created_at: :desc)
```
Member

Could you please post here an SQL EXPLAIN ANALYZE of this query, both with and without the new index?

Member Author

Good point. I tested this, and the new index isn't being used for this query with my data setup. I'm trying to get some real data to test with, since with my duplicated audit records Postgres doesn't seem to need any index beyond created_at.

Member Author

```
katello=# EXPLAIN ANALYZE
SELECT "audits".*
FROM "audits"
WHERE "audits"."auditable_id" = 7
AND "audits"."auditable_type" = 'Katello::Repository'
AND "audits"."action" = 'sync'
ORDER BY "audits"."created_at" DESC
LIMIT 1;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..4.63 rows=1 width=2224) (actual time=0.016..0.016 rows=0 loops=1)
   ->  Index Scan using index_audits_on_auditable_type_id_action_created_at_desc on audits  (cost=0.56..244.88 rows=60 width=2224) (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: (((auditable_type)::text = 'Katello::Repository'::text) AND (auditable_id = 7) AND ((action)::text = 'sync'::text))
 Planning Time: 0.091 ms
 Execution Time: 0.049 ms
(5 rows)
```

```
katello=# \d audits;
                                           Table "public.audits"
     Column      |            Type             | Collation | Nullable |              Default
-----------------+-----------------------------+-----------+----------+------------------------------------
 id              | integer                     |           | not null | nextval('audits_id_seq'::regclass)
 auditable_id    | integer                     |           |          |
 auditable_type  | character varying(255)      |           |          |
 user_id         | integer                     |           |          |
 user_type       | character varying(255)      |           |          |
 username        | character varying(255)      |           |          |
 action          | character varying(255)      |           |          |
 audited_changes | text                        |           |          |
 version         | integer                     |           |          | 0
 comment         | character varying(255)      |           |          |
 associated_id   | integer                     |           |          |
 associated_type | character varying(255)      |           |          |
 request_uuid    | character varying(255)      |           |          |
 created_at      | timestamp without time zone |           |          |
 remote_address  | character varying(255)      |           |          |
 auditable_name  | text                        |           |          |
 associated_name | character varying(255)      |           |          |
Indexes:
    "audits_pkey" PRIMARY KEY, btree (id)
    "index_audits_on_associated_type_and_associated_id" btree (associated_type, associated_id)
    "index_audits_on_auditable_type_and_auditable_id_and_version" btree (auditable_type, auditable_id, version)
    "index_audits_on_auditable_type_id_action_created_at_desc" btree (auditable_type, auditable_id, action, created_at DESC)
    "index_audits_on_created_at" btree (created_at)
    "index_audits_on_request_uuid" btree (request_uuid)
    "index_audits_on_user_type_and_user_id" btree (user_type, user_id)

katello=# DROP INDEX IF EXISTS index_audits_on_auditable_type_id_action_created_at_desc;
DROP INDEX
katello=# EXPLAIN ANALYZE
SELECT "audits".*
FROM "audits"
WHERE "audits"."auditable_id" = 7
AND "audits"."auditable_type" = 'Katello::Repository'
AND "audits"."action" = 'sync'
ORDER BY "audits"."created_at" DESC
LIMIT 1;
                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=311.04..311.05 rows=1 width=2224) (actual time=1.924..1.925 rows=0 loops=1)
   ->  Sort  (cost=311.04..311.19 rows=60 width=2224) (actual time=1.924..1.924 rows=0 loops=1)
         Sort Key: created_at DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on audits  (cost=5.22..310.74 rows=60 width=2224) (actual time=1.920..1.920 rows=0 loops=1)
               Recheck Cond: (((auditable_type)::text = 'Katello::Repository'::text) AND (auditable_id = 7))
               Filter: ((action)::text = 'sync'::text)
               Rows Removed by Filter: 7
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on index_audits_on_auditable_type_and_auditable_id_and_version  (cost=0.00..5.21 rows=78 width=0) (actual time=1.914..1.915 rows=7 loops=1)
                     Index Cond: (((auditable_type)::text = 'Katello::Repository'::text) AND (auditable_id = 7))
 Planning Time: 0.178 ms
 Execution Time: 1.945 ms
(13 rows)
```

Screenshot from 2025-01-30 13-09-28

@sjha4
Member Author

sjha4 commented Jan 31, 2025

With my script above duplicating audit records a million times, Postgres is smarter and is able to optimize using just the created_at index, scanning sequentially through the records to find the first match for the limit. The new index is used without the limit, when Postgres has to fetch all matching records. Also, the bulk of the improvement comes from plucking ids before the audits query, rather than passing `auditable_id: product.repositories`, which creates a join before filtering.
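
The pluck-first point can be sketched with plain Ruby over hypothetical in-memory rows (names here are illustrative, not the actual Katello code): fetch just the repository ids first, then filter audits against that small id list, instead of letting the database join repositories to audits before filtering.

```ruby
# Hypothetical rows standing in for the repositories and audits tables.
repositories = [
  { id: 1, product_id: 10 },
  { id: 2, product_id: 10 },
  { id: 3, product_id: 99 },
]
audits = [
  { auditable_id: 1, action: "sync" },
  { auditable_id: 2, action: "sync" },
  { auditable_id: 3, action: "sync" },
]

# Pluck-style: grab only the ids for the product (~ repositories.pluck(:id)),
# then filter audits against that id list -- no join required.
repo_ids = repositories.select { |r| r[:product_id] == 10 }
                       .map { |r| r[:id] }

sync_audits = audits.select do |a|
  repo_ids.include?(a[:auditable_id]) && a[:action] == "sync"
end

puts sync_audits.size
```

In ActiveRecord terms, this corresponds to passing a plucked id array to `where(auditable_id: ids)` rather than an association relation, which would be expanded into a join or subquery before the audit filter is applied.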

Marking as draft to look into this and the reproducer data a little more.

@sjha4 sjha4 marked this pull request as draft January 31, 2025 00:28
2 participants