Fixes #38086 - Slow product last sync audit lookup #11294

Draft
wants to merge 1 commit into base: master

Conversation


@sjha4 sjha4 commented Jan 29, 2025

What are the changes introduced in this pull request?

  1. Added an index on action and created-at datetime for faster filtering and sorting. (Fixes #38171 - Add index on audit to speed up action queries and sorting for latest records theforeman/foreman#10426)
  2. Use limit(1) to reduce the memory footprint.
  3. The repository details page currently doesn't use the audit record to display the last sync time if the tasks have been cleaned up.
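
The intent of the `limit(1)` change can be illustrated outside Rails. Below is a minimal plain-Ruby sketch with hypothetical in-memory rows (not the actual ActiveRecord query, whose real shape is `order(created_at: :desc).limit(1)` over the filtered audits): keep only "sync" audits and take the single newest record instead of loading every match.

```ruby
require "time"

# Hypothetical stand-ins for audit rows; the real code queries Audited::Audit.
audits = [
  { action: "sync",   created_at: Time.parse("2025-01-28 10:00") },
  { action: "sync",   created_at: Time.parse("2025-01-29 09:30") },
  { action: "create", created_at: Time.parse("2025-01-30 08:00") },
]

# Filter to sync actions, then take only the newest record -- the in-memory
# analogue of ORDER BY created_at DESC LIMIT 1.
last_sync = audits.select { |a| a[:action] == "sync" }
                  .max_by { |a| a[:created_at] }

puts last_sync[:created_at].strftime("%Y-%m-%d %H:%M")
```

With `limit(1)` the database returns a single row, so Rails never materializes the full result set in memory.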

Considerations taken when implementing this change?

What are the testing steps for this pull request?

  1. Have a few products with repositories synced.
  2. Clean up tasks: `bundle exec rails foreman_tasks:cleanup TASK_SEARCH='label = Actions::Katello::Repository::Sync' STATES='stopped'`
  3. Go to the products index page and confirm the last sync date is displayed using the audit record.
  4. Go to a repository and confirm you see the same date, and not "Not Synced", if an audit record exists.

For performance testing, run this in a rails console to create a million audit records for the sync action:

```ruby
repo = Katello::Repository.first
audit = repo.audits.last
audits_data = []
million = 1_000_000

audit_data_template = audit.attributes.slice(
  "auditable_id", "auditable_type", "user_id", "username", "action", "audited_changes",
  "version", "comment", "associated_id", "associated_type", "request_uuid", "auditable_name", "associated_name"
)

million.times do
  audit_data = audit_data_template.dup
  audit_data["created_at"] = Time.current
  audits_data << audit_data
end

audits_data.each_slice(10_000) do |batch|
  Audited::Audit.insert_all(batch)
end
```

[Inline review comment on the new audit lookup in the diff; only the tail of the quoted code is preserved:]

```ruby
  auditable_type: Katello::Repository.name,
  action: "sync"
)
.order(created_at: :desc)
```
Member

Could you please post here an SQL EXPLAIN ANALYZE of this query, both with and without the new index?

Member Author

Good point. I tested this, and the new index isn't being used for this query with my data setup. I'm trying to get some real data to test with, since with my duplicated audit records Postgres doesn't seem to need any index beyond created_at.

Member Author

```
katello=# EXPLAIN ANALYZE
SELECT "audits".*
FROM "audits"
WHERE "audits"."auditable_id" = 7
AND "audits"."auditable_type" = 'Katello::Repository'
AND "audits"."action" = 'sync'
ORDER BY "audits"."created_at" DESC
LIMIT 1;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..4.63 rows=1 width=2224) (actual time=0.016..0.016 rows=0 loops=1)
   ->  Index Scan using index_audits_on_auditable_type_id_action_created_at_desc on audits  (cost=0.56..244.88 rows=60 width=2224) (actual time=0.016..0.016 rows=0 loops=1)
         Index Cond: (((auditable_type)::text = 'Katello::Repository'::text) AND (auditable_id = 7) AND ((action)::text = 'sync'::text))
 Planning Time: 0.091 ms
 Execution Time: 0.049 ms
(5 rows)
```

```
katello=# \d audits;
                                           Table "public.audits"
     Column      |            Type             | Collation | Nullable |              Default
-----------------+-----------------------------+-----------+----------+------------------------------------
 id              | integer                     |           | not null | nextval('audits_id_seq'::regclass)
 auditable_id    | integer                     |           |          |
 auditable_type  | character varying(255)      |           |          |
 user_id         | integer                     |           |          |
 user_type       | character varying(255)      |           |          |
 username        | character varying(255)      |           |          |
 action          | character varying(255)      |           |          |
 audited_changes | text                        |           |          |
 version         | integer                     |           |          | 0
 comment         | character varying(255)      |           |          |
 associated_id   | integer                     |           |          |
 associated_type | character varying(255)      |           |          |
 request_uuid    | character varying(255)      |           |          |
 created_at      | timestamp without time zone |           |          |
 remote_address  | character varying(255)      |           |          |
 auditable_name  | text                        |           |          |
 associated_name | character varying(255)      |           |          |
Indexes:
    "audits_pkey" PRIMARY KEY, btree (id)
    "index_audits_on_associated_type_and_associated_id" btree (associated_type, associated_id)
    "index_audits_on_auditable_type_and_auditable_id_and_version" btree (auditable_type, auditable_id, version)
    "index_audits_on_auditable_type_id_action_created_at_desc" btree (auditable_type, auditable_id, action, created_at DESC)
    "index_audits_on_created_at" btree (created_at)
    "index_audits_on_request_uuid" btree (request_uuid)
    "index_audits_on_user_type_and_user_id" btree (user_type, user_id)

katello=# DROP INDEX IF EXISTS index_audits_on_auditable_type_id_action_created_at_desc;
DROP INDEX
katello=# EXPLAIN ANALYZE
SELECT "audits".*
FROM "audits"
WHERE "audits"."auditable_id" = 7
AND "audits"."auditable_type" = 'Katello::Repository'
AND "audits"."action" = 'sync'
ORDER BY "audits"."created_at" DESC
LIMIT 1;
                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=311.04..311.05 rows=1 width=2224) (actual time=1.924..1.925 rows=0 loops=1)
   ->  Sort  (cost=311.04..311.19 rows=60 width=2224) (actual time=1.924..1.924 rows=0 loops=1)
         Sort Key: created_at DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on audits  (cost=5.22..310.74 rows=60 width=2224) (actual time=1.920..1.920 rows=0 loops=1)
               Recheck Cond: (((auditable_type)::text = 'Katello::Repository'::text) AND (auditable_id = 7))
               Filter: ((action)::text = 'sync'::text)
               Rows Removed by Filter: 7
               Heap Blocks: exact=1
               ->  Bitmap Index Scan on index_audits_on_auditable_type_and_auditable_id_and_version  (cost=0.00..5.21 rows=78 width=0) (actual time=1.914..1.915 rows=7 loops=1)
                     Index Cond: (((auditable_type)::text = 'Katello::Repository'::text) AND (auditable_id = 7))
 Planning Time: 0.178 ms
 Execution Time: 1.945 ms
(13 rows)
```

Screenshot from 2025-01-30 13-09-28

@sjha4
Member Author

sjha4 commented Jan 31, 2025

With my script above duplicating audit records a million times, Postgres is smarter and is able to optimize using just the created_at index, scanning sequentially through the records to find the first match for the limit. The new index is used without the limit, when Postgres has to fetch all matching records. Also, the bulk of the improvement comes from plucking ids before the audits query, rather than passing `auditable_id: product.repositories`, which creates a join before filtering.
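
The pluck-first point can be sketched with plain Ruby over hypothetical in-memory rows (names here are illustrative, not the actual Katello code): fetch just the repository ids first, then filter audits against that small id list, instead of letting the database join repositories to audits before filtering.

```ruby
# Hypothetical rows standing in for the repositories and audits tables.
repositories = [
  { id: 1, product_id: 10 },
  { id: 2, product_id: 10 },
  { id: 3, product_id: 99 },
]
audits = [
  { auditable_id: 1, action: "sync" },
  { auditable_id: 2, action: "sync" },
  { auditable_id: 3, action: "sync" },
]

# Pluck-style: grab only the ids for the product (~ repositories.pluck(:id)),
# then filter audits against that id list -- no join required.
repo_ids = repositories.select { |r| r[:product_id] == 10 }
                       .map { |r| r[:id] }

sync_audits = audits.select do |a|
  repo_ids.include?(a[:auditable_id]) && a[:action] == "sync"
end

puts sync_audits.size
```

In ActiveRecord terms, this corresponds to passing a plucked id array to `where(auditable_id: ids)` rather than an association relation, which would be expanded into a join or subquery before the audit filter is applied.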

Marking as draft to look into this and the reproducer data a little more.

@sjha4 sjha4 marked this pull request as draft January 31, 2025 00:28
2 participants