Fix deadlock: reduce number rows deleted at per iter, add retry #27027

dantecatalfamo · 2025-03-11T14:10:07Z

- Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes. See Changes files for more information.
- A detailed QA plan exists on the associated ticket (if it isn't there, work with the product group's QA engineer to add it)
  - Make sure cron completes successfully with thousands of unsaved expired queries
- Manual QA for all new/changed functionality

codecov · 2025-03-11T15:38:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.98%. Comparing base (6bbf645) to head (5ab2cd2).
Report is 11 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #27027      +/-   ##
==========================================
- Coverage   63.99%   63.98%   -0.01%     
==========================================
  Files        1704     1705       +1     
  Lines      162237   162796     +559     
  Branches     4327     4327              
==========================================
+ Hits       103822   104164     +342     
- Misses      50346    50518     +172     
- Partials     8069     8114      +45

| Flag | Coverage Δ |
|---|---|
| backend | `64.74% <100.00%> (-0.02%)` ⬇️ |


lucasmrod · 2025-03-13T11:03:12Z

server/datastore/mysql/activities.go

@@ -587,13 +589,13 @@ func (ds *Datastore) CleanupActivitiesAndAssociatedData(ctx context.Context, max
 		var rowsAffected int64

 		// Start a new transaction for each batch of deletions.
-		err := ds.withTx(ctx, func(tx sqlx.ExtContext) error {
+		err := ds.withRetryTxx(ctx, func(tx sqlx.ExtContext) error {


In environments with heavy or constant use of live queries, this might increase live query latency (retries might impact transactions for live queries), which can affect Zero Trust flows that use live queries to determine device health.

Another approach we can use to reduce the chance of deadlock is to avoid a single transaction that modifies three tables and instead use the DB reader and writer, something like this:

1. Use DB reader to know which queries are expired (get ids).
2. Use DB writer to delete queries (ids from 1).
3. Use DB reader to determine which distributed_query_campaigns to delete (get ids).
4. Use DB writer to delete distributed_query_campaigns (ids from 3).
5. Use DB reader to determine which distributed_query_campaign_targets to delete (get ids).
6. Use DB writer to delete distributed_query_campaign_targets (ids from 5).

AFAICS we can delete the queries separately from the campaigns and targets (not in a transaction) because these are inaccessible, unsaved, expired queries and there's no foreign key association.

Am all ears.

lucasmrod

Left a comment to discuss a different approach.

Reduce the number of queries deleted at a time, add retry

d3bc7b0

dantecatalfamo temporarily deployed to Docker Hub March 11, 2025 14:10 — with GitHub Actions Inactive

dantecatalfamo changed the title ~~Reduce the number of queries deleted at a time, add retry~~ Fix deadlock: reduce the rows deleted at a time, add retry Mar 11, 2025

Add changes/

5ab2cd2

dantecatalfamo temporarily deployed to Docker Hub March 11, 2025 15:25 — with GitHub Actions Inactive

dantecatalfamo changed the title ~~Fix deadlock: reduce the rows deleted at a time, add retry~~ Fix deadlock: reduce number rows deleted at a time, add retry Mar 11, 2025

dantecatalfamo changed the title ~~Fix deadlock: reduce number rows deleted at a time, add retry~~ Fix deadlock: reduce number rows deleted at per iter, add retry Mar 11, 2025

dantecatalfamo assigned lucasmrod Mar 11, 2025

dantecatalfamo marked this pull request as ready for review March 11, 2025 18:29

dantecatalfamo requested a review from a team as a code owner March 11, 2025 18:29

lucasmrod reviewed Mar 13, 2025
