Skip to content

perf: derive last_job_host_summary from query instead of denormalized FK#16332

Open
benthomasson wants to merge 13 commits intoansible:develfrom
benthomasson:perf/remove-last-job-host-summary-updates
Open

perf: derive last_job_host_summary from query instead of denormalized FK#16332
benthomasson wants to merge 13 commits intoansible:develfrom
benthomasson:perf/remove-last-job-host-summary-updates

Conversation

@benthomasson
Copy link
Contributor

@benthomasson benthomasson commented Mar 6, 2026

Summary

  • The playbook_on_stats wrapup path bulk-updates last_job_id and last_job_host_summary_id on every host touched by a job. In the Q4CY25 scale lab this query had a median execution time of 75 seconds due to index churn on main_host (90:1 dead-to-live tuple ratio).
  • This PR eliminates 100% of the write cost by removing the bulk_update entirely and deriving the data at read time instead.
  • Adds JobHostSummary.latest_for_host(host_id) classmethod that derives the same data on demand via JobHostSummary.objects.filter(host_id=host_id).order_by('-id').first()
  • Adds _annotate_host_latest_summary(qs) helper that annotates Host querysets with the latest JobHostSummary ID (single correlated subquery)
  • Adds _prefetch_latest_summaries(hosts) that bulk-fetches full JobHostSummary objects with select_related('job', 'job__job_template') after pagination — 1 subquery + 1 IN query instead of N+1
  • Adds HostSummaryPrefetchMixin that hooks into list() to call the prefetch between pagination and serialization
  • Removes the bulk_update of last_job_id and last_job_host_summary_id in playbook_on_stats
  • Removes pre_delete/post_delete signal handlers that maintained the denormalized FKs on job deletion
  • Removes select_related through the stale FKs (eliminates the 11-table JOIN)
  • The FK fields on Host are left in place (removal requires a separate migration PR)

Query cost for host list views

Before (old code) After (this PR)
1 host query with 11-table JOIN via select_related 1 host query with 1 correlated subquery for latest summary ID
bulk_update of last_job_id + last_job_host_summary_id on every job completion (75s median at scale) No writes — data derived at read time
1 SELECT ... FROM main_jobhostsummary JOIN main_job JOIN main_jobtemplate WHERE id IN (...) per page

Files changed

File Change
awx/main/models/jobs.py Add JobHostSummary.latest_for_host() and latest_job_for_host() classmethods
awx/api/serializers.py HostSerializer._get_last_summary() with caching + prefetch support, get_summary_fields() override, to_representation overwrite of stale FK values
awx/api/views/__init__.py _annotate_host_latest_summary(qs), _prefetch_latest_summaries(hosts), HostSummaryPrefetchMixin applied to all 6 host-serving views; Dashboard: subquery for failed host count, removed stale failures_url
awx/main/models/inventory.py update_computed_fields: subquery for failed host count
awx/main/models/events.py Remove entire bulk_update of last_job_id and last_job_host_summary_id (the 75s query)
awx/main/signals.py Remove save_host_pks_before_job_delete and update_host_last_job_after_job_deleted signal handlers
awx/main/access.py Remove select_related through last_job and last_job_host_summary FKs
awx/main/managers.py Remove .defer() through the stale FK chains
awx/main/tests/functional/models/test_events.py Update assertions to use JobHostSummary.latest_for_host()

Test plan

  • Run a job and verify host detail API still shows correct last_job and last_job_host_summary in related/summary fields
  • Verify dashboard failed host count is correct after job completion
  • Verify has_active_failures is correct on host serializer
  • Verify host list views (inventory hosts, group hosts) show correct summary data
  • Delete a job and verify host API responses update correctly (no stale data)
  • Scale test: verify callback receiver completes stats wrapup faster without the bulk_update

🤖 Generated with Claude Code

Change Type

  • Bug, Docs Fix or other nominal change

@coderabbitai
Copy link

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Host last-job data denormalization removed: code now derives latest host summaries from JobHostSummary via annotations, Subquery/OuterRef, and cached lookups. Serializers, views, inventory logic, managers, events, and signals updated; JobHostSummary gained latest_for_host / latest_job_for_host helpers.

Changes

Cohort / File(s) Summary
API serializer updates
awx/api/serializers.py
Adds HostSerializer._get_last_summary(obj) and _cached_last_summary; switches get_related, get_summary_fields, to_representation, and get_has_active_failures to read last-job/summary from the cached latest JobHostSummary (use queryset annotations when present, fallback to JobHostSummary.latest_for_host).
Views / dashboard queries
awx/api/views/__init__.py
Imports Subquery, OuterRef; adds _annotate_host_latest_summary(qs) to annotate Host querysets with _latest_* fields (summary id, failed, job fields, template id/name) and applies these annotations across Host-related query paths and dashboard filters.
Host access & manager adjustments
awx/main/access.py, awx/main/managers.py
Removes 'last_job__job_template' and 'last_job_host_summary__job' from HostAccess.select_related; stops deferring last_job/artifact fields in HostManager.get_queryset, reducing reliance on denormalized Host foreign-keys.
JobHostSummary helpers
awx/main/models/jobs.py
Adds classmethods JobHostSummary.latest_for_host(host_id) and JobHostSummary.latest_job_for_host(host_id) to return the most recent summary (and its job) for a host.
Event processing changes
awx/main/models/events.py
Removes bulk updates that wrote Host.last_job_id and Host.last_job_host_summary_id from job-event summary logic; host last-job state is now derived from JobHostSummary without explicit Host field synchronization.
Inventory computed fields
awx/main/models/inventory.py
Replaces last_job_host_summary__failed filter with a per-host Subquery/OuterRef annotation (_latest_failed) computed from JobHostSummary (imports JobHostSummary locally to avoid circular imports).
Signals / denormalization removal
awx/main/signals.py
Removes pre_delete/post_delete receivers and helper functions that maintained Host.last_job and Host.last_job_host_summary; denormalization maintenance deleted in favor of deriving values from JobHostSummary.latest_for_host().
Tests updated
awx/main/tests/functional/models/test_events.py
Replaces assertions on host.last_job_id/host.last_job_host_summary with checks using JobHostSummary.latest_for_host(host.id) to validate latest summary records.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant View
  participant Annotator
  participant Serializer
  participant JobHostSummary
  participant Database

  Client->>View: request host(s) or dashboard
  View->>Annotator: apply _annotate_host_latest_summary(qs) (Subquery/OuterRef)
  Annotator->>Database: run subquery for latest JobHostSummary per host
  Database-->>Annotator: annotated host queryset (_latest_* fields)
  View->>Serializer: serialize host(s) with annotations
  Serializer->>Serializer: call _get_last_summary(host)
  alt annotations present
    Serializer-->>Serializer: build summary from host._latest_* annotations
  else annotations absent
    Serializer->>JobHostSummary: JobHostSummary.latest_for_host(host.id)
    JobHostSummary->>Database: SELECT ... ORDER BY id DESC LIMIT 1
    Database-->>JobHostSummary: summary row
    JobHostSummary-->>Serializer: summary instance
  end
  Serializer-->>View: host representation (derived last_job / last_job_host_summary)
  View-->>Client: response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and specifically describes the main optimization: deriving last_job_host_summary from queries instead of maintaining a denormalized foreign key field.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can generate a title for your PR based on the changes with custom instructions.

Set the reviews.auto_title_instructions setting to generate a title for your PR based on the changes in the PR with custom instructions.

@benthomasson benthomasson force-pushed the perf/remove-last-job-host-summary-updates branch 2 times, most recently from e474510 to d9b8734 Compare March 6, 2026 21:56
@benthomasson benthomasson force-pushed the perf/remove-last-job-host-summary-updates branch from d9b8734 to c3950f3 Compare March 6, 2026 22:00
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
awx/api/views/__init__.py (1)

207-215: ⚠️ Potential issue | 🟠 Major

Update the dashboard failed-host drilldown to use the new latest-summary logic.

failed is now counted from the subquery, but Line 213 still links to ?last_job_host_summary__failed=True. Since that FK is no longer maintained, the dashboard count and the drill-down list can diverge. Point the URL at a filter backed by the same derived latest-summary predicate instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/views/__init__.py` around lines 207 - 215, The dashboard drilldown
URL still uses the old FK filter query param
"last_job_host_summary__failed=True" causing divergence; update the failures_url
to use the same derived predicate used in the view by pointing it to the
annotation name created by latest_summary_failed (annotate('_latest_failed') /
user_hosts_failed) — i.e. build failures_url as reverse('api:host_list',
request=request) + "?_latest_failed=True" so the drilldown filter matches the
computed _latest_failed used to compute 'failed'.
awx/main/signals.py (1)

257-285: ⚠️ Potential issue | 🟠 Major

Don’t bail out on a disappearing latest summary during job deletion.

If latest_for_host() returns the summary for the job currently being deleted, the Job.DoesNotExist path exits without falling back to the next surviving summary. That can leave Host.last_job empty/stale even when an older summary still exists for the host.

Suggested fix
-def _update_host_last_jhs(host):
-    jhs = JobHostSummary.latest_for_host(host.pk)
+def _update_host_last_jhs(host, exclude_job_id=None):
+    jhs = JobHostSummary.objects.filter(host_id=host.pk).exclude(job_id=exclude_job_id).order_by('-id').first()
     update_fields = []
-    try:
-        last_job = jhs.job if jhs else None
-    except Job.DoesNotExist:
-        # The job (and its summaries) have already been/are currently being
-        # deleted, so there's no need to update the host w/ a reference to it
-        return
-    if host.last_job != last_job:
-        host.last_job = last_job
+    last_job_id = jhs.job_id if jhs else None
+    if host.last_job_id != last_job_id:
+        host.last_job_id = last_job_id
         update_fields.append('last_job')
     if update_fields:
         host.save(update_fields=update_fields)
@@
-        _update_host_last_jhs(host)
+        _update_host_last_jhs(host, exclude_job_id=instance.pk)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/signals.py` around lines 257 - 285, _update_host_last_jhs currently
bails out when accessing jhs.job raises Job.DoesNotExist, which leaves
Host.last_job stale; change the exception handler so that instead of returning
you re-query for the next surviving JobHostSummary for that host (i.e., fetch
the latest summary excluding summaries whose job PK no longer exists or equals
the deleted job), set last_job to that summary's job or None if none found, then
proceed with the existing update_fields/save logic; update the handling around
JobHostSummary.latest_for_host, the Job.DoesNotExist except block, and
Host.last_job assignment to implement this fallback.
🧹 Nitpick comments (1)
awx/main/models/inventory.py (1)

389-393: Add DB support for this latest-summary lookup shape.

This correlated subquery now sits on the wrapup path via update_computed_fields(). It relies on repeated host_id + latest id lookups, so main_jobhostsummary will become the next hotspot as history grows unless there is a matching composite index for that access pattern.

As per coding guidelines, **: -Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/models/inventory.py` around lines 389 - 393, The correlated subquery
using JobHostSummary (latest_summary_failed) orders by '-id' filtered by
host_id, which will become a hotspot; add a composite DB index on JobHostSummary
to support lookups by host then id (e.g. an index on ['host', 'id'] or
['host_id','id'] matching the order_by('-id')) by updating the JobHostSummary
model Meta.indexes to include a models.Index(fields=['host','id'],
name='ix_jobhostsummary_host_id') and generate/apply the migration so the
Subquery lookup (latest_summary_failed) is covered efficiently by the DB.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/api/serializers.py`:
- Around line 1843-1845: The serializer is re-querying
JobHostSummary.latest_for_host multiple times and loses constructed inventory
host behavior; instead, derive last_summary from the already-built
constructed_host_summaries (and set has_active_failures) to avoid repeated DB
calls. Change the code that sets last_job_host_summary/has_active_failures to
first look up the host's summary in constructed_host_summaries (e.g. iterate
res.get('constructed_host_summaries', []) for a summary with host_id == obj.id),
use that summary if found, otherwise call JobHostSummary.latest_for_host once
and cache the result (store it in a local variable or res key) so subsequent
fields reuse it; update references to JobHostSummary.latest_for_host,
constructed_host_summaries, and has_active_failures accordingly.
- Around line 1931-1934: The code currently only clears
ret['last_job_host_summary'] when no derived summary exists, leaving a stale FK
value; change the block that calls JobHostSummary.latest_for_host(obj.id) so
that if last_summary exists you overwrite ret['last_job_host_summary'] with the
derived/serialized summary (e.g., use
JobHostSummarySerializer(last_summary).data or the appropriate representation)
and if not, set it to None; update the code around
JobHostSummary.latest_for_host and the serializer usage to always replace the FK
value returned by super().to_representation(obj).

In `@awx/main/models/jobs.py`:
- Around line 1143-1146: Add a bulk-friendly companion to latest_for_host:
create a classmethod latest_for_hosts on JobHostSummary that returns a Subquery
(or a queryset) suitable for bulk use so callers can annotate/filter hosts in
one query instead of N extra queries; implement e.g. latest_id_subquery =
Subquery(cls.objects.filter(host_id=OuterRef('pk')).order_by('-id').values('id')[:1])
and return that Subquery (or provide a queryset variant that filters cls.objects
by id__in=Subquery(...) for a list of host_ids). Keep latest_for_host unchanged
and name the new helper latest_for_hosts so serializers/endpoints can use it for
bulk endpoints.

---

Outside diff comments:
In `@awx/api/views/__init__.py`:
- Around line 207-215: The dashboard drilldown URL still uses the old FK filter
query param "last_job_host_summary__failed=True" causing divergence; update the
failures_url to use the same derived predicate used in the view by pointing it
to the annotation name created by latest_summary_failed
(annotate('_latest_failed') / user_hosts_failed) — i.e. build failures_url as
reverse('api:host_list', request=request) + "?_latest_failed=True" so the
drilldown filter matches the computed _latest_failed used to compute 'failed'.

In `@awx/main/signals.py`:
- Around line 257-285: _update_host_last_jhs currently bails out when accessing
jhs.job raises Job.DoesNotExist, which leaves Host.last_job stale; change the
exception handler so that instead of returning you re-query for the next
surviving JobHostSummary for that host (i.e., fetch the latest summary excluding
summaries whose job PK no longer exists or equals the deleted job), set last_job
to that summary's job or None if none found, then proceed with the existing
update_fields/save logic; update the handling around
JobHostSummary.latest_for_host, the Job.DoesNotExist except block, and
Host.last_job assignment to implement this fallback.

---

Nitpick comments:
In `@awx/main/models/inventory.py`:
- Around line 389-393: The correlated subquery using JobHostSummary
(latest_summary_failed) orders by '-id' filtered by host_id, which will become a
hotspot; add a composite DB index on JobHostSummary to support lookups by host
then id (e.g. an index on ['host', 'id'] or ['host_id','id'] matching the
order_by('-id')) by updating the JobHostSummary model Meta.indexes to include a
models.Index(fields=['host','id'], name='ix_jobhostsummary_host_id') and
generate/apply the migration so the Subquery lookup (latest_summary_failed) is
covered efficiently by the DB.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 28ae5786-6481-4684-87b9-ca6af2355b1f

📥 Commits

Reviewing files that changed from the base of the PR and between 212546f and e474510.

📒 Files selected for processing (8)
  • awx/api/serializers.py
  • awx/api/views/__init__.py
  • awx/main/access.py
  • awx/main/managers.py
  • awx/main/models/events.py
  • awx/main/models/inventory.py
  • awx/main/models/jobs.py
  • awx/main/signals.py
💤 Files with no reviewable changes (2)
  • awx/main/managers.py
  • awx/main/access.py

…ompletion

The playbook_on_stats wrapup path bulk-updates last_job_host_summary_id
on every host touched by a job. In the Q4CY25 scale lab this query had
a median execution time of 75 seconds due to index churn on main_host.

Replace all reads of the denormalized FK with a new classmethod
JobHostSummary.latest_for_host(host_id) that queries for the most
recent summary on demand. This eliminates the write-side bulk_update
of last_job_host_summary_id entirely.

Changes:
- Add JobHostSummary.latest_for_host() classmethod
- Serializer: use latest_for_host() instead of obj.last_job_host_summary
- Dashboard view: use subquery instead of FK traversal for failed hosts
- Inventory.update_computed_fields: use subquery for failed host count
- events.py: remove last_job_host_summary_id from bulk_update
- signals.py: simplify _update_host_last_jhs to only update last_job
- access.py/managers.py: remove select_related/defer through the FK

The FK field on Host is left in place for now (removal requires a
migration) but is no longer written to.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benthomasson benthomasson force-pushed the perf/remove-last-job-host-summary-updates branch from c3950f3 to 46e600f Compare March 6, 2026 22:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
awx/api/serializers.py (2)

1955-1959: ⚠️ Potential issue | 🟠 Major

Overwrite the top-level IDs with the derived values.

super().to_representation(obj) has already serialized the model FKs. This block only clears them when no summary exists, so hosts with a newer derived summary still return stale last_job_host_summary and can diverge from the related/summary data you now derive from last_summary.

Suggested fix
         last_summary = self._get_last_summary(obj)
-        if 'last_job' in ret and not last_summary:
-            ret['last_job'] = None
-        if 'last_job_host_summary' in ret and not last_summary:
-            ret['last_job_host_summary'] = None
+        if 'last_job' in ret:
+            ret['last_job'] = last_summary.job_id if last_summary and last_summary.job_id else None
+        if 'last_job_host_summary' in ret:
+            ret['last_job_host_summary'] = last_summary.pk if last_summary else None
         return ret
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1955 - 1959, The current
to_representation flow calls self._get_last_summary(obj) but only clears
top-level FK fields when last_summary is missing, leaving stale IDs; update the
serializer so after calling last_summary you also overwrite the top-level FK
fields (e.g., ret['last_job'] and ret['last_job_host_summary']) with the
corresponding values derived from last_summary when it exists, and keep the
existing behavior of setting them to None when last_summary is falsy; locate
this logic in the to_representation override that calls _get_last_summary(obj)
and update ret accordingly.

1848-1853: ⚠️ Potential issue | 🟠 Major

Keep constructed hosts on the constructed-summary path.

_get_last_summary() now always calls JobHostSummary.latest_for_host(obj.id). That bypasses the constructed-host behavior still used on Lines 1887-1890, so constructed inventory hosts can return the wrong last_job_host_summary / has_active_failures, and list serialization still pays one fallback query per host.

Suggested fix
     `@staticmethod`
     def _get_last_summary(obj):
         """Cache the latest JobHostSummary on the instance to avoid repeated queries."""
         if not hasattr(obj, '_cached_last_summary'):
-            obj._cached_last_summary = JobHostSummary.latest_for_host(obj.id)
+            if getattr(getattr(obj, 'inventory', None), 'kind', None) == 'constructed':
+                obj._cached_last_summary = obj.constructed_host_summaries.order_by('-id').first()
+            else:
+                obj._cached_last_summary = JobHostSummary.latest_for_host(obj.id)
         return obj._cached_last_summary
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1848 - 1853, The _get_last_summary
method currently always calls JobHostSummary.latest_for_host(obj.id), bypassing
the earlier "constructed-summary" path used for constructed inventory hosts;
update _get_last_summary to first check for and return any constructed/cached
summary already attached to the host object (i.e., prefer an existing
constructed-summary attribute set by the constructed-host code path such as a
prepopulated _cached_last_summary or similarly named constructed-summary
attribute), and only fall back to calling JobHostSummary.latest_for_host(obj.id)
when no constructed summary exists; keep caching behavior by setting
obj._cached_last_summary when you perform the fallback query.
🧹 Nitpick comments (1)
awx/main/models/inventory.py (1)

406-408: Consider caching failed_hosts.count() to avoid duplicate query.

failed_hosts.count() is evaluated twice on lines 406 and 408, resulting in two identical database queries.

♻️ Suggested optimization
+        failed_hosts_count = failed_hosts.count()
         computed_fields = {
-            'has_active_failures': bool(failed_hosts.count()),
+            'has_active_failures': bool(failed_hosts_count),
             'total_hosts': total_hosts,
-            'hosts_with_active_failures': failed_hosts.count(),
+            'hosts_with_active_failures': failed_hosts_count,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/models/inventory.py` around lines 406 - 408, The code calls
failed_hosts.count() twice which issues duplicate DB queries; compute it once
into a local variable (e.g., failed_hosts_count) before building the dict and
then use that variable for both 'has_active_failures' (bool(failed_hosts_count))
and 'hosts_with_active_failures' to avoid the extra query. Ensure any existing
uses of failed_hosts.count() in the surrounding scope are replaced with the
cached variable where appropriate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/api/serializers.py`:
- Around line 1861-1875: The manual reconstruction of d['last_job_host_summary']
and d['last_job'] is dropping keys previously exposed by SUMMARIZABLE_FK_FIELDS;
update the block that builds these dicts (the code that references last_summary
and last_job) to include last_summary.description and for last_job include
description, license_error and canceled_on (and any other keys
SUMMARIZABLE_FK_FIELDS originally returned) so the serializer restores the
original summary_fields contract; ensure the keys match the original names
(e.g., 'description', 'license_error', 'canceled_on') and are populated from
last_summary and last_job objects where present.

---

Duplicate comments:
In `@awx/api/serializers.py`:
- Around line 1955-1959: The current to_representation flow calls
self._get_last_summary(obj) but only clears top-level FK fields when
last_summary is missing, leaving stale IDs; update the serializer so after
calling last_summary you also overwrite the top-level FK fields (e.g.,
ret['last_job'] and ret['last_job_host_summary']) with the corresponding values
derived from last_summary when it exists, and keep the existing behavior of
setting them to None when last_summary is falsy; locate this logic in the
to_representation override that calls _get_last_summary(obj) and update ret
accordingly.
- Around line 1848-1853: The _get_last_summary method currently always calls
JobHostSummary.latest_for_host(obj.id), bypassing the earlier
"constructed-summary" path used for constructed inventory hosts; update
_get_last_summary to first check for and return any constructed/cached summary
already attached to the host object (i.e., prefer an existing
constructed-summary attribute set by the constructed-host code path such as a
prepopulated _cached_last_summary or similarly named constructed-summary
attribute), and only fall back to calling JobHostSummary.latest_for_host(obj.id)
when no constructed summary exists; keep caching behavior by setting
obj._cached_last_summary when you perform the fallback query.

---

Nitpick comments:
In `@awx/main/models/inventory.py`:
- Around line 406-408: The code calls failed_hosts.count() twice which issues
duplicate DB queries; compute it once into a local variable (e.g.,
failed_hosts_count) before building the dict and then use that variable for both
'has_active_failures' (bool(failed_hosts_count)) and
'hosts_with_active_failures' to avoid the extra query. Ensure any existing uses
of failed_hosts.count() in the surrounding scope are replaced with the cached
variable where appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 45eece19-57b2-4fa6-a566-734cd0141889

📥 Commits

Reviewing files that changed from the base of the PR and between e474510 and c3950f3.

📒 Files selected for processing (8)
  • awx/api/serializers.py
  • awx/api/views/__init__.py
  • awx/main/access.py
  • awx/main/managers.py
  • awx/main/models/events.py
  • awx/main/models/inventory.py
  • awx/main/models/jobs.py
  • awx/main/signals.py
💤 Files with no reviewable changes (1)
  • awx/main/access.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • awx/api/views/init.py
  • awx/main/models/events.py
  • awx/main/models/jobs.py
  • awx/main/managers.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
awx/api/serializers.py (3)

1884-1904: ⚠️ Potential issue | 🟠 Major

Restore the full summary_fields.last_job* payload.

This manual rebuild still drops keys declared in SUMMARIZABLE_FK_FIELDS: description for both summaries, plus license_error and canceled_on for last_job. That changes the existing response contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1884 - 1904, The manual reconstruction
of last_job and last_job_host_summary in the serializer (using last_summary,
last_job) is omitting keys defined in SUMMARIZABLE_FK_FIELDS (e.g., description
on summaries, and license_error, canceled_on on last_job); restore the full
summary_fields payload by copying the original annotated dict when present or by
iterating the SUMMARIZABLE_FK_FIELDS entries to populate missing keys instead of
hardcoding a subset — (1) when last_summary.job_data exists, ensure you
deep-copy all keys from last_summary.job_data into d['last_job'] and
d['last_job_host_summary']; (2) when falling back to last_summary.job, include
the fields listed in SUMMARIZABLE_FK_FIELDS (description, license_error,
canceled_on, etc.) from last_job and last_summary into the rebuilt OrderedDicts
so the response contract matches the original summary_fields.

1873-1875: ⚠️ Potential issue | 🟠 Major

Keep constructed hosts on the constructed-summary path.

The fallback here always calls JobHostSummary.latest_for_host(obj.id). For constructed inventories, that bypasses obj.constructed_host_summaries, so nested or non-annotated serializers can return the wrong last_job_host_summary and has_active_failures.

Suggested fix
             else:
                 # Fallback for non-annotated querysets (e.g. nested serializers)
-                obj._cached_last_summary = JobHostSummary.latest_for_host(obj.id)
+                if obj.inventory.kind == 'constructed':
+                    obj._cached_last_summary = obj.constructed_host_summaries.order_by('-id').first()
+                else:
+                    obj._cached_last_summary = JobHostSummary.latest_for_host(obj.id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1873 - 1875, The fallback should respect
constructed inventories by using the host's constructed_host_summaries when
available instead of directly calling JobHostSummary.latest_for_host; update the
else branch where obj._cached_last_summary is set to first check for and call
obj.constructed_host_summaries.latest_for_host(obj.id) (or equivalent API on
constructed_host_summaries) and only fall back to
JobHostSummary.latest_for_host(obj.id) if constructed_host_summaries is not
present.

1983-1987: ⚠️ Potential issue | 🟠 Major

Always overwrite last_job_host_summary with the derived value.

This only clears the field when no derived summary exists. When one does exist, super().to_representation(obj) can still return the stale dormant FK, so the top-level field disagrees with related and summary_fields.

Suggested fix
         last_summary = self._get_last_summary(obj)
         if 'last_job' in ret and not last_summary:
             ret['last_job'] = None
-        if 'last_job_host_summary' in ret and not last_summary:
-            ret['last_job_host_summary'] = None
+        if 'last_job_host_summary' in ret:
+            ret['last_job_host_summary'] = last_summary.id if last_summary else None
         return ret
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1983 - 1987, The serializer's
to_representation is only clearing ret['last_job_host_summary'] when
last_summary is missing, leaving a stale FK when a derived summary exists; after
calling super().to_representation(obj) and computing last_summary via
self._get_last_summary(obj), always overwrite ret['last_job_host_summary'] with
the derived value (which may be None) instead of only clearing it — update the
assignment in to_representation so ret['last_job_host_summary'] = last_summary
(or the appropriate derived field from last_summary) to ensure
related/summary_fields and top-level agree.
🧹 Nitpick comments (1)
awx/main/models/inventory.py (1)

389-393: Reuse the failed-host count.

This annotated queryset is counted twice a few lines later. Caching it once avoids rerunning the heaviest query in update_computed_fields().

♻️ Proposed change
         latest_summary_failed = Subquery(
             JobHostSummary.objects.filter(host_id=OuterRef('pk')).order_by('-id').values('failed')[:1]
         )
         failed_hosts = active_hosts.annotate(_latest_failed=latest_summary_failed).filter(_latest_failed=True)
+        failed_hosts_count = failed_hosts.count()
         active_groups = self.groups
@@
         computed_fields = {
-            'has_active_failures': bool(failed_hosts.count()),
+            'has_active_failures': bool(failed_hosts_count),
             'total_hosts': total_hosts,
-            'hosts_with_active_failures': failed_hosts.count(),
+            'hosts_with_active_failures': failed_hosts_count,
             'total_groups': active_groups.count(),
             'has_inventory_sources': bool(active_inventory_sources.count()),
             'total_inventory_sources': active_inventory_sources.count(),
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 406-408

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/models/inventory.py` around lines 389 - 393, The annotated queryset
computing failed hosts (latest_summary_failed and failed_hosts) is being built
and counted twice in update_computed_fields(); annotate it once and reuse the
result to avoid rerunning the heavy query. Change the code to assign the
annotated queryset to a variable (keep latest_summary_failed and failed_hosts
names), then call failed_hosts.count() once and store that integer (e.g.,
failed_host_count) for subsequent use rather than rebuilding or recounting the
queryset; reuse the cached queryset or cached count in the later code paths
(including the other occurrence around lines 406-408) so the heavy Subquery is
executed only once.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@awx/api/serializers.py`:
- Around line 1854-1859: The annotated summary shim created as
type('AnnotatedSummary', ...) only defines id, failed, host_name, and job_id but
callers expect a pk attribute (e.g., code that reads last_summary.pk), causing
AttributeError; update the shim in the summary assignment to include 'pk' (set
to obj._latest_summary_id) so pk and id are consistent for consumers like
last_summary.pk.

In `@awx/main/models/inventory.py`:
- Around line 389-393: The latest_summary_failed Subquery currently only matches
JobHostSummary.host_id which misses summaries for constructed hosts stored in
JobHostSummary.constructed_host_id; update the Subquery/QuerySet used by
latest_summary_failed (the JobHostSummary.objects.filter(...) used to annotate
active_hosts in inventory.py) to match either host_id=OuterRef('pk') OR
constructed_host_id=OuterRef('pk') so constructed hosts are considered; ensure
the order_by('-id').values('failed')[:1] logic and the annotate name
latest_summary_failed remain unchanged so has_active_failures and
hosts_with_active_failures reflect constructed-host summaries too.

---

Duplicate comments:
In `@awx/api/serializers.py`:
- Around line 1884-1904: The manual reconstruction of last_job and
last_job_host_summary in the serializer (using last_summary, last_job) is
omitting keys defined in SUMMARIZABLE_FK_FIELDS (e.g., description on summaries,
and license_error, canceled_on on last_job); restore the full summary_fields
payload by copying the original annotated dict when present or by iterating the
SUMMARIZABLE_FK_FIELDS entries to populate missing keys instead of hardcoding a
subset — (1) when last_summary.job_data exists, ensure you deep-copy all keys
from last_summary.job_data into d['last_job'] and d['last_job_host_summary'];
(2) when falling back to last_summary.job, include the fields listed in
SUMMARIZABLE_FK_FIELDS (description, license_error, canceled_on, etc.) from
last_job and last_summary into the rebuilt OrderedDicts so the response contract
matches the original summary_fields.
- Around line 1873-1875: The fallback should respect constructed inventories by
using the host's constructed_host_summaries when available instead of directly
calling JobHostSummary.latest_for_host; update the else branch where
obj._cached_last_summary is set to first check for and call
obj.constructed_host_summaries.latest_for_host(obj.id) (or equivalent API on
constructed_host_summaries) and only fall back to
JobHostSummary.latest_for_host(obj.id) if constructed_host_summaries is not
present.
- Around line 1983-1987: The serializer's to_representation is only clearing
ret['last_job_host_summary'] when last_summary is missing, leaving a stale FK
when a derived summary exists; after calling super().to_representation(obj) and
computing last_summary via self._get_last_summary(obj), always overwrite
ret['last_job_host_summary'] with the derived value (which may be None) instead
of only clearing it — update the assignment in to_representation so
ret['last_job_host_summary'] = last_summary (or the appropriate derived field
from last_summary) to ensure related/summary_fields and top-level agree.

---

Nitpick comments:
In `@awx/main/models/inventory.py`:
- Around line 389-393: The annotated queryset computing failed hosts
(latest_summary_failed and failed_hosts) is being built and counted twice in
update_computed_fields(); annotate it once and reuse the result to avoid
rerunning the heavy query. Change the code to assign the annotated queryset to a
variable (keep latest_summary_failed and failed_hosts names), then call
failed_hosts.count() once and store that integer (e.g., failed_host_count) for
subsequent use rather than rebuilding or recounting the queryset; reuse the
cached queryset or cached count in the later code paths (including the other
occurrence around lines 406-408) so the heavy Subquery is executed only once.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8a5fd84-31bf-4c7e-b68a-0bac481812eb

📥 Commits

Reviewing files that changed from the base of the PR and between c3950f3 and 46e600f.

📒 Files selected for processing (8)
  • awx/api/serializers.py
  • awx/api/views/__init__.py
  • awx/main/access.py
  • awx/main/managers.py
  • awx/main/models/events.py
  • awx/main/models/inventory.py
  • awx/main/models/jobs.py
  • awx/main/signals.py
💤 Files with no reviewable changes (1)
  • awx/main/access.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • awx/main/models/jobs.py

benthomasson and others added 2 commits March 6, 2026 18:33
…ublists

- Add 'pk' to AnnotatedSummary dynamic type (fixes AttributeError in get_related)
- Add job_template_id and job_template_name to subquery annotations so list
  views include these fields in summary_fields.last_job (matching detail views)
- Traverse job__ FK from JobHostSummary instead of using separate UnifiedJob
  subquery with OuterRef on another annotation (cleaner SQL, avoids alias issue)
- Annotate all host sublist views (InventoryHostsList, GroupHostsList,
  GroupAllHostsList, InventorySourceHostsList) to prevent N+1 queries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tale FKs

Tests were asserting host.last_job_id and host.last_job_host_summary_id
which are no longer updated. Use JobHostSummary.latest_for_host() to
derive the same data, matching the new read-time derivation approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
awx/api/serializers.py (3)

1991-1995: ⚠️ Potential issue | 🟠 Major

Always overwrite the dormant last_job_host_summary field in the payload.

super().to_representation() still serializes Host.last_job_host_summary, but this FK is no longer maintained. This branch only clears it when no derived summary exists, so hosts with a newer derived summary still return the stale FK value.

Suggested fix
         last_summary = self._get_last_summary(obj)
-        if 'last_job' in ret and not last_summary:
-            ret['last_job'] = None
-        if 'last_job_host_summary' in ret and not last_summary:
-            ret['last_job_host_summary'] = None
+        if 'last_job' in ret:
+            ret['last_job'] = last_summary.job_id if last_summary and last_summary.job_id else None
+        if 'last_job_host_summary' in ret:
+            ret['last_job_host_summary'] = last_summary.pk if last_summary else None
         return ret
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1991 - 1995, The payload can return a
stale Host.last_job_host_summary FK because to_representation only clears
ret['last_job_host_summary'] when last_summary is falsy; change the logic in the
serializer's to_representation (after calling super().to_representation()) to
always set ret['last_job_host_summary'] = last_summary (or None) using the value
returned by self._get_last_summary(obj) instead of only clearing it when
last_summary is missing so the derived summary always overwrites the stale FK.

1887-1912: ⚠️ Potential issue | 🟠 Major

summary_fields.last_job* dropped existing keys.

This manual rebuild no longer emits fields still declared in SUMMARIZABLE_FK_FIELDS: description for last_job_host_summary, and description, license_error, and canceled_on for last_job. That is a response-contract regression for clients reading summary_fields.

Please build these dicts from SUMMARIZABLE_FK_FIELDS or extend the annotated/fallback data so the previous keys are preserved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1887 - 1912, The manual dict
construction for last_job_host_summary and last_job drops keys declared in
SUMMARIZABLE_FK_FIELDS (e.g., description, license_error, canceled_on); change
the logic to build these summary dicts by iterating over SUMMARIZABLE_FK_FIELDS
(or using it as a template) and copying values from last_summary.job_data when
present or from last_summary.job / last_job in the fallback, rather than
hard-coding individual fields; preserve the existing special-case behavior for
job_template_id/name (remove them only when None) and ensure the code paths that
set d['last_job_host_summary'] and d['last_job'] reuse SUMMARIZABLE_FK_FIELDS to
include all previously exposed keys.

1852-1878: ⚠️ Potential issue | 🟠 Major

Constructed inventory hosts still lose their latest summary.

_get_last_summary() only reads host-based annotations or JobHostSummary.latest_for_host(obj.id). For constructed hosts, the latest row comes from constructed_host_summaries, and awx/api/views/__init__.py:131-149 annotates only host_id, so Line 1873 caches None and get_related() / summary_fields / has_active_failures all go blank for those hosts.

Handle obj.inventory.kind == 'constructed' before the host-based path, or add a matching constructed-host annotation/fallback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/serializers.py` around lines 1852 - 1878, The code in
_get_last_summary currently only checks host-based annotations and
JobHostSummary.latest_for_host, causing constructed inventory hosts to get
obj._cached_last_summary = None; modify _get_last_summary to first check if
obj.inventory.kind == 'constructed' and, if so, use the constructed-host
annotation (constructed_host_summaries) or a constructed-specific fallback to
locate the latest summary (mirroring the existing annotated-summary object
shape) and assign it to obj._cached_last_summary; alternatively, detect and use
any annotated constructed-summary fields (analogous to
_latest_summary_id/_latest_summary_host_name) before falling back to host-based
JobHostSummary.latest_for_host so constructed hosts retain summary_fields and
has_active_failures.
🧹 Nitpick comments (1)
awx/api/views/__init__.py (1)

131-149: Performance trade-off is acceptable but worth monitoring.

The implementation uses 10 separate Subquery annotations, each becoming a correlated scalar subquery in SQL. This is a valid trade-off: one query with multiple scalar subqueries avoids the N+1 problem the PR aims to fix (each host requiring separate lookups).

For large host lists, monitor query execution time. If it becomes problematic, consider consolidating into a single subquery with values() returning multiple fields, or using Prefetch with annotation on the related manager.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/api/views/__init__.py` around lines 131 - 149, The function
_annotate_host_latest_summary currently issues 10 correlated Subquery
annotations (models.JobHostSummary via Subquery) which is acceptable but can be
costly for very large host lists; add a clear TODO comment above
_annotate_host_latest_summary documenting this performance trade-off and
instruct future maintainers to monitor query times and memory for large batches,
and include guidance to replace the multiple Subquery calls with a single
consolidated Subquery (values(...) returning multiple fields or a JSON/tuple) or
to use Prefetch with annotation on the related manager (JobHostSummary) if
monitoring shows degradation; also add a short note to add a benchmark/test that
reproduces large-host query timing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@awx/api/serializers.py`:
- Around line 1991-1995: The payload can return a stale
Host.last_job_host_summary FK because to_representation only clears
ret['last_job_host_summary'] when last_summary is falsy; change the logic in the
serializer's to_representation (after calling super().to_representation()) to
always set ret['last_job_host_summary'] = last_summary (or None) using the value
returned by self._get_last_summary(obj) instead of only clearing it when
last_summary is missing so the derived summary always overwrites the stale FK.
- Around line 1887-1912: The manual dict construction for last_job_host_summary
and last_job drops keys declared in SUMMARIZABLE_FK_FIELDS (e.g., description,
license_error, canceled_on); change the logic to build these summary dicts by
iterating over SUMMARIZABLE_FK_FIELDS (or using it as a template) and copying
values from last_summary.job_data when present or from last_summary.job /
last_job in the fallback, rather than hard-coding individual fields; preserve
the existing special-case behavior for job_template_id/name (remove them only
when None) and ensure the code paths that set d['last_job_host_summary'] and
d['last_job'] reuse SUMMARIZABLE_FK_FIELDS to include all previously exposed
keys.
- Around line 1852-1878: The code in _get_last_summary currently only checks
host-based annotations and JobHostSummary.latest_for_host, causing constructed
inventory hosts to get obj._cached_last_summary = None; modify _get_last_summary
to first check if obj.inventory.kind == 'constructed' and, if so, use the
constructed-host annotation (constructed_host_summaries) or a
constructed-specific fallback to locate the latest summary (mirroring the
existing annotated-summary object shape) and assign it to
obj._cached_last_summary; alternatively, detect and use any annotated
constructed-summary fields (analogous to
_latest_summary_id/_latest_summary_host_name) before falling back to host-based
JobHostSummary.latest_for_host so constructed hosts retain summary_fields and
has_active_failures.

---

Nitpick comments:
In `@awx/api/views/__init__.py`:
- Around line 131-149: The function _annotate_host_latest_summary currently
issues 10 correlated Subquery annotations (models.JobHostSummary via Subquery)
which is acceptable but can be costly for very large host lists; add a clear
TODO comment above _annotate_host_latest_summary documenting this performance
trade-off and instruct future maintainers to monitor query times and memory for
large batches, and include guidance to replace the multiple Subquery calls with
a single consolidated Subquery (values(...) returning multiple fields or a
JSON/tuple) or to use Prefetch with annotation on the related manager
(JobHostSummary) if monitoring shows degradation; also add a short note to add a
benchmark/test that reproduces large-host query timing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 51cb34c9-11e8-4d82-be1e-22d43505eeee

📥 Commits

Reviewing files that changed from the base of the PR and between 46e600f and fe47e18.

📒 Files selected for processing (2)
  • awx/api/serializers.py
  • awx/api/views/__init__.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
awx/main/tests/functional/models/test_events.py (1)

74-77: Keep these assertions independent of JobHostSummary.latest_for_host().

Using the new production helper in the assertions weakens the test: if latest_for_host() regresses, these tests can still stay green because they exercise the same selection path. It also adds an N+1 query loop over hosts. Prefer asserting from a queryset built directly in the test once per case.

♻️ Example pattern
-        for host in Host.objects.all():
-            latest_summary = JobHostSummary.latest_for_host(host.id)
+        summaries_by_host_id = {
+            summary.host_id: summary
+            for summary in JobHostSummary.objects.filter(job=self.job, host_id__isnull=False)
+        }
+        for host in Host.objects.all():
+            latest_summary = summaries_by_host_id.get(host.id)
             assert latest_summary is not None
             assert latest_summary.job_id == self.job.id
             assert latest_summary.host == host

Apply the same idea in the --limit test and assert missing hosts via summaries_by_host_id.get(h.id) is None.

Also applies to: 109-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@awx/main/tests/functional/models/test_events.py` around lines 74 - 77,
Replace assertions that call JobHostSummary.latest_for_host(host.id) with
assertions based on a queryset assembled in the test (e.g., build
summaries_by_host_id = {s.host_id: s for s in
JobHostSummary.objects.filter(job=self.job).order_by(...)}), then assert
summaries_by_host_id.get(host.id) and compare its job_id/host to expected
values; do the same change in the `--limit` test (assert missing hosts with
summaries_by_host_id.get(h.id) is None) to avoid using
JobHostSummary.latest_for_host() and eliminate the N+1 query pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@awx/main/tests/functional/models/test_events.py`:
- Around line 74-77: Replace assertions that call
JobHostSummary.latest_for_host(host.id) with assertions based on a queryset
assembled in the test (e.g., build summaries_by_host_id = {s.host_id: s for s in
JobHostSummary.objects.filter(job=self.job).order_by(...)}), then assert
summaries_by_host_id.get(host.id) and compare its job_id/host to expected
values; do the same change in the `--limit` test (assert missing hosts with
summaries_by_host_id.get(h.id) is None) to avoid using
JobHostSummary.latest_for_host() and eliminate the N+1 query pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 719e90cf-a77f-4644-ade4-e3833a89c968

📥 Commits

Reviewing files that changed from the base of the PR and between fe47e18 and 44c5dbd.

📒 Files selected for processing (1)
  • awx/main/tests/functional/models/test_events.py

benthomasson and others added 2 commits March 6, 2026 19:05
The failures_url linked to ?last_job_host_summary__failed=True which
filters on the now-stale FK. The dashboard count itself was already
fixed to use a subquery annotation. Since DashboardView is deprecated
and has_active_failures is a SerializerMethodField (not filterable),
remove the failures_url entirely rather than creating a custom filter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@AlanCoding
Copy link
Member

This comes at the cost of adding a large number of annotations to every host view.

Sure, you've made the post-job processing faster by not doing those operations. But it's unclear at this point how much worse the host listings have become.

_latest_job_failed=Subquery(latest_summary.values('job__failed')[:1]),
_latest_job_finished=Subquery(latest_summary.values('job__finished')[:1]),
_latest_job_template_id=Subquery(latest_summary.values('job__job_template_id')[:1]),
_latest_job_template_name=Subquery(latest_summary.values('job__job_template__name')[:1]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not doing yourself much favors by annotating every field from JobHostSummary. I would keep _latest_summary_id only, and work on some kind of prefetch_related for the objects, as opposed to avoiding loading the object as you're doing. Saving 1 query isn't going to proportionally make much difference, and I'm far far more worried about other deeper problems with the approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in db374c1. Exactly as you suggested — now annotates only _latest_summary_id and bulk-fetches the full JobHostSummary objects with select_related('job', 'job__job_template') after pagination. 10 correlated subqueries → 1 subquery + 1 IN query.

Instead of annotating every host queryset with 10 correlated subqueries
(summary + job + job_template fields), annotate only _latest_summary_id
and bulk-fetch the full JobHostSummary objects after pagination via
select_related('job', 'job__job_template').

This reduces the SQL from 10 correlated subqueries to 1 subquery + 1 IN
query, addressing review feedback about annotation overhead on host list
views.

- _annotate_host_latest_summary: only annotates _latest_summary_id
- _prefetch_latest_summaries: bulk-fetches and attaches to host objects
- HostSummaryPrefetchMixin: hooks into list() after pagination
- Serializer uses real JobHostSummary objects (no more AnnotatedSummary)
- to_representation always overwrites stale FK values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benthomasson
Copy link
Contributor Author

@AlanCoding Good feedback — you're right that 10 correlated subqueries was too heavy. I've refactored in db374c1:

Before: 10 correlated subqueries annotated on every host queryset (summary + job + job_template fields extracted individually)

After: 1 correlated subquery (_latest_summary_id only) + 1 bulk IN query after pagination

The new approach:

  1. _annotate_host_latest_summary(qs) — annotates only _latest_summary_id (single subquery)
  2. _prefetch_latest_summaries(hosts) — after pagination, collects all summary IDs from the page, fetches full JobHostSummary objects in one query with select_related('job', 'job__job_template')
  3. HostSummaryPrefetchMixin — hooks into list() to call the prefetch between pagination and serialization

So for a paginated host list, the total query cost is:

  • 1 host query (with 1 correlated subquery for the latest summary ID)
  • 1 SELECT ... FROM main_jobhostsummary JOIN main_job JOIN main_jobtemplate WHERE id IN (...) for the page

The serializer gets real JobHostSummary model instances with select_related job data — no more synthetic AnnotatedSummary objects. Detail views fall back to a single JobHostSummary.objects.filter(id=...).select_related(...) query.

@benthomasson benthomasson requested a review from AlanCoding March 10, 2026 10:42
h.last_job_id = job.id
updated_hosts.add(h)
if h.id in host_mapping:
h.last_job_host_summary_id = host_mapping[h.id]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that these 2 fields would then need to be removed from the Host model, since they are unused and now inconsistently populated after migration.

Host.last_job

Host.last_job_host_summary

d = super(HostSerializer, self).get_summary_fields(obj)
try:
d['last_job']['job_template_id'] = obj.last_job.job_template.id
d['last_job']['job_template_name'] = obj.last_job.job_template.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if the fields won't be used again, we want the diff to flow logically from that. That would entail removing:

    'last_job': DEFAULT_SUMMARY_FIELDS + ('finished', 'status', 'failed', 'license_error', 'canceled_on'),
    'last_job_host_summary': DEFAULT_SUMMARY_FIELDS + ('failed',),

Then, it's more like, oh, that was removed, meaning that the base serializer will no longer populate those. That means that HostSerializer now needs to take full custom control of those. Logical.

Even so, this should still attempt to share logic with the base serializer, which means using DEFAULT_SUMMARY_FIELDS in a loop (like the base serializier) plus the additions instead of enumerating all the fields.

@AlanCoding
Copy link
Member

Let me give a suggestion here.

But first I want to acknowledge that there is a real non-obvious tradeoff between the models & views, because obtaining a good query execution hook is challenging. I've hit this difficulty many times, and have placed that custom-prefetch-related hook in the view for these reasons (more often, I've shoved a cache in the serializer context). However, overriding list and having to manage details of pagination is a messy solution.

So my suggestion would be this.

from django.db import models
from django.db.models import OuterRef, Subquery

from awx.main.models.jobs import JobHostSummary


class HostLatestSummaryQuerySet(models.QuerySet):
    """
    Queryset that annotates and bulk-attaches the latest JobHostSummary
    at queryset evaluation time.

    Important:
    - This is NOT streaming-safe.
    - Once evaluated, it behaves similarly to prefetch_related(): the primary
      queryset result cache must exist so related objects can be attached.
    """

    _awx_latest_summary_attached = False

    def _clone(self):
        clone = super()._clone()
        clone._awx_latest_summary_attached = self._awx_latest_summary_attached
        return clone

    def with_latest_summary_id(self):
        latest_summary = JobHostSummary.objects.filter(host_id=OuterRef('pk')).order_by('-id')
        return self.annotate(
            _latest_summary_id=Subquery(latest_summary.values('id')[:1]),
        )

    def _fetch_all(self):
        super()._fetch_all()

        if self._awx_latest_summary_attached or self._result_cache is None:
            return

        latest_summary_ids = [
            host._latest_summary_id
            for host in self._result_cache
            if getattr(host, '_latest_summary_id', None) is not None
        ]

        summaries_by_id = JobHostSummary.objects.select_related('job').in_bulk(latest_summary_ids)

        for host in self._result_cache:
            latest_summary_id = getattr(host, '_latest_summary_id', None)
            host._latest_summary_cache = summaries_by_id.get(latest_summary_id)

        self._awx_latest_summary_attached = True


class HostLatestSummaryManager(models.Manager.from_queryset(HostLatestSummaryQuerySet)):
    def get_queryset(self):
        return super().get_queryset().with_latest_summary_id()


class Host(models.Model):
    # fields...

    # Keep your real default manager unchanged.
    objects = models.Manager()

    # Dedicated manager: querysets from here always have latest summary support.
    with_latest_summary_objects = HostLatestSummaryManager()

    @property
    def latest_summary(self):
        if hasattr(self, '_latest_summary_cache'):
            return self._latest_summary_cache

        summary = (
            JobHostSummary.objects.filter(host_id=self.pk)
            .order_by('-id')
            .select_related('job')
            .first()
        )
        self._latest_summary_cache = summary
        return summary

    @property
    def latest_job(self):
        summary = self.latest_summary
        if summary is None:
            return None
        return summary.job

This is on the model level, and the queryset would need to be replaced or modified on the view level with the custom manager here.

In all cases working with a Host object, we would want to use the property host.latest_summary. If not optimized, this would be subject O(N) penalty, which will be okay in some cases, but also can be optimized in prefetch-related style as discussed.

…mmary

Per review feedback, replace the view-level HostSummaryPrefetchMixin
with a custom QuerySet that bulk-attaches summaries at evaluation time
(like prefetch_related), and a Host.latest_summary property as the
single access point.

- HostLatestSummaryQuerySet: overrides _fetch_all() to bulk-fetch
  JobHostSummary objects with select_related after queryset evaluation
- HostManager now inherits from the custom queryset via from_queryset()
- Host.latest_summary property: uses cache if available, falls back to
  individual query
- Remove _annotate_host_latest_summary, _prefetch_latest_summaries,
  HostSummaryPrefetchMixin from views — no more list() override needed
- Remove last_job/last_job_host_summary from SUMMARIZABLE_FK_FIELDS
- Serializer uses obj.latest_summary and DEFAULT_SUMMARY_FIELDS loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benthomasson
Copy link
Contributor Author

@AlanCoding Implemented your suggestion in e58710b:

Custom QuerySet + Manager pattern:

  • HostLatestSummaryQuerySet overrides _fetch_all() to bulk-attach JobHostSummary objects after queryset evaluation (like prefetch_related internally)
  • HostManager inherits from the custom queryset via from_queryset() and auto-applies with_latest_summary_id() in get_queryset()
  • Host.latest_summary property — single access point, uses _latest_summary_cache if available, falls back to individual query

Removed from views:

  • _annotate_host_latest_summary(), _prefetch_latest_summaries(), HostSummaryPrefetchMixin
  • All get_queryset() overrides that only called the annotation helper

Removed from serializer:

  • _get_last_summary() static method — replaced by obj.latest_summary
  • last_job and last_job_host_summary from SUMMARIZABLE_FK_FIELDS
  • Summary fields use DEFAULT_SUMMARY_FIELDS loop for last_job (as suggested)

The views are now clean — they don't need to know about the summary prefetching at all. The manager handles it transparently.

@benthomasson
Copy link
Contributor Author

@AlanCoding One concern with the current implementation: putting with_latest_summary_id() in HostManager.get_queryset() applies the correlated subquery to every Host query globally — including .count(), .exists(), and internal relations that don't need the summary data.

Two options to scope it:

Option A: Separate manager (your original suggestion had with_latest_summary_objects)

class Host(models.Model):
    objects = HostManager()  # no annotation
    with_latest_summary_objects = HostLatestSummaryManager()  # annotated

Option B: Keep one manager, apply in views

# In each host-serving view:
def get_queryset(self):
    return super().get_queryset().with_latest_summary_id()

The _fetch_all() hook already guards with getattr(host, '_latest_summary_id', None), so it handles unannotated querysets gracefully — hosts just won't get the prefetch and will fall back to Host.latest_summary doing an individual query.

Which approach do you prefer?

benthomasson and others added 2 commits March 10, 2026 13:52
- Remove with_latest_summary_id() from HostManager.get_queryset() to
  avoid applying the correlated subquery to every Host query globally
  (count, exists, internal relations)
- Apply with_latest_summary_id() in get_queryset() of the 6
  host-serving views only
- Restore license_error and canceled_on to last_job summary fields
  to avoid breaking API change

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without this guard, _fetch_all() would set _latest_summary_cache=None
on every host in non-annotated querysets (e.g. Host.objects.filter()),
masking the per-object fallback query in Host.latest_summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benthomasson
Copy link
Contributor Author

Added a guard in _fetch_all() (013626a) so the bulk-attach logic only runs when the queryset was actually annotated via with_latest_summary_id(). Without this, _fetch_all() on a non-annotated queryset (e.g. Host.objects.filter(...)) would set _latest_summary_cache = None on every host, masking the per-object fallback query in Host.latest_summary.

The guard checks hasattr(self._result_cache[0], '_latest_summary_id') after the existing empty-cache early return, so it is safe and zero-cost for non-annotated paths.

@benthomasson benthomasson requested a review from AlanCoding March 11, 2026 21:20
@AlanCoding
Copy link
Member

Still catching up, but initially confirming this matches my prior intent and still current preference.

Option A: Separate manager (your original suggestion had with_latest_summary_objects)

I did have:

    with_latest_summary_objects = HostLatestSummaryManager()  # annotated

We would only use this in the view, but I recognize this might have the fundamental flaw the the prefetch executes when doing a .count() which would be bad. That is, if the test case of Host.with_latest_summary_objects.count() executes a query to prefetch the job host summaries, that would be a pretty ugly bug, and a consequence specifically of the plan I laid out. This needs to be an item to verify.

@AlanCoding
Copy link
Member

This has apparently promising API results. I will put up a PR with fixes for the django-debug-toolbar in a bit. Here are results from the same hosts list, filled with 50+ hosts and 10+ jobs for each host.

Before

HTTP 200 OK
Allow: GET, POST, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-1
X-API-Product-Name: AWX
X-API-Product-Version: 4.6.20.dev552+gdedc17838.d20260312
X-API-Query-Count: 33
X-API-Query-Time: 0.118s
X-API-Time: 0.251s

After

HTTP 200 OK
Allow: GET, POST, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept
X-API-Node: awx-1
X-API-Product-Name: AWX
X-API-Product-Version: 4.6.20.dev552+gdedc17838.d20260312
X-API-Query-Count: 34
X-API-Query-Time: 0.069s
X-API-Time: 0.209s

try:
last_job = last_summary.job
d['last_job'] = OrderedDict()
for field in DEFAULT_SUMMARY_FIELDS + ('finished', 'status', 'failed', 'license_error', 'canceled_on'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canceled_on has been flagged as a field added to the API. Better to remove and keep current structure.

if last_summary:
d['last_job_host_summary'] = OrderedDict()
d['last_job_host_summary']['id'] = last_summary.id
d['last_job_host_summary']['name'] = last_summary.host_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this "name" field is flagged as an API change. Better to remove for this change.

Copy link
Member

@AlanCoding AlanCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made 2 very minor requests for changing of the returned data structure.

Additionally:

  • we should have some tests added awx/main/tests/functional/models/** to test new methods available from Host.objects, mainly what might be obvious from the net new code here + address the use of .count() and that it doesn't trigger the _fetch_all logic
  • need to run a YOLO to check it, I can do this

Otherwise I think this looks in good shape, and the benefits are pretty solid.

@AlanCoding
Copy link
Member

Link related fix to Django debug toolbar #16352

…summary

Per reviewer feedback: these fields were not in the original API contract
via SUMMARIZABLE_FK_FIELDS and their addition would be an API change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benthomasson
Copy link
Contributor Author

Addressed both inline requests in 58047e6:

  1. Removed name from last_job_host_summary summary fields — avoids API change
  2. Removed canceled_on from last_job summary fields — keeps current structure

Regarding removing Host.last_job and Host.last_job_host_summary FK fields from the model: agreed, planning that as a follow-up migration PR once this lands.

Also added the _fetch_all() guard (013626a) to skip bulk-attach on non-annotated querysets, and the .count() case Alan mentioned is safe — .count() doesn't call _fetch_all() (it goes through get_aggregation()), so the prefetch logic never fires on count/exists queries.

Re: tests — happy to add functional tests for Host.objects.with_latest_summary_id() and verify .count() doesn't trigger _fetch_all(). Will push those next.

…mmary

Tests cover:
- with_latest_summary_id() annotation and most-recent selection
- _fetch_all() bulk-attach behavior on annotated querysets
- _fetch_all() skips non-annotated querysets (preserves fallback)
- .count() and .exists() do NOT trigger _fetch_all
- Host.latest_summary cache hits (zero queries) and fallback
- Host.latest_job property
- select_related on bulk-attached summaries (no N+1)
- Chaining preserves annotation
- Multiple jobs / partial host coverage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@benthomasson
Copy link
Contributor Author

Added functional tests in 2c33358 (test_host_queryset.py). Coverage includes:

  • with_latest_summary_id() annotation and most-recent selection
  • _fetch_all() bulk-attach on annotated querysets
  • _fetch_all() skips non-annotated querysets (preserves fallback)
  • .count() and .exists() do NOT trigger _fetch_all()
  • Host.latest_summary cache hits (zero queries) vs fallback path
  • Host.latest_job property
  • select_related on bulk-attached summaries (no N+1)
  • Chaining preserves annotation
  • Multiple jobs / partial host coverage

@benthomasson benthomasson requested a review from AlanCoding March 19, 2026 04:39
benthomasson and others added 2 commits March 19, 2026 00:43
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Thomasson <bthomass@redhat.com>
@sonarqubecloud
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants