Postgres metrics for stuck getpage requests #11710

myrrc · 2025-04-24T21:21:59Z

New metrics:

compute_getpage_stuck_requests_total
compute_getpage_max_inflight_stuck_time_ms
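A minimal sketch of how two such metrics could be maintained. It assumes a fixed-size table of in-flight requests and a hypothetical 10 s "stuck" threshold. StuckTracker, request_started, request_finished, check_stuck, STUCK_THRESHOLD_MS and MAX_INFLIGHT are illustrative names, not the code in pgxn/neon/libpagestore.c. The assumed semantics are: the counter counts each stuck request once, and the gauge reports the longest wait among requests that are currently stuck (0 when none are).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the PR. */
#define STUCK_THRESHOLD_MS 10000
#define MAX_INFLIGHT 8

/* Hypothetical tracker for in-flight getpage requests. */
typedef struct
{
    bool     in_use[MAX_INFLIGHT];
    bool     counted[MAX_INFLIGHT];      /* already added to the counter */
    uint64_t start_ms[MAX_INFLIGHT];     /* when each request was sent */
    uint64_t stuck_requests_total;       /* monotonic counter */
    uint64_t max_inflight_stuck_time_ms; /* gauge, recomputed by check_stuck */
} StuckTracker;

static void
request_started(StuckTracker *t, int slot, uint64_t start_ms)
{
    assert(slot >= 0 && slot < MAX_INFLIGHT && !t->in_use[slot]);
    t->in_use[slot] = true;
    t->counted[slot] = false;
    t->start_ms[slot] = start_ms;
}

static void
request_finished(StuckTracker *t, int slot)
{
    assert(slot >= 0 && slot < MAX_INFLIGHT && t->in_use[slot]);
    t->in_use[slot] = false;
}

/* Count each request once when it first exceeds the threshold, and
 * recompute the longest wait among requests that are still stuck. */
static void
check_stuck(StuckTracker *t, uint64_t now_ms)
{
    uint64_t max_ms = 0;

    for (int i = 0; i < MAX_INFLIGHT; i++)
    {
        if (!t->in_use[i])
            continue;

        /* Assumes now_ms >= start_ms[i]. */
        uint64_t waited = now_ms - t->start_ms[i];

        if (waited < STUCK_THRESHOLD_MS)
            continue;
        if (!t->counted[i])
        {
            t->stuck_requests_total++;
            t->counted[i] = true;
        }
        if (waited > max_ms)
            max_ms = waited;
    }
    t->max_inflight_stuck_time_ms = max_ms;
}
```

Calling check_stuck periodically keeps the counter monotonic, as a *_total counter should be. The gauge drops back to 0 once the stuck requests complete.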

github-actions · 2025-04-24T21:22:16Z

If this PR added a GUC in the Postgres fork or neon extension,
please regenerate the Postgres settings in the cloud repo:

make NEON_WORKDIR=path/to/neon/checkout \
  -C goapp/internal/shareddomain/postgres generate

If you're an external contributor, a Neon employee will assist in
making sure this step is done.

github-actions · 2025-04-24T22:43:57Z

8327 tests run: 7827 passed, 0 failed, 500 skipped (full report)

Flaky tests (2)

Postgres 17

test_timeline_size: debug-x86-64-without-lfc

Postgres 15

test_pg_regress[v2-4]: release-arm64-with-lfc

Code coverage* (full report)

functions: 32.7% (8958 of 27399 functions)
lines: 48.9% (78182 of 160031 lines)

* collected from Rust tests only

The comment gets automatically updated with the latest test results.
56ab95c at 2025-04-28T18:43:20.006Z :recycle:

compute/etc/sql_exporter/getpage_stuck_requests_total.libsonnet

pgxn/neon/libpagestore.c

pgxn/neon/neon_perf_counters.c

pgxn/neon/libpagestore.c

knizhnik

I wonder why, instead of just calculating the maximal time, we can't simply use a histogram, as is done for example with file_cache_write_hist.
In that case we would not need a separate counter metric for stuck requests.
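To illustrate the suggestion: with a bucketed latency histogram, a stuck-request count can be read off the buckets instead of being kept as a separate counter. This is a sketch with hypothetical bucket bounds and names (LatencyHist, hist_observe, hist_count_above). It does not reproduce the real layout of file_cache_write_hist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bounds in ms; anything above the last finite bound
 * falls into the implicit +Inf bucket. */
static const uint64_t bucket_le_ms[] = {1, 10, 100, 1000, 10000};
#define NBUCKETS (sizeof(bucket_le_ms) / sizeof(bucket_le_ms[0]))

typedef struct
{
    uint64_t buckets[NBUCKETS + 1]; /* per-bucket counts; last slot is +Inf */
    uint64_t count;                 /* total observations */
} LatencyHist;

/* Place an observation in the first bucket whose bound is >= ms. */
static void
hist_observe(LatencyHist *h, uint64_t ms)
{
    size_t i = 0;

    while (i < NBUCKETS && ms > bucket_le_ms[i])
        i++;
    h->buckets[i]++;
    h->count++;
}

/* Number of observations strictly above bucket_le_ms[b]: a "stuck" count
 * derived from the histogram instead of a separate counter. */
static uint64_t
hist_count_above(const LatencyHist *h, size_t b)
{
    uint64_t at_or_below = 0;

    assert(b < NBUCKETS);
    for (size_t i = 0; i <= b; i++)
        at_or_below += h->buckets[i];
    return h->count - at_or_below;
}
```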

myrrc · 2025-04-29T19:24:09Z

I've considered this option, but our current latency histogram treats 10s as +Inf, so it would require either adding a new bucket or adding another metric.

Or are you suggesting turning the max_ metric into a histogram? That's also an option, but it seems a bit excessive to me: we only care about the max value, and most of the histogram would just duplicate the existing latency histogram.
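A small illustration of this trade-off. It reads "treats 10s as +Inf" as meaning the histogram's largest finite bound is 10 s, so any slower request only increments the +Inf bucket. Two very different sets of stuck waits then look identical in the histogram, while a max_-style gauge still tells them apart. TOP_FINITE_BUCKET_MS, StuckView and summarize are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the largest finite bucket bound of the latency histogram. */
#define TOP_FINITE_BUCKET_MS 10000

typedef struct
{
    uint64_t inf_bucket; /* observations above the top finite bound */
    uint64_t max_ms;     /* what a max_-style gauge would report */
} StuckView;

/* Summarize a set of waits the way each metric type would see them. */
static StuckView
summarize(const uint64_t *waits_ms, size_t n)
{
    StuckView v = {0, 0};

    for (size_t i = 0; i < n; i++)
    {
        if (waits_ms[i] > TOP_FINITE_BUCKET_MS)
            v.inf_bucket++;
        if (waits_ms[i] > v.max_ms)
            v.max_ms = waits_ms[i];
    }
    return v;
}
```

With waits of {12 s, 15 s} and {12 s, 600 s}, both sets put two observations in +Inf, but the gauge reports 15000 ms versus 600000 ms.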

myrrc requested review from a team as code owners April 24, 2025 21:22

myrrc requested review from bizwark, ololobus and skyzh April 24, 2025 21:22

myrrc force-pushed the myrrc/10327-stuck-getpage-metrics branch from 60142ea to 39a764a April 25, 2025 09:19

ololobus reviewed Apr 25, 2025

compute/etc/sql_exporter/getpage_stuck_requests_total.libsonnet (outdated)

pgxn/neon/libpagestore.c (outdated)

myrrc changed the title from "getpage_stuck_requests_total Postgres metric" to "Postgres metrics for stuck getpage requests" Apr 25, 2025

myrrc requested a review from ololobus April 25, 2025 11:15

myrrc mentioned this pull request Apr 28, 2025

compute: metrics for stuck/failing getpage requests to alert on pageserver unavailability #10327

Closed

myrrc requested a review from MMeent April 28, 2025 13:38

MMeent reviewed Apr 28, 2025

pgxn/neon/neon_perf_counters.c (outdated)

pgxn/neon/libpagestore.c (outdated)

pgxn/neon/libpagestore.c (outdated)

myrrc requested a review from MMeent April 28, 2025 17:29

initial

56ab95c

myrrc force-pushed the myrrc/10327-stuck-getpage-metrics branch from b42bf49 to 56ab95c April 28, 2025 17:33

knizhnik reviewed Apr 29, 2025

skyzh approved these changes Apr 29, 2025

myrrc requested review from knizhnik and removed request for bizwark April 29, 2025 20:13

knizhnik approved these changes Apr 30, 2025

MMeent approved these changes Apr 30, 2025

myrrc added this pull request to the merge queue Apr 30, 2025

Merged via the queue into main with commit 8da4ec9 Apr 30, 2025
103 checks passed

myrrc deleted the myrrc/10327-stuck-getpage-metrics branch April 30, 2025 12:03


6 participants
