Skip to content

fix: Flush pool as late as possible during schema cache reloading#4645

Open
mkleczek wants to merge 4 commits intoPostgREST:mainfrom
mkleczek:flush-pool-after-schema-cache-load
Open

fix: Flush pool as late as possible during schema cache reloading#4645
mkleczek wants to merge 4 commits intoPostgREST:mainfrom
mkleczek:flush-pool-after-schema-cache-load

Conversation

@mkleczek
Copy link
Collaborator

@mkleczek mkleczek commented Feb 11, 2026

DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

retryingSchemaCacheLoad flushes the pool upon every retry before it starts reloading the schema. This is too early as schema reloading might take some time during which new connections might be acquired. The consequence is that:

  • upon successful schema cache reload we might have some connections created with the old schema cache
  • we close connections upon each retry and under load we will keep closing and re-opening connections until schema cache load succeeds

This change is to make sure we flush the pool only after successful schema cache querying but before loading (so that connections acquired during loading wait for it and do not interfere with timing the loading process).

@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch 3 times, most recently from d5a6f71 to 7221241 Compare February 11, 2026 19:50
@steve-chavez
Copy link
Member

The change seems reasonable. Is there a way to test the previous behavior and that this is an improvement?

I'm thinking we have the schemaCacheLoads metric but perhaps we should also have one metric for pool flushes? Then we might be able to test that this change reduces the flushes.

data MetricsState =
MetricsState {
poolTimeouts :: Counter,
poolAvailable :: Gauge,
poolWaiting :: Gauge,
poolMaxSize :: Gauge,
schemaCacheLoads :: Vector Label1 Counter,
schemaCacheQueryTime :: Gauge,
jwtCacheRequests :: Counter,
jwtCacheHits :: Counter,
jwtCacheEvictions :: Counter
}


Related, just noted these two functions do the same perhaps we should deduplicate:

-- | Flush the connection pool so that any future use of the pool will
-- use connections freshly established after this call.
flushPool :: AppState -> IO ()
flushPool AppState{..} = SQL.release statePool
-- | Destroy the pool on shutdown.
destroyPool :: AppState -> IO ()
destroyPool AppState{..} = SQL.release statePool

@mkleczek
Copy link
Collaborator Author

The change seems reasonable. Is there a way to test the previous behavior and that this is an improvement?

I'm thinking we have the schemaCacheLoads metric but perhaps we should also have one metric for pool flushes? Then we might be able to test that this change reduces the flushes.

Done:

  • Added pgrst_db_pool_flushes_total metric.
  • Added HSpec module dedicated to metrics tests and a test to verify the new metric (I think it is a good idea to have one - we already have JWT cache spec tests that make use of metric state)
  • Added io tests to validate pool flushing behavior (both in normal conditions and with schema loading retries)

Related, just noted these two functions do the same perhaps we should deduplicate:

After latest changes they are no longer duplicate as destroyPool does not emit observations.

@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch 2 times, most recently from a5f9ece to f9ea218 Compare February 15, 2026 21:13
@steve-chavez
Copy link
Member

steve-chavez commented Feb 17, 2026

Added pgrst_db_pool_flushes_total metric.
Added HSpec module dedicated to metrics tests and a test to verify the new metric (I think it is a good idea to have one - we already have JWT cache spec tests that make use of metric state)
Added io tests to validate pool flushing behavior (both in normal conditions and with schema loading retries)

@mkleczek Could you split those commits into another PR? 🙏 (without the fix here) That would make it easier to review.

@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch from f9ea218 to 24e71b5 Compare February 17, 2026 13:14
@mkleczek
Copy link
Collaborator Author

mkleczek commented Feb 17, 2026

Added pgrst_db_pool_flushes_total metric.
Added HSpec module dedicated to metrics tests and a test to verify the new metric (I think it is a good idea to have one - we already have JWT cache spec tests that make use of metric state)
Added io tests to validate pool flushing behavior (both in normal conditions and with schema loading retries)

@mkleczek Could you split those commits into another PR? 🙏 (without the fix here) That would make it easier to review.

@steve-chavez Done. Stacked this PR on top of #4658

@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch 3 times, most recently from b428649 to 3c033ab Compare February 17, 2026 20:28
@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch 2 times, most recently from e601d59 to 16aedf7 Compare February 25, 2026 05:51
@mkleczek
Copy link
Collaborator Author

@mkleczek Could you split those commits into another PR? 🙏 (without the fix here) That would make it easier to review.

@steve-chavez Done. Stacked this PR on top of #4658

@steve-chavez Rebased on #4668

@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch from 16aedf7 to 0a348f7 Compare February 25, 2026 18:54
@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch 5 times, most recently from 2e23441 to 3e7308f Compare March 6, 2026 10:30
@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch 3 times, most recently from e7045db to e82492b Compare March 13, 2026 05:27
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Some helpers are provided for introspecting metrics already (used in JWT cache tests). This change provides facilities to additionally validate emited Observation events.
A new Spec module is also implemented, adding basic tests of schema cache reloading - their main goal is to excercise the new infrastructure.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

This change introduces a match_log function, that implements lenient matching of log output lines. Tests of log output are retrofited to use the function.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Emit a dedicated PoolFlushed observation when the DB pool is released during schema cache reload.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

retryingSchemaCacheLoad flushes the pool upon every retry before it starts reloading the schema. This is too early as schema reloading might take some time during which new connections might be acquired. The consequence is that:
* upon successful schema cache reload we might have some connections created with the old schema cache
* we close connections upon each retry and under load we will keep closing and re-opening connections until schema cache load succeeds

This change is to make sure we flush the pool only after successful schema cache querying but before loading (so that connections acquired during loading wait for it and do not interfere with timing the loading process).
@mkleczek mkleczek force-pushed the flush-pool-after-schema-cache-load branch from e82492b to ba16275 Compare March 13, 2026 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants