Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observability #40

Merged
merged 5 commits into from
Feb 25, 2024
Merged

Observability #40

merged 5 commits into from
Feb 25, 2024

Conversation

nikita-volkov
Copy link
Owner

Adds a stream of events describing the significant state changes in the pool for monitoring purposes.

@nikita-volkov
Copy link
Owner Author

@robx Hey Rob! Can you please give a feedback on this?

@steve-chavez
Copy link

Many thanks for this @nikita-volkov! Just started experimenting on PostgREST/postgrest#3229

Do you think hasql-pool could also expose metrics like "total waiters"? At one point (when hasql-pool used resource-pool) we had that one and other metrics, see PostgREST/postgrest#2129.

Comment on lines +87 to +88
-- If the action is not lightweight, it's recommended to use intermediate bufferring via channels like TBQueue.
-- E.g., if the action is @'atomically' . 'writeTBQueue' yourQueue@, then reading from it and processing can be done on a separate thread.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When could this happen? When the pool size is big and there are lots of connections?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about the user supplied action here. It should be quick since it will block the pool processes.

Comment on lines 6 to 9
= ConnectionEstablishedObservation UUID
| AttemptingToConnectObservation UUID
| FailedToConnectObservation UUID (Maybe ByteString)
| ConnectionReleasedObservation UUID ReleaseReason

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why does the UUID need to be generated internally? (just wondering if it can be left out for simplifying the code)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for identifying the connection. Lets the user isolate events on a particular connection

@nikita-volkov
Copy link
Owner Author

Many thanks for this @nikita-volkov! Just started experimenting on PostgREST/postgrest#3229

Great! Thanks! Looking forward for feedback!

Do you think hasql-pool could also expose metrics like "total waiters"? At one point (when hasql-pool used resource-pool) we had that one and other metrics, see PostgREST/postgrest#2129.

Could you clarify on what you imply by "total waiters"? Unfortunately I couldn't derive that from the link.

@steve-chavez
Copy link

Could you clarify on what you imply by "total waiters"? Unfortunately I couldn't derive that from the link.

So the motivation is to be able to prevent poolAcquisitionTimeout, on the postgREST side this results in an API error (ref).

For this, I understand we could use the following metrics:

  1. pending_requests (same idea as total waiters) https://opentelemetry.io/docs/specs/semconv/database/database-metrics/#metric-dbclientconnectionspending_requests

    • If this keeps increasing it would be an indicative that a pool acquisition timeout will happen. So an admin can adjust the pool size or tune the queries to prevent it.
    • I've also seen this named as "concurrentWaiters: Number of waiting threads" (ref)
  2. pool wait time https://opentelemetry.io/docs/specs/semconv/database/database-metrics/#metric-dbclientconnectionswait_time

    • Same idea as above, if it keeps increasing then a pool timeout will happen.

@nikita-volkov
Copy link
Owner Author

Thanks for the references! I'll see what can be done.

@nikita-volkov
Copy link
Owner Author

I've taken a look. Seems like both of those metrics are already computable without the features involved in this PR.

executeObservedSession :: Pool -> Monitor -> Session -> IO ()
executeObservedSession pool monitor session = do
  Monitor.incGauge monitor "db.client.connections.pending_requests"
  startTime <- getCurrentTime
  Pool.use pool do
    liftIO $ do
      endTime <- getCurrentTime
      Monitor.observeHistogram monitor "db.client.connections.wait_time" (diffUTCTime endTime startTime)
      Monitor.decGauge monitor "db.client.connections.pending_requests"
    session

@steve-chavez Is that correct?

@steve-chavez
Copy link

@nikita-volkov You're right! Thanks for the snippet!

@steve-chavez
Copy link

Monitor.observeHistogram monitor "db.client.connections.wait_time" (diffUTCTime endTime startTime)

Curious, where does observeHistogram come from? I found decGauge on hoogle but not observeHistogram.

@nikita-volkov
Copy link
Owner Author

Ah. It was just pseudocode. I was implying something like this.

@nikita-volkov nikita-volkov merged commit 92a6ed9 into master Feb 25, 2024
3 checks passed
@nikita-volkov
Copy link
Owner Author

Thanks for cooperation guys! It's released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants