-
Notifications
You must be signed in to change notification settings - Fork 1.7k
chore(ingestion): group FOR UPDATE behind env var #32275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR introduces a new configuration option DISABLE_GROUP_SELECT_FOR_UPDATE
to control PostgreSQL's SELECT FOR UPDATE
locking in group queries, aiming to reduce database contention.
- Added
DISABLE_GROUP_SELECT_FOR_UPDATE
boolean config inplugin-server/src/types.ts
andconfig.ts
with default valuefalse
- Modified
upsertGroup
inproperties-updater.ts
to accept optionalforUpdate
parameter (defaults totrue
) - Updated group processing in
process-event.ts
to pass!DISABLE_GROUP_SELECT_FOR_UPDATE
to control locking behavior - Potential concern: Retry logic in
upsertGroup
could lead to infinite recursion if race conditions persist
4 file(s) reviewed, no comment(s)
Edit PR Review Bot Settings | Greptile
): Promise<void> { | ||
const result = await this.postgres.query( | ||
): Promise<number> { | ||
const result = await this.postgres.query<{ version: string }>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we returning version as string instead of number? since we are casting it after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a nodejs problem. The postgres library will return a string because the column is bigint (64-bit signed) but nodejs uses double precision floats for all numbers, so there's a possibility the value from the db will overflow.
It's a bit involved to refactor it, as we're using that value in JSON later and it doesn't play well with JS bigints. We live with it because the max integer in JS is ~2^53, so it's very unlikely we'll overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, makes total sense! 👍 rest of PR looks good
}) | ||
const createdAt = DateTime.min(group?.created_at || DateTime.now(), timestamp) | ||
const version = (group?.version || 0) + 1 | ||
const expectedVersion = (group?.version || 0) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we have to always evaluate version? Is it because it can be Null? What are the scenarioss in which it can be null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the group does not exist group?.version
will return undefined
.
actualVersion = updatedVersion | ||
// Track the disparity between the version on the database and the version we expected | ||
// Without races, the returned version should be only +1 what we expected | ||
const versionDisparity = updatedVersion - expectedVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np: could be wrapped in a modulo in case some unexpected stuff happens, although I believe this is likely impossible to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! just have the string casting thing open, but it's a small thing
Important
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Problem
The consistency gained for the SELECT FOR UPDATE on Group table is not necessary and is causing resource contention on the DB
Changes
Put the FOR UPDATE part of the Select behind an env var that can be controlled
Does this work well for both Cloud and self-hosted?
How did you test this code?
local tests, it is behind env var