{WIP} Use SQLite as the file format for CDC streaming. #715

Draft
dimitri wants to merge 8 commits into main from feature/use-sqlite-for-cdc

Conversation

dimitri (Owner) commented Mar 21, 2024

This is a work-in-progress refactoring. The only reason a PR already exists is to allow me to focus on other bits and pieces and get back to this work later. Feedback is welcome of course, and the discussion is open.

dimitri self-assigned this on Mar 21, 2024
dimitri marked this pull request as draft on March 21, 2024 at 11:13
Comment on lines +107 to +109
" and startpos <= $1 "
" and (endpos is null or $2 <= endpos) "
Contributor

@dimitri I'm wondering how the comparison works here? AFAIK, SQLite stores the pg_lsn type as just a blob, and we don't seem to provide a comparison operator either. Probably because SQLite uses memcmp to compare blobs and that works correctly on hex strings?

dimitri (Owner, Author)

That's the gist of it, yeah: storing the LSN as hexadecimal text still allows using the usual operators to compare values... (I believe?)
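
For illustration only, not code from this PR: a minimal sketch, assuming LSNs are stored as fixed-width, zero-padded hexadecimal text, showing that SQLite's default BINARY (byte-wise) text comparison then matches the numeric LSN ordering. The table and column names are hypothetical; only the startpos <= $1 pattern mirrors the query quoted above.

/*
 * Illustration only, not code from this PR: LSNs stored as fixed-width,
 * zero-padded hexadecimal TEXT compare correctly with SQLite's default
 * BINARY collation, because byte-wise order then matches numeric order.
 */
#include <stdio.h>
#include <sqlite3.h>

int
main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;

    sqlite3_open(":memory:", &db);

    sqlite3_exec(db,
                 "create table lsn_demo(startpos text);"
                 "insert into lsn_demo values"
                 " ('00000000/016B3740'), ('00000000/09000000');",
                 NULL, NULL, NULL);

    sqlite3_prepare_v2(db,
                       "select startpos from lsn_demo"
                       " where startpos <= $1"
                       " order by startpos",
                       -1, &stmt, NULL);

    /* bind a zero-padded LSN: only 00000000/016B3740 is expected to match */
    sqlite3_bind_text(stmt, 1, "00000000/08000000", -1, SQLITE_STATIC);

    while (sqlite3_step(stmt) == SQLITE_ROW)
    {
        printf("%s\n", (const char *) sqlite3_column_text(stmt, 0));
    }

    sqlite3_finalize(stmt);
    sqlite3_close(db);

    return 0;
}

Note that the un-padded pg_lsn text form (for example 0/9 versus 0/16B3740) would not order correctly under a plain byte comparison, so the fixed-width formatting is what makes the usual operators safe to use here.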

Comment on lines +46 to +56
if (privateContext->timeline == 0)
{
log_error("BUG: ld_store_open_replaydb: timeline is zero");
return false;
}

sformat(replayDB->dbfile, MAXPGPATH, "%s/%08d-%08X-%08X.db",
specs->paths.dir,
privateContext->timeline,
LSN_FORMAT_ARGS(privateContext->startpos));
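
For illustration only, not from the PR: a minimal sketch of the file name that the pattern above produces, using a simplified stand-in for Postgres' LSN_FORMAT_ARGS macro, which splits a 64-bit LSN into its high and low 32-bit halves. The timeline and LSN values below are made up, and the directory prefix from specs->paths.dir is left out.

/*
 * Illustration only: how a replay database file name built with the
 * "%08d-%08X-%08X.db" pattern comes out. The macro below is a simplified
 * stand-in for Postgres' LSN_FORMAT_ARGS; the values are arbitrary.
 */
#include <stdio.h>
#include <stdint.h>

#define LSN_FORMAT_ARGS(lsn) \
    ((unsigned int) ((lsn) >> 32)), ((unsigned int) ((lsn) & 0xFFFFFFFF))

int
main(void)
{
    int timeline = 3;
    uint64_t startpos = 0x022F9A18ULL;   /* LSN 0/22F9A18 */
    char dbfile[1024];

    snprintf(dbfile, sizeof(dbfile), "%08d-%08X-%08X.db",
             timeline,
             LSN_FORMAT_ARGS(startpos));

    /* prints: 00000003-00000000-022F9A18.db */
    printf("%s\n", dbfile);

    return 0;
}

The point is that each replay database file name encodes the timeline and the starting LSN it covers.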
Contributor

IIUC, it is one SQLite database per timeline? If that is the case, I'm afraid the SQLite database file would grow very large. Currently, we have a script in place to clean up old files (.json, .sql) which have already been applied to the target.

dimitri (Owner, Author)

The idea is to re-introduce file rotation later. It just seems too much to solve at once, so I'm not sure yet whether I'll be able to introduce it within this PR or whether it will have to wait. This time, though, file rotation will not split a transaction across files, by design.
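
A hypothetical sketch, not code from this PR, of the design constraint just mentioned: rotation is only ever considered at a transaction boundary, so a single transaction never spans two replay database files. Every name and threshold below is made up.

/*
 * Hypothetical sketch: rotate the replay database only between
 * transactions, never in the middle of one.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct RotationState
{
    bool inTransaction;          /* set at BEGIN, cleared at COMMIT */
    uint64_t bytesWritten;       /* bytes written to the current file */
    uint64_t rotateThreshold;    /* made-up size threshold, e.g. 512 MB */
} RotationState;

/*
 * Rotation is allowed only when we are at a transaction boundary and the
 * current file has grown past the threshold.
 */
static bool
shouldRotateReplayDB(RotationState *state)
{
    if (state->inTransaction)
    {
        return false;
    }

    return state->bytesWritten >= state->rotateThreshold;
}

int
main(void)
{
    RotationState state = {
        .inTransaction = true,
        .bytesWritten = (uint64_t) 600 * 1024 * 1024,
        .rotateThreshold = (uint64_t) 512 * 1024 * 1024
    };

    /* false here: inside a transaction, even though the file is large */
    bool mid = shouldRotateReplayDB(&state);

    state.inTransaction = false;

    /* true here: at a transaction boundary and past the size threshold */
    bool boundary = shouldRotateReplayDB(&state);

    return (mid == false && boundary == true) ? 0 : 1;
}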

Contributor

Makes sense. Thanks @dimitri.

dimitri force-pushed the feature/use-sqlite-for-cdc branch 2 times, most recently from e35a012 to b290f30, on April 22, 2024 at 15:38
dimitri force-pushed the feature/use-sqlite-for-cdc branch 3 times, most recently from daea3f5 to 9d46a32, on May 30, 2024 at 10:54
dimitri force-pushed the feature/use-sqlite-for-cdc branch from 9d46a32 to b83612d on June 4, 2024 at 12:26
dimitri force-pushed the feature/use-sqlite-for-cdc branch from b83612d to eadf844 on June 17, 2024 at 12:24
dimitri force-pushed the feature/use-sqlite-for-cdc branch 2 times, most recently from 6dd703b to bdb7593, on July 2, 2024 at 12:18
dimitri force-pushed the feature/use-sqlite-for-cdc branch 2 times, most recently from 9520000 to cf3e7e4, on July 18, 2024 at 09:29
Add new options to `pgcopydb stream sentinel get` that allow fetching a single
LSN value, avoiding the need to use `awk`, `jq`, or something equivalent to
parse a single entry out of the command output.
In unit tests where we use both an inject and a test service, we need the
inject service to access the same pgcopydb work directory as the test
service. For that we use an external Docker volume.

The previous implementation of that idea was found to be wrong; this fixes
it by properly using the Docker APIs for shared "external" volumes.