support postgres xmin replication #2376
Comments
@waterworthd-cim please take a look at https://github.com/dlt-hub/verified-sources where we have postgres replication source. |
Correct me if I'm wrong but that source requires use of cdc, not xmin according to the documentation? I'm replicating from a Postgres replica, not a primary so CDC isn't available. |
@david-waterworth OK so there's something I'm not aware of :) Could you point me to some docs that explain what xmin replication is? We have a regular CDC source and one that is right now in PR that replicates from WAL (for old postgres versions): dlt-hub/verified-sources#589. If there's a trick Airbyte does then it would be good to know. |
OK I've found it. it seems you just need to add |
Yeah I've done that with the For anyone else, this is more or less what I did
|
if by postgres source you mean
aren't you getting duplicates in your data? I think you are re-taking the last row with do you still have the exception you've got? |
@rudolfix the >= was deliberate, although possibly not required. I based it on an example I found somewhere, and there was a discussion around > vs >= and whether it was better to risk duplicates (which will get fixed with merge at an additional cost) or gaps. I don't think gaps are very likely with xmin, but if you're using a low-resolution timestamp column I feel like you could have multiple rows on the source with the same timestamp that were not written at the exact same instant, and dlt could run in between these updates and so only get some of them. So I figured better safe than sorry. I don't have the exception but I can probably recreate it, so if I manage to I'll post it here. It worked fine when I used connectorx (based on this example - https://dlthub.com/blog/dlt-arrow-loading). When I migrated to
and then added
|
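The > vs >= trade-off discussed above can be illustrated with a small, self-contained sketch. It does not use dlt or Postgres; the rows, the `extract` and `merge` helpers, and the cursor values are all hypothetical and only simulate a low-resolution cursor where a second row arrives with the same cursor value after the first load has run.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, int]  # (primary key, cursor value); both are made-up sample data


def extract(source: List[Row], last_cursor: Optional[int], inclusive: bool) -> List[Row]:
    """Return rows at or after (inclusive) or strictly after (exclusive) the stored cursor."""
    if last_cursor is None:
        return list(source)  # first load: take everything
    if inclusive:
        return [row for row in source if row[1] >= last_cursor]
    return [row for row in source if row[1] > last_cursor]


def merge(destination: Dict[int, int], rows: List[Row]) -> None:
    """Upsert by primary key, so a re-read boundary row simply overwrites itself."""
    for pk, cursor in rows:
        destination[pk] = cursor


def run_two_loads(inclusive: bool) -> Dict[int, int]:
    destination: Dict[int, int] = {}
    # First load: two rows exist and the highest cursor value is 5.
    source: List[Row] = [(1, 4), (2, 5)]
    first_batch = extract(source, None, inclusive)
    merge(destination, first_batch)
    last_cursor = max(cursor for _, cursor in first_batch)
    # A third row lands with the same cursor value 5 after the first load ran.
    source.append((3, 5))
    merge(destination, extract(source, last_cursor, inclusive))
    return destination


exclusive_result = run_two_loads(inclusive=False)  # row 3 is never picked up
inclusive_result = run_two_loads(inclusive=True)   # row 2 is re-read, row 3 is caught
print(exclusive_result)
print(inclusive_result)
```

With the exclusive comparison the late row is lost for good, while the inclusive comparison re-reads the boundary row and relies on a merge by primary key to absorb the duplicate, which is the extra cost mentioned above.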
Feature description
We have a number of postgres sources that cannot be configured for CDC, and many tables are large and don't have a suitable incremental/cursor column (i.e. last_changed).
I am currently using Airbyte for most of our streams as they support CDC, xmin and a user-supplied cursor column. It would be nice if the postgres source supported all three as well (if I understand correctly it only supports CDC, you have to use sql_database for a user-supplied cursor, and xmin isn't supported).
I think I can use query_adapter_callback to implement xmin (I know this can wrap, but it's a workaround until I can get last_changed implemented on our source tables), but it would be nice if the postgres source supported all three (or I've misread the documentation, in which case maybe it's not as clear as it could be?)
Are you a dlt user?
Yes, I run dlt in production.
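As a rough illustration of the xmin workaround described in the feature description, the sketch below only builds the SQL that such an approach might run. It is not dlt's API and not the code from this thread: `build_xmin_query`, the `row_xmin` alias, the table name, and the psycopg2-style `%(last_xmin)s` placeholder are all assumptions. The `xmin::text::bigint` cast is the usual way to compare the system column numerically, and the sketch deliberately ignores transaction ID wraparound.

```python
import re
from typing import Any, Dict, Optional, Tuple

# xmin is a system column of type xid; casting via text yields a comparable integer.
XMIN_EXPR = "xmin::text::bigint"

# Accept only plain "table" or "schema.table" names, since the name is interpolated.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_xmin_query(table: str, last_xmin: Optional[int]) -> Tuple[str, Dict[str, Any]]:
    """Build a SELECT that returns rows whose xmin is at or after the stored value.

    Hypothetical helper: does not handle xmin wraparound.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    sql = f"SELECT *, {XMIN_EXPR} AS row_xmin FROM {table}"
    params: Dict[str, Any] = {}
    if last_xmin is not None:
        # Inclusive comparison: the boundary rows are re-read and must be merged downstream.
        sql += f" WHERE {XMIN_EXPR} >= %(last_xmin)s"
        params["last_xmin"] = last_xmin
    sql += f" ORDER BY {XMIN_EXPR}"
    return sql, params


first_sql, first_params = build_xmin_query("public.orders", None)
next_sql, next_params = build_xmin_query("public.orders", 1200)
print(first_sql)
print(next_sql, next_params)
```

In a dlt pipeline the same predicate would presumably be attached to the generated query inside query_adapter_callback, with the highest row_xmin seen so far kept as the incremental state between runs.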
Use case
I'm currently using a mix of Airbyte and dlt as neither covers all my use cases.
Proposed solution
No response
Related issues
No response