Fix #1001: [Lake][ETL] Implement incremental ETL pipeline #1423
Conversation
…oader implementation with additional tables
…er-to-implementation. Will begin re-working ETL tests so I can run and get them updated.
…uch that I can test the functionality
…ilable. Created tickets to address various fetching and ETL-related work.
…youts table seems to be working correctly
…w records but no update records. Need to test the incremental logic now to make sure it's working.
…uch that we can get a better understanding of what's happening
… follow. ETL output is now easier to read, and lets me follow the majority of the work being done.
…that ETL can rebuild historical data without relying on subgraph
…ating that all rows are there
…t to start validating what the ETL is doing
…nternal calculations. Configured tests to do the first step of processing 1/4 of the data, such that I can control the ETL/lake, and update manually.
… make sure that clamping is working as expected
… 1, as I believe to be expected... debugging issues with the update step
I can't tell which comments are new and which are old. I trust that you resolved only the comments that you addressed. I would also recommend a second look from another dev on the team, just to be sure. Otherwise LGTM.
I have unresolved the conversation on all items where I didn't implement any change. Why? Either the comments:
@KatunaNorbert @kdetry can you please review, test, or provide any feedback/response?
Tested it and I might have found an issue. The …
I noticed the mermaid diagram that was added. Not sure if this is in the scope of the PR, but here's how the table structure could be improved:
Answering trizin's comments in here
This is the goal (to only have bronze); it would also reduce DB size. The challenge is that we need to build all the bronze tables before removing all raw_data from the DB. Now that we have Incremental ETL, this should be possible. I'm proposing we deprecate raw tables once we have more bronze tables in place, and start working towards silver (aggregate) tables. Challenge: …
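The incremental load that makes this plan possible can be sketched as a watermark-based copy: find the latest timestamp already in the bronze table, then pull only newer raw rows. The table names (`raw_payouts`, `bronze_payouts`), columns, and the use of sqlite3 here are illustrative assumptions, not the project's actual schema or storage engine:

```python
import sqlite3

def incremental_etl(conn):
    """Hypothetical sketch: copy only raw rows newer than the bronze
    table's high-water mark, so reruns never reprocess old data."""
    cur = conn.cursor()
    # High-water mark: latest timestamp already present in bronze.
    cur.execute("SELECT COALESCE(MAX(ts), 0) FROM bronze_payouts")
    watermark = cur.fetchone()[0]
    # Count the rows this run will load (strictly newer than the mark).
    cur.execute("SELECT COUNT(*) FROM raw_payouts WHERE ts > ?", (watermark,))
    n_new = cur.fetchone()[0]
    # Insert only those rows into bronze.
    cur.execute(
        "INSERT INTO bronze_payouts (id, ts, amount) "
        "SELECT id, ts, amount FROM raw_payouts WHERE ts > ?",
        (watermark,),
    )
    conn.commit()
    return n_new

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_payouts (id INTEGER, ts INTEGER, amount REAL);
    CREATE TABLE bronze_payouts (id INTEGER, ts INTEGER, amount REAL);
    INSERT INTO raw_payouts VALUES (1, 100, 1.0), (2, 200, 2.0);
""")
print(incremental_etl(conn))  # → 2 (first run loads all raw rows)
print(incremental_etl(conn))  # → 0 (watermark is now 200; nothing new)
```

Once every bronze table can be rebuilt this way from fetched data, the raw tables become redundant and can be dropped, which is the DB-size win discussed above.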
…because we have discussed all the feedback and there were no more comments.
We had multiple reviewers, discussed the feedback, and agreed on how to move forward across the board. Merging.
Pull Request Description
By way of cherry-picking all changes found in PR #1000, this PR incorporates all updates required to implement the incremental ETL pipeline.
What happened?
Branch 685 was deleted. I did not realize this PR was branched from there, so the original PR was also closed. This PR rescues all the work and prepares all changes for merging.