[Lake][Fetching vs Joining] Fetch available data from subgraph rather than joining w/ sql. #989

Closed
3 tasks
idiom-bytes opened this issue May 2, 2024 · 4 comments

@idiom-bytes
Member

idiom-bytes commented May 2, 2024

Background / motivation

  1. It's easier (and cheaper) to fetch this data directly from the subgraph than to perform a join.
    These fields shouldn't have to be obtained through a join with the predictions table:
    {etl_bronze_pdr_predictions_table_name}.pair as pair,
    {etl_bronze_pdr_predictions_table_name}.timeframe as timeframe,
    {etl_bronze_pdr_predictions_table_name}.source as source,

  2. Another issue is that in SQL, 1 slot event is being joined to <= N prediction events. Again, costly. This is done as a left join, but it would be good to check the results of bronze_pdr_slots.py to verify this.
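The fan-out described in point 2 can be reproduced with a small sketch. It uses an in-memory SQLite database purely for illustration; the table layout, column names, and sample values below are assumptions and not the actual lake schema. It shows that a left join from one slot to its N predictions yields N rows carrying identical pair/timeframe/source values, so the join needs a dedup step just to recover what is effectively one value per slot.

```python
import sqlite3

# Hypothetical, simplified tables; the real lake schema may differ.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE slots (id TEXT PRIMARY KEY, contract TEXT, slot INTEGER);
    CREATE TABLE predictions (
        id TEXT PRIMARY KEY, slot_id TEXT,
        pair TEXT, timeframe TEXT, source TEXT
    );
    INSERT INTO slots VALUES ('0xfeed-100', '0xfeed', 100);
    INSERT INTO predictions VALUES
        ('0xfeed-100-0xa', '0xfeed-100', 'BTC/USDT', '5m', 'binance'),
        ('0xfeed-100-0xb', '0xfeed-100', 'BTC/USDT', '5m', 'binance'),
        ('0xfeed-100-0xc', '0xfeed-100', 'BTC/USDT', '5m', 'binance');
    """
)

# 1 slot joined to N predictions: one output row per prediction.
joined = con.execute(
    """
    SELECT s.id, p.pair, p.timeframe, p.source
    FROM slots s
    LEFT JOIN predictions p ON p.slot_id = s.id
    """
).fetchall()

# Deduplicating collapses the fan-out back to a single row per slot.
deduped = con.execute(
    """
    SELECT DISTINCT s.id, p.pair, p.timeframe, p.source
    FROM slots s
    LEFT JOIN predictions p ON p.slot_id = s.id
    """
).fetchall()

print(len(joined), len(deduped))
```

If the slot record already carried these three fields when it was fetched, neither the join nor the dedup step would be needed.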

TODOs / DoD

  1. Review this query and its current results/accuracy.
  2. Simplify this query by getting this data directly from the subgraph.
  3. Review other subgraph queries & ETL joins where this could be simplified, and fix them.

Tasks

  • update slots and other tables to get pair/timeframe/source info from subgraph
  • deprecate implementing this in SQL joins
  • verify that queries are generating correct/expected data
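One way to approach the verification task is to compare, slot by slot, the values produced by the existing join against the values fetched from the subgraph. The sketch below is only an illustration: `find_mismatches`, the `Meta` alias, and the sample data are hypothetical names and values, and both inputs are assumed to have already been loaded into dictionaries keyed by slot ID.

```python
from typing import Dict, List, Tuple

# (pair, timeframe, source) for a single slot; a hypothetical shape.
Meta = Tuple[str, str, str]


def find_mismatches(joined: Dict[str, Meta], fetched: Dict[str, Meta]) -> List[str]:
    """Return slot IDs whose metadata differs, or that exist on one side only."""
    all_ids = set(joined) | set(fetched)
    return sorted(i for i in all_ids if joined.get(i) != fetched.get(i))


# Made-up sample data: one matching slot, one differing, one missing.
from_join = {
    "0xfeed-100": ("BTC/USDT", "5m", "binance"),
    "0xfeed-101": ("BTC/USDT", "5m", "binance"),
}
from_subgraph = {
    "0xfeed-100": ("BTC/USDT", "5m", "binance"),
    "0xfeed-101": ("BTC/USDT", "1h", "binance"),
    "0xfeed-102": ("BTC/USDT", "5m", "binance"),
}

mismatches = find_mismatches(from_join, from_subgraph)
print(mismatches)
```

An empty result would suggest the fetched values can replace the joined ones without changing the table contents.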
@idiom-bytes idiom-bytes added the Type: Enhancement New feature or request label May 2, 2024
@idiom-bytes idiom-bytes changed the title [Lake][Bronze Slots Table] Review bronze slots. Fetch info directly from subgraph rather than joining w/ sql. [Lake][Fetching vs Joining] Fetch info directly from subgraph rather than joining w/ sql. May 2, 2024
@idiom-bytes idiom-bytes changed the title [Lake][Fetching vs Joining] Fetch info directly from subgraph rather than joining w/ sql. [Lake][Fetching vs Joining] Fetch available data from subgraph rather than joining w/ sql. May 2, 2024
@kdetry
Contributor

kdetry commented May 3, 2024

Actually, I don't understand the main motivation.
It can process millions of rows in just a second. Why should we update the raw tables now? What is the cost for us?

@idiom-bytes
Member Author

Because it feels like the previous work wasn't quite complete.
We can get the data from the subgraph and clean this up.

I tagged it as low priority.

@idiom-bytes
Member Author

idiom-bytes commented May 14, 2024

I am now reviewing this in issue #1000 and it's coming up again.

The bronze_slots query is taking a very long time to complete, and although that may be due to other reasons, I can't help but stare at this super expensive join that we can basically get for free.

This exists in a few different places, and it gets run every time we process an event. Notice how silly a join it is, too: it's basically configuration (not some special, unique data).

Screenshot from 2024-05-14 13-34-33
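To make the "basically a configuration" point concrete, here is a minimal sketch of attaching pair/timeframe/source to a slot through a lookup instead of a join. It assumes these three values are fixed per prediction contract; `FeedInfo`, `FEEDS`, `enrich_slot`, and the sample address are hypothetical and do not come from the codebase.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FeedInfo:
    """Static per-contract metadata (assumed not to change between slots)."""
    pair: str
    timeframe: str
    source: str


# Hypothetical mapping, built once (for example from a subgraph response).
FEEDS: Dict[str, FeedInfo] = {
    "0xfeed": FeedInfo(pair="BTC/USDT", timeframe="5m", source="binance"),
}


def enrich_slot(slot: dict, feeds: Dict[str, FeedInfo]) -> dict:
    """Return a copy of the slot with pair/timeframe/source filled in by lookup."""
    info = feeds[slot["contract"]]
    return {
        **slot,
        "pair": info.pair,
        "timeframe": info.timeframe,
        "source": info.source,
    }


enriched = enrich_slot({"id": "0xfeed-100", "contract": "0xfeed", "slot": 100}, FEEDS)
print(enriched)
```

The lookup costs one dictionary access per slot regardless of how many predictions the slot has, whereas the join's cost grows with the number of predictions.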

@idiom-bytes
Member Author

This is now being tracked in #1299, so this issue will be closed. Please reopen as we address the backlog.

2 participants