[Lake][Fetching vs Joining] Fetch available data from subgraph rather than joining w/ sql. #989

Closed
@idiom-bytes

Description

Background / motivation

  1. It's easier, and cheaper, to fetch this data directly from the subgraph than to perform a join.
    These fields shouldn't have to be obtained through a join with the predictions table:
    {etl_bronze_pdr_predictions_table_name}.pair as pair,
    {etl_bronze_pdr_predictions_table_name}.timeframe as timeframe,
    {etl_bronze_pdr_predictions_table_name}.source as source,

  2. A second issue is that, in SQL, each slot event is being joined against up to N prediction events, which is again costly. This is currently done as a left join, but it would be good to check the results of bronze_pdr_slots.py to verify that it behaves as expected.
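
To make the cost in point 2 concrete, here is a minimal sketch of how a 1-slot-to-N-predictions left join fans out rows, and how that could be detected. It uses Python's built-in sqlite3 with toy data. The table names (`slots`, `predictions`), the `slot_id` join key, and the sample values are all assumptions for illustration, not the actual ETL schema or the query in bronze_pdr_slots.py.

```python
import sqlite3

# Toy in-memory tables. Names, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE slots (slot_id TEXT PRIMARY KEY);
CREATE TABLE predictions (
    prediction_id TEXT PRIMARY KEY,
    slot_id TEXT,
    pair TEXT,
    timeframe TEXT,
    source TEXT
);
INSERT INTO slots VALUES ('s1'), ('s2'), ('s3');
INSERT INTO predictions VALUES
    ('p1', 's1', 'BTC/USDT', '5m', 'binance'),
    ('p2', 's1', 'BTC/USDT', '5m', 'binance'),
    ('p3', 's2', 'ETH/USDT', '1h', 'binance');
""")

JOIN = "FROM slots LEFT JOIN predictions ON predictions.slot_id = slots.slot_id"

# Row counts before and after the join: a larger joined count means fan-out.
slot_count = conn.execute("SELECT COUNT(*) FROM slots").fetchone()[0]
joined_count = conn.execute(f"SELECT COUNT(*) {JOIN}").fetchone()[0]

# Slots that match more than one prediction (the duplicated rows).
fanned_out = conn.execute(f"""
    SELECT slots.slot_id, COUNT(predictions.prediction_id)
    {JOIN}
    GROUP BY slots.slot_id
    HAVING COUNT(predictions.prediction_id) > 1
""").fetchall()

# Slots with no prediction at all: the join leaves pair/timeframe/source NULL.
missing_meta = [
    row[0]
    for row in conn.execute(
        f"SELECT slots.slot_id {JOIN} WHERE predictions.slot_id IS NULL"
    )
]

print(slot_count, joined_count, fanned_out, missing_meta)
```

In this toy data the join both duplicates a slot and leaves another slot without pair/timeframe/source, which are the two failure modes fetching the fields from the subgraph would avoid.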

TODOs / DoD

  1. Review this query and its current results/accuracy.
  2. Simplify this query by just getting this data from subgraph.
  3. Review other subgraph queries & etl joins where this could be simplified, and fix them.

Tasks

  • update slots and other tables to get pair/timeframe/source info from subgraph
  • deprecate implementing this in SQL joins
  • verify that queries are generating correct/expected data
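
One possible way to approach the verification task is to compare, slot by slot, the pair/timeframe/source values produced by the existing join against the values fetched from the subgraph. The sketch below assumes both sides are available as lists of dicts keyed by a hypothetical `slot_id` field; the helper name `diff_slot_metadata` and the sample rows are illustrative only and do not come from the codebase.

```python
def diff_slot_metadata(joined_rows, fetched_rows,
                       fields=("pair", "timeframe", "source")):
    """Return {slot_id: {field: (joined_value, fetched_value)}} for mismatches.

    Assumes each row is a dict with a hypothetical "slot_id" key.
    A slot missing from fetched_rows is compared against None values.
    """
    fetched_by_id = {row["slot_id"]: row for row in fetched_rows}
    mismatches = {}
    for row in joined_rows:
        other = fetched_by_id.get(row["slot_id"], {})
        diffs = {
            field: (row.get(field), other.get(field))
            for field in fields
            if row.get(field) != other.get(field)
        }
        if diffs:
            mismatches[row["slot_id"]] = diffs
    return mismatches


# Hypothetical sample rows: s1 agrees, s3 got NULLs from the join.
joined = [
    {"slot_id": "s1", "pair": "BTC/USDT", "timeframe": "5m", "source": "binance"},
    {"slot_id": "s3", "pair": None, "timeframe": None, "source": None},
]
fetched = [
    {"slot_id": "s1", "pair": "BTC/USDT", "timeframe": "5m", "source": "binance"},
    {"slot_id": "s3", "pair": "ETH/USDT", "timeframe": "1h", "source": "kraken"},
]

result = diff_slot_metadata(joined, fetched)
print(result)
```

An empty result would suggest the subgraph-fetched fields can replace the joined ones without changing the data; any entries point at slots worth inspecting before deprecating the join.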
