We have a few months of missing data due to the churn of updating to MDS 0.3.0. This is approximately from September 2019 to November 2019. We need to create a backfill strategy for this.
A few options:
1. Backfill using the Airflow DAG, going back to early September when these problems started. We could probably select only every 12th hour, since there is significant overlap in the current ETL time windows.
2. Backfill using the Airflow DAG, going back to the start of the year. There was a timestamp bug in the Wheels data feed which has since been fixed, and backfilling further may let us correct the data that was affected by it.
3. Write an auxiliary script that performs the backfill using mds-provider directly, instead of going through the Airflow DAG. This may be more work, but could provide finer control over exactly how we backfill.
I am leaning towards option two, but I think we should have a conversation about the technical merits of each.
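To make the "every 12th hour" idea in the first two options concrete, here is a minimal sketch of how the backfill timestamps could be enumerated. The function name `backfill_execution_times` and the exact start and end dates are assumptions for illustration; they are not taken from the existing DAG or from mds-provider, and the real window boundaries would need to be confirmed against the data.

```python
from datetime import datetime, timedelta, timezone


def backfill_execution_times(start, end, step_hours=12):
    """Yield timestamps from start (inclusive) to end (exclusive), step_hours apart.

    Hypothetical helper: each yielded timestamp stands for one backfill run.
    """
    if step_hours <= 0:
        raise ValueError("step_hours must be positive")
    step = timedelta(hours=step_hours)
    current = start
    while current < end:
        yield current
        current += step


# Assumed boundaries for option one (early September through November 2019).
option_one = list(
    backfill_execution_times(
        datetime(2019, 9, 1, tzinfo=timezone.utc),
        datetime(2019, 12, 1, tzinfo=timezone.utc),
    )
)

# Assumed boundaries for option two (start of 2019 through November 2019).
option_two = list(
    backfill_execution_times(
        datetime(2019, 1, 1, tzinfo=timezone.utc),
        datetime(2019, 12, 1, tzinfo=timezone.utc),
    )
)

print(len(option_one), len(option_two))
```

With these assumed boundaries, option one comes to 182 runs and option two to 668, which gives a rough sense of the difference in load between the two. Each timestamp could then be handed to whichever mechanism we choose, whether that is a DAG run or a standalone script.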
ian-r-rose changed the title from "Backfill missing data" to "Backfill missing scooter data" on Nov 26, 2019