Backfill missing scooter data #111

Open
ian-r-rose opened this issue Nov 26, 2019 · 0 comments

We are missing a few months of data because of churn during the upgrade to MDS 0.3.0. The gap runs from approximately September 2019 to November 2019. We need a backfill strategy for it.
A few options:

  1. We could backfill using the Airflow DAG (probably selecting every 12th hour, since there is significant overlap in the current ETL time windows), going back to early September, when these problems started.
  2. We could backfill to the start of the year using the Airflow DAG. There was a timestamp bug in the Wheels data feed that has since been fixed, and backfilling further may also let us correct the data affected by it.
  3. We could write an auxiliary script that performs the backfill using mds-provider directly instead of the Airflow DAG. This may be more work, but could give finer control over exactly how we backfill.

I am leaning towards option two, but I think we should have a conversation about the technical merits of each.
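
To make the "every 12th hour" idea concrete, here is a minimal sketch of how the backfill windows could be enumerated. The exact start and end dates are assumptions (the gap is only known approximately), `backfill_windows` is a hypothetical helper, and the sketch deliberately makes no Airflow or mds-provider calls; it only produces the time ranges that either approach would need to request.

```python
from datetime import datetime, timedelta, timezone


def backfill_windows(start, end, step_hours=12):
    """Yield contiguous (window_start, window_end) pairs covering [start, end)."""
    step = timedelta(hours=step_hours)
    current = start
    while current < end:
        # Clamp the final window so it never runs past the end of the gap.
        yield current, min(current + step, end)
        current += step


# Assumed bounds for the gap; adjust once the real extent is confirmed.
START = datetime(2019, 9, 1, tzinfo=timezone.utc)
END = datetime(2019, 11, 26, tzinfo=timezone.utc)

windows = list(backfill_windows(START, END))
```

For option 2, the same helper would work with `START` moved back to the beginning of the year. Each window could then be handed to whichever mechanism we choose, a DAG run or a standalone script.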

ian-r-rose changed the title from "Backfill missing data" to "Backfill missing scooter data" on Nov 26, 2019