
HOTFIX: Rewrite Marin data scraper to use Socrata #167


Mr0grog (Collaborator) commented Jan 8, 2021

Marin now publishes COVID data in an actual data portal with a real API and has changed its dashboard to use Tableau. That broke our scraper, because all the charts are now built in an entirely different way. The best fix was to rewrite the scraper on top of the Socrata data portal API, which we can happily expect to be much more stable.

This also adds caching to our Socrata API client, so we can call the same URL multiple times without actually making multiple HTTP requests. Marin's data combines many different dimensions in the same datasets (unlike most other portals, where they are separated), and caching lets us keep the logic straightforward without making unnecessary repeated requests.
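The URL-keyed caching described above could look roughly like this. It is a minimal sketch assuming a Python scraper, not the code merged in this PR: the `CachedSocrataClient` class, its `fetcher` parameter, and any domain or dataset IDs you pass to it are hypothetical. The cache key is the full request URL (with query parameters sorted), so two parts of the scraper that ask for the same Socrata resource share a single HTTP response.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# A fetcher takes a full URL and returns the parsed JSON body.
Fetcher = Callable[[str], Any]


def fetch_json(url: str) -> Any:
    """Default fetcher: performs one real HTTP GET and parses the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


class CachedSocrataClient:
    """Hypothetical Socrata client that memoizes responses by request URL.

    Repeated calls for the same dataset and query parameters reuse the first
    response instead of issuing another HTTP request.
    """

    def __init__(self, domain: str, fetcher: Fetcher = fetch_json) -> None:
        self.domain = domain
        self.fetcher = fetcher
        self._cache: Dict[str, Any] = {}

    def build_url(self, dataset_id: str, params: Optional[Dict[str, str]] = None) -> str:
        # Socrata resource endpoints have the form /resource/<dataset-id>.json.
        # Sorting the params makes equivalent queries produce the same cache key.
        query = urlencode(sorted((params or {}).items()))
        url = f"https://{self.domain}/resource/{dataset_id}.json"
        return f"{url}?{query}" if query else url

    def get(self, dataset_id: str, params: Optional[Dict[str, str]] = None) -> Any:
        url = self.build_url(dataset_id, params)
        if url not in self._cache:
            # Only the first request for a given URL reaches the network.
            self._cache[url] = self.fetcher(url)
        return self._cache[url]
```

In a scraper, one client instance would be shared by the functions that pull each dimension (for example by age, gender, or race), so each URL is only fetched the first time it is requested. Because the cache hands back the same object every time, callers should treat results as read-only or copy them before modifying.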

This also turns out to fix #162: unknown race/age/gender values are now included in the data!

Fixes #165.

Mr0grog requested review from rickpr and benghancock on January 8, 2021 05:53
benghancock (Collaborator) commented

This looks really good @Mr0grog! One question I have is whether there's value in pulling down the "Total Hospitalized" stats. I haven't cross-checked against the state data that the hospital scraper pulls, so perhaps we're already covered, although Marin's data has demographic breakdowns. Just asking because the data is there.

Otherwise, I think it's good to go!

Mr0grog (Collaborator, Author) commented Jan 11, 2021

@benghancock good point! I think I avoided it because we don’t have any standard around it and adding another timeseries is a lot of extra data which could be an issue for the front-end project. It does seem kinda disappointing to leave out when we have it, though.

Since this is a hotfix, I’m going to take your comment as an approval and merge as-is so we’ve got something working, but I’ll file an issue to add hospitalization data. You or anyone else can implement it.

Mr0grog merged commit c225690 into master on Jan 11, 2021
Mr0grog deleted the hotfix-marin-charts-are-now-tableu-slash-hallelujah-marin-published-raw-data branch on January 11, 2021 05:41
Mr0grog (Collaborator, Author) commented Jan 11, 2021

Filed the issue as #170.

Development

Successfully merging this pull request may close these issues:
Switch Marin to Use New Data Portal
Marin county does not include unknown race/age/gender data