Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Sonoma Data Scraper #57
Add Sonoma Data Scraper #57
Changes from 23 commits
10f0dfe
e5bceba
7db78f7
05dbee7
998800b
3fcb13f
ed855bd
fdab8a4
8bd2081
bd72db8
ee5a8b7
7745e5b
e7ab26f
dc9b9fe
af8bfe2
adbe419
6b71193
627e82a
a565a83
358a441
7dc3beb
6a4ead9
336e5ac
5297eeb
5093fe3
48dd3c1
2a76315
a8ce742
c9f3500
058a555
fd4e135
8310ca0
148eec8
ada9b2a
40b84e0
eb1a489
fdb2045
1e1b0a8
1f3755a
96b81b5
fb339b4
5d96031
fd09e5e
7770c89
6b4b69b
f1c7f05
5c9a9ed
41d61c4
06163e2
ba6df28
ad0e174
2bf3faf
bac1b5b
6a0ef8c
15456e1
329f92d
3822877
f070125
d1aec84
1deaa9c
869418a
6ef13b4
5fdc2aa
aed862f
898672d
a549ea4
97b72c1
28df7be
6ddf682
4a92856
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over in news, I use
dateutil
for this:data-covid19-sfbayarea/news/san_francisco.py
Line 58 in 575f138
strptime
works, but ISO 8601 can come in a few different formats, and they could make changes that are still valid but don’t work with yourstrptime
format string.dateutil.parser
will handle them, though.Also, we should make sure the output has a time zone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this result type is worth defining a type alias or maybe a data class for? It’s fairly complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be using
get_cells()
?Or, would a better abstraction be something like (warning: untested code):
It doesn’t go so far as to to try and rename or restructure things, but at least protects you from columns being reordered and maybe makes access at least a little simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Alameda, I believe @elaguerta loaded the defaults from the data model JSON so that any race/ethnicities we don’t have values for come through as
-1
or0
instead of being unlisted. We should probably do the same here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth a case-insensitive comparison, just in case this changes in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we would use the chhs as a common source for all hospitalization records. It looks like this is where most counties are getting their information, and we can be sure to have some sort of standard across counties by going straight to the chhs API. This referenced in issue #29. This is why the data_model doesn't currently include a hospitalization sub-table, I thought we could add it separately from the individual county scrapers. I decided to exclude hospitalization data available through SF county, to avoid confusion when we do get to #29.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to lower-case gender here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be useful to call
page.raise_for_status()
just in case you got an error page.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super convenient, but I worry about the order changing here. Maybe safer to call
sonoma_soup.find(re.compile(r'^h\d$'), string='whatever')
for each?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets us the wrong format:
But we should have a list, and the group names should be slightly different: https://github.com/sfbrigade/data-covid19-sfbayarea/tree/master/data_models#cases-2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our data_model.json had inconsistent use of "age_group" values (only 1 instance was a list). I changed all the "age_group" tables to be lists in #52, in progress.
As far as using the group names, I realized I need to go do that for Alameda County and SF county. And we will probably have to manually order the list by group-name, since they're often coming through as unordered .json objects (in curly braces).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is outputting different keys than we have for hospitalizations below. We should unify them.
Here it’s plural:
In hospitalizations it’s singular: