Add Sonoma Data Scraper #57

ldtcooper · 2020-05-17T01:47:25Z

No description provided.

ldtcooper · 2020-05-17T02:00:45Z

The code is mostly done now. At this point, I'm just doing some clean-up and clarification, hence the draft status.
I know my implementation is somewhat repetitive, and I'm open to suggestions to fix that. I played around with the idea of having a generic function to get the rows and cells from a tag, loop through them, and take a function to do any extra cleaning/transformation on them, but that made things more complicated and hard-to-read IMO.
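
For readers following along, the generic approach described above might look roughly like the sketch below. It is not code from this PR: `transform_rows` and `clean_case_row` are hypothetical names, the rows are plain lists of strings rather than BeautifulSoup tags, and the sample values are made up.

```python
from typing import Any, Callable, Dict, List


def transform_rows(
    rows: List[List[str]],
    headers: List[str],
    clean: Callable[[Dict[str, str]], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Pair each row's cells with the headers, then apply a per-table cleaner."""
    return [clean(dict(zip(headers, cells))) for cells in rows]


def clean_case_row(item: Dict[str, str]) -> Dict[str, Any]:
    # Hypothetical cleaner: strip whitespace and turn the count into an int.
    return {
        "date": item["Date"].strip(),
        "cases": int(item["Cases"].replace(",", "")),
    }


# Made-up sample rows, standing in for cells already extracted from a table.
series = transform_rows(
    rows=[["2020-05-15 ", "1,024"], ["2020-05-16", "1,050"]],
    headers=["Date", "Cases"],
    clean=clean_case_row,
)
```

The trade-off mentioned in the comment shows up here: each table still needs its own cleaner function, so the shared loop saves only a few lines per table.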

ldtcooper · 2020-06-16T18:40:53Z

It looks like it does. I'll try to fix those conflicts and linter errors tonight

ldtcooper · 2020-06-17T01:03:55Z

Looks like all the linter and merge problems have been fixed @Mr0grog @elaguerta

README.md

Mr0grog · 2020-06-17T01:39:35Z

covid19_sfbayarea/data/sonoma.py

+import dateutil.parser
+from typing import List, Dict, Union
+from bs4 import BeautifulSoup, element # type: ignore
+from format_error import FormatError # type: ignore


This is broken for me. If I run ./run_scraper_data.sh sonoma or python scraper_data.py sonoma, I get:

Traceback (most recent call last):
  File "scraper_data.py", line 4, in <module>
    from covid19_sfbayarea import data as data_scrapers
  File "/Users/rbrackett/Dev/sfbrigade/data-covid19-sfbayarea/covid19_sfbayarea/data/__init__.py", line 4, in <module>
    from . import sonoma
  File "/Users/rbrackett/Dev/sfbrigade/data-covid19-sfbayarea/covid19_sfbayarea/data/sonoma.py", line 8, in <module>
    from format_error import FormatError
ModuleNotFoundError: No module named 'format_error'

Does it work for you? Seems like it should be:

Suggested change

-from format_error import FormatError # type: ignore
+from .format_error import FormatError

(It also shouldn’t need the # type: ignore comment.)

Also, while you’re here, maybe it’s worthwhile to make this a shared module up in covid19_sfbayarea/errors.py instead of covid19_sfbayarea/data/format_error.py? Some of the news scrapers also use an identical class, and we should really only have one implementation. Then this line would be:

Suggested change

-from format_error import FormatError # type: ignore
+from ..errors import FormatError
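
To illustrate why the relative form resolves where the bare `from format_error import ...` does not, here is a small self-contained sketch. It builds a throwaway package in a temporary directory whose layout mirrors the one proposed above. `demo_pkg` is a hypothetical stand-in for `covid19_sfbayarea`, and the body of `FormatError` is a placeholder, since the real class is not shown in this thread.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/errors.py and demo_pkg/data/sonoma.py under root."""
    pkg = root / "demo_pkg"
    data = pkg / "data"
    data.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    # Placeholder error class; the real FormatError may carry more detail.
    (pkg / "errors.py").write_text("class FormatError(Exception):\n    pass\n")
    (data / "__init__.py").write_text("from . import sonoma\n")
    # Two dots: go up from demo_pkg.data to demo_pkg, then into errors.
    (data / "sonoma.py").write_text("from ..errors import FormatError\n")


def load_demo_sonoma():
    """Import demo_pkg.data.sonoma from a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module("demo_pkg.data.sonoma")
        finally:
            sys.path.remove(tmp)


sonoma = load_demo_sonoma()
error_home = sonoma.FormatError.__module__
```

The relative import is resolved against the importing module's package, so it works no matter which directory the scraper is launched from. The bare import only works if the directory containing `format_error.py` happens to be on `sys.path`.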

elaguerta · 2020-07-07T04:49:06Z

In addition to the HTML tables, they are also using ArcGIS dashboards. I found some endpoints:

Cases and deaths reported by Date: https://services1.arcgis.com/P5Mv5GY5S66M8Z1Q/arcgis/rest/services/NCOV_Cases_Sonoma_County/FeatureServer
Hodge Podge of Tables Appended Together, including Age Group, Gender, Test Results, and Race Ethnicity: https://services1.arcgis.com/P5Mv5GY5S66M8Z1Q/ArcGIS/rest/services/NCOV_Cases_Sonoma_County_Statistics/FeatureServer
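
As a rough sketch of how one of these endpoints could be queried, the helper below builds a URL for the ArcGIS REST `query` operation without making any network request. The layer index `0` and the catch-all `where`/`outFields` values are assumptions; the actual layer numbers and field names would need to be checked against the service itself. `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint taken from the comment above.
CASES_SERVICE = (
    "https://services1.arcgis.com/P5Mv5GY5S66M8Z1Q/arcgis/rest/services/"
    "NCOV_Cases_Sonoma_County/FeatureServer"
)


def build_query_url(
    service_url: str,
    layer: int = 0,          # assumption: the data lives in layer 0
    where: str = "1=1",      # match every record
    out_fields: str = "*",   # request every field
) -> str:
    """Build a FeatureServer layer query URL that asks for JSON output."""
    params = {"where": where, "outFields": out_fields, "f": "json"}
    return f"{service_url.rstrip('/')}/{layer}/query?{urlencode(params)}"


query_url = build_query_url(CASES_SERVICE)
```

Fetching `query_url` with an HTTP client would normally return a JSON document with a `features` list, but the exact shape for these services has not been verified here.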

README.md

covid19_sfbayarea/data/__init__.py

covid19_sfbayarea/data/format_error.py

covid19_sfbayarea/data/sonoma.py

Mr0grog

Just calling out these notes from the last review, which didn’t seem to be addressed one way or the other. Not sure if you missed them or if you’re disagreeing (which is also fine; it just helps to be clear about it):

ldtcooper · 2020-08-11T00:27:50Z

I did just miss those. Let me take a look

ldtcooper · 2020-08-13T03:10:40Z

Okay, I think I addressed those three points

Mr0grog

Left one “food for thought” note inline, but this looks good to me overall. 👍

Mr0grog · 2020-08-13T17:24:42Z

covid19_sfbayarea/data/sonoma.py

+    """
+    return [el.text for el in row.find_all(['th', 'td'])]
+
+def row_list_to_dict(row: List[str], headers: List[str]) -> UnformattedSeriesItem:


Minor nit: one of the important things about naming (or aliasing) types like this is to change how you conceptualize your values and functions (e.g. you shouldn’t be thinking of UnformattedSeriesItem like a shortcut for Dict[str, str] here; you should be thinking of it like a subclass of dict — it should conceptually be its own separate thing).

So if you’re changing the return type to something named UnformattedSeriesItem, it’s probably a good idea to rename the function so it no longer talks about making a dict, e.g. row_list_to_series_item.
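
A minimal sketch of the rename, assuming `UnformattedSeriesItem` is an alias for `Dict[str, str]` as the comment implies. The function body is a guess at what such a helper does (zipping headers with cells) and is not copied from the PR.

```python
from typing import Dict, List

# Alias named for what the value means, not for how it is stored.
UnformattedSeriesItem = Dict[str, str]


def row_list_to_series_item(
    row: List[str], headers: List[str]
) -> UnformattedSeriesItem:
    """Turn one table row into a series item keyed by column header."""
    return dict(zip(headers, row))


# Made-up sample values for illustration.
item = row_list_to_series_item(["2020-08-12", "17"], ["date", "cases"])
```

With the name and the return type both talking about series items, the underlying representation could later change (for example to a dataclass) without the function name becoming misleading.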

Mr0grog · 2020-08-13T17:47:46Z

covid19_sfbayarea/data/sonoma.py

+    deaths = []
+    cumul_deaths = 0
+
+    rows = list(reversed(parse_table(cases_tag)))


Oh, forgot to comment on this in the review: I don’t think there’s any reason to convert this to a list, since you’re only iterating over it once and not returning it:

Suggested change

-rows = list(reversed(parse_table(cases_tag)))
+rows = reversed(parse_table(cases_tag))
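
The point about `reversed` can be sketched as follows. `reversed` returns a one-shot iterator, which is enough for a single loop that accumulates a running total. The row layout (newest first, with a `deaths` key) and the helper name `cumulative_deaths` are assumptions for illustration, not the PR's actual code.

```python
from typing import Dict, List


def cumulative_deaths(rows_newest_first: List[Dict[str, int]]) -> List[int]:
    """Walk the rows oldest-first and return the running death total."""
    totals = []
    running = 0
    # No list() needed: the iterator is consumed exactly once.
    for row in reversed(rows_newest_first):
        running += row["deaths"]
        totals.append(running)
    return totals


# Made-up sample data, newest row first.
totals = cumulative_deaths([{"deaths": 2}, {"deaths": 0}, {"deaths": 1}])

# The caveat: a reversed iterator is exhausted after one pass.
rows = reversed([1, 2, 3])
first_pass = list(rows)
second_pass = list(rows)
```

If the rows were ever iterated twice or returned to a caller that indexes into them, the `list(...)` wrapper would become necessary again.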

Logan Cooper added 17 commits May 2, 2020 13:41

organization Merge CDM readme into readme

10f0dfe

Merge branch 'master' of github.com:sfbrigade/data-covid19-sfbayarea …

e5bceba

…into organization

organization Move data models to own folder

7db78f7

organization Replace tabs with spaces

05dbee7

sonoma Get top level metadata

998800b

Merge branch 'master' of github.com:sfbrigade/data-covid19-sfbayarea …

3fcb13f

…into sonoma

sonoma Move scraper and collect metadata

ed855bd

sonoma Add transmission types

fdab8a4

sonoma Get cases, active, recovered, and death series

8bd2081

sonoma Get case data by age

bd72db8

sonoma Fix table numbers

ee5a8b7

sonoma Add test getter

7745e5b

sonoma Factor out some common code

e7ab26f

sonoma Add cases by race

dc9b9fe

sonoma Add hospitalizations

af8bfe2

sonoma Add hospitalizations by gender

adbe419

sonoma Fix type error

6b71193

ldtcooper requested review from Mr0grog, elaguerta, kwonangela7 and kengoy May 17, 2020 01:47

Logan Cooper added 2 commits May 16, 2020 18:51

sonoma Redo definitions getter

627e82a

sonoma Add get_county function

a565a83

ldtcooper marked this pull request as draft May 17, 2020 01:57

Logan Cooper added 4 commits May 16, 2020 19:30

sonoma Add docstrings

358a441

sonoma Comment out hospitalizations by gender

7dc3beb

sonoma Add docstring for gender hospitalization

6a4ead9

sonoma Remove unused variable

336e5ac

ldtcooper marked this pull request as ready for review May 17, 2020 04:06

Logan Cooper added 2 commits June 16, 2020 17:55

Merge branch 'master' of github.com:sfbrigade/data-covid19-sfbayarea …

15456e1

…into sonoma

sonoma Correct conventions for sonoma

329f92d

Mr0grog reviewed Jun 17, 2020

root added 5 commits July 29, 2020 17:53

Merge branch 'master' of github.com:sfbrigade/data-covid19-sfbayarea …

3822877

…into sonoma

Fix conflicts

f070125

Fix error import

d1aec84

Merge branch 'master' of github.com:sfbrigade/data-covid19-sfbayarea …

1deaa9c

…into sonoma

Fix linter errors and import

869418a

Mr0grog requested changes Aug 6, 2020

root and others added 2 commits August 8, 2020 11:49

Add type aliases

6ef13b4

Use get cell function for cases

5fdc2aa

ldtcooper requested a review from Mr0grog August 8, 2020 21:05

Mr0grog reviewed Aug 10, 2020

ldtcooper added 7 commits August 10, 2020 17:30

Remove data model readme from main readme

aed862f

Add readme link

898672d

Refactor test and gender functions

a549ea4

Refactor all transforn functions but cases

97b72c1

Fix types

28df7be

Add docstrings

6ddf682

Use datetime attribute

4a92856

ldtcooper requested a review from Mr0grog August 13, 2020 03:10

Mr0grog approved these changes Aug 13, 2020

Mr0grog reviewed Aug 13, 2020

ldtcooper merged commit 8fed7ed into master Aug 18, 2020

ldtcooper deleted the sonoma branch August 18, 2020 23:22
