Experiment with NYC geocoder for missing geoms #36

Open
danrademacher opened this issue Dec 4, 2021 · 1 comment
danrademacher commented Dec 4, 2021

After the most recent round of fixes for null geometries, we still have about 10% of records with no geometries:

SELECT count(cartodb_id) FROM crashes_all_prod 

1,950,062 crashes in the whole universe.

SELECT count(cartodb_id) FROM crashes_all_prod 
where 
latitude is  null

187,456 have no geometry. That's 9.6% of the total.

Of those, 40,557 have neither on_street_name nor off_street_name, so they could never be located. That's 2% of the total.

SELECT count(cartodb_id) FROM crashes_all_prod 
where 
latitude is  null and 
length(on_street_name)=0 and 
length(off_street_name)=0

35,483 have both cross streets and could be candidates for geocoding. That's 1.8% of the total.

SELECT count(cartodb_id) FROM crashes_all_prod 
where 
latitude is  null and 
length(on_street_name)>0 and 
length(off_street_name)>0

111,416 have either on_street_name or off_street_name but not both. That's 5.7% of the total.

SELECT count(cartodb_id) FROM crashes_all_prod 
where 
latitude is  null and 
(length(on_street_name)=0 OR 
length(off_street_name)=0) and not
(length(on_street_name)=0 AND 
length(off_street_name)=0)
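
The four counts above can also be produced in one pass with conditional aggregation. Below is a minimal sketch that runs against a tiny in-memory SQLite table rather than the real CARTO table; the sample rows, the `breakdown` and `make_sample` helpers, and the `COALESCE` guard (which treats NULL street names as empty) are assumptions added for illustration, not part of the queries above.

```python
import sqlite3

# One query that returns every bucket at once. COALESCE is a defensive
# assumption: it counts NULL street names the same as empty strings.
BREAKDOWN_SQL = """
SELECT
  COUNT(*) AS total,
  SUM(CASE WHEN latitude IS NULL THEN 1 ELSE 0 END) AS no_geom,
  SUM(CASE WHEN latitude IS NULL AND has_on + has_off = 0 THEN 1 ELSE 0 END) AS no_streets,
  SUM(CASE WHEN latitude IS NULL AND has_on + has_off = 2 THEN 1 ELSE 0 END) AS both_streets,
  SUM(CASE WHEN latitude IS NULL AND has_on + has_off = 1 THEN 1 ELSE 0 END) AS one_street
FROM (
  SELECT latitude,
         LENGTH(COALESCE(on_street_name, '')) > 0 AS has_on,
         LENGTH(COALESCE(off_street_name, '')) > 0 AS has_off
  FROM crashes_all_prod
) AS t
"""


def breakdown(conn):
    """Run the breakdown query and return the counts as a dict."""
    cur = conn.execute(BREAKDOWN_SQL)
    names = [d[0] for d in cur.description]
    return dict(zip(names, cur.fetchone()))


def make_sample():
    """Build a hypothetical five-row stand-in for crashes_all_prod."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crashes_all_prod ("
        "cartodb_id INTEGER, latitude REAL, "
        "on_street_name TEXT, off_street_name TEXT)"
    )
    rows = [
        (1, 40.7, "BROADWAY", "CANAL STREET"),  # already has geometry
        (2, None, "", ""),                      # no streets at all
        (3, None, "BROADWAY", "CANAL STREET"),  # both cross streets
        (4, None, "BROADWAY", ""),              # only one street
        (5, None, None, "CANAL STREET"),        # only one street (NULL on_street_name)
    ]
    conn.executemany("INSERT INTO crashes_all_prod VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    counts = breakdown(make_sample())
    for name, n in counts.items():
        print(f"{name}: {n} ({100 * n / counts['total']:.1f}% of total)")
```

The three null-geometry buckets should always add up to `no_geom`, which is a quick sanity check on the numbers: 40,557 + 35,483 + 111,416 = 187,456.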

That's the hardest set. In some cases, one could narrow the location down to the borough, or maybe something smaller, but geocoding these would put a point on the map at a location that is almost certainly inaccurate. Given only a street name, most geocoders would pick the midpoint along the length of the street or avenue. That point would then be included in all area calculations and fall into whatever boundaries happen to contain it. This is a common problem: for example, the geographic center of the US, somewhere in Kansas, gets assigned all kinds of records that clearly aren't there.

I recommended to Christine that we not pursue this, but if she wants the 1.8% increase, the next step is a manual try at geocoding the records with both cross streets to see how the results look.


@danrademacher danrademacher self-assigned this Dec 4, 2021
danrademacher commented

If we do geocode, take a look at https://labs-geosearch-docs.netlify.app/docs/
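
For the manual try, here is a rough sketch of what a request for one of the both-cross-streets records might look like. Everything specific in it is an assumption to check against the docs linked above: the endpoint URL, the `text` and `size` parameters (Pelias-style search), the "X and Y" intersection phrasing, and a GeoJSON FeatureCollection response with `[lon, lat]` coordinates. Whether the service resolves intersections at all is exactly what the manual trial would need to find out. The helper names are hypothetical, and the sketch only builds the URL and parses a response; it makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the GeoSearch docs before using.
GEOSEARCH_URL = "https://geosearch.planninglabs.nyc/v2/search"


def build_query_url(on_street, off_street, size=1):
    """Build a search URL for an intersection of two street names.

    The "X and Y" phrasing is an assumption, not a documented format.
    """
    text = f"{on_street.strip()} and {off_street.strip()}"
    return GEOSEARCH_URL + "?" + urlencode({"text": text, "size": size})


def first_lonlat(geojson):
    """Return (lon, lat) of the first feature, or None if there is no match.

    Assumes a GeoJSON FeatureCollection with Point geometries.
    """
    features = geojson.get("features") or []
    if not features:
        return None
    lon, lat = features[0]["geometry"]["coordinates"][:2]
    return lon, lat


if __name__ == "__main__":
    print(build_query_url("BROADWAY", "CANAL STREET"))
```

Returning `None` for an empty result keeps unmatched records as null geometries instead of forcing a low-quality point onto the map, which is the failure mode described for the single-street records.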
