Update br cnefe #5482
Updates:
The script can't parse six states (MG 31, PR 41, RJ 33, RS 43, SP 35, BA 29), probably because of the size of their address zip files. These are the large states; I'll look into that. This is almost ready for a review. In my previous comment I attached a results file for Sergipe. Is this the best way to send sample data for review? Another question: is it possible to host the result files at data.openaddresses.io? I don't have a place to host them.
Status update: fixed the issue with large files. All states have been parsed and the results are available here: https://www.dropbox.com/sh/q3qhiut0fy3hwr9/AAC1oTK4WmZPaDO02xa0czd1a?dl=0 The next step is to compare them with the current version; I'll post updates here. As for hosting the new source files, I should be able to keep them in my Dropbox account.
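The thread does not say how the large-file issue was fixed. As a hedged illustration only, one common way to keep memory use flat with very large zipped text files is to stream them line by line instead of extracting or reading them whole. The function name `iter_zip_lines` and the `latin-1` default encoding are assumptions made for this sketch, not details taken from the script.

```python
import io
import zipfile


def iter_zip_lines(zip_source, encoding="latin-1"):
    """Yield decoded lines from every file in a zip archive, one at a time.

    zip_source may be a path or a seekable binary file object.
    The encoding default is an assumption; adjust it to the real data.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # archive.open() returns a binary stream that decompresses lazily,
            # so only a small buffer is held in memory at any time.
            with archive.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding=encoding):
                    yield line.rstrip("\r\n")
```

Because this is a generator, a caller can parse each line as it arrives and write the output row immediately, so the size of the archive no longer determines peak memory.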
Status update: I did a visual inspection of the state capitals and noticed large areas without addresses. This is São Paulo: This seems to happen because many addresses in the 2019 files have no corresponding face centroid, as their face ids are not present in either the 2019 or the 2010 shapefiles. In the script output I see 250k invalid face ids for São Paulo, and other states also have a sizable share of missing faces. I'll look around a little more, but at the moment this looks like a problem with the source data. In the 2010 data these areas are not missing, so one possible solution would be to merge 2019 with 2010, using the latter as a fallback where the 2019 addresses are broken. Next steps:
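The fallback idea described in this comment could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual logic: each address is assumed to be a dict with hypothetical `address_key`, `lon`, and `lat` fields, and a 2019 address is treated as "broken" when it has no coordinates because its face id could not be resolved.

```python
def has_coords(row):
    """True when a row carries both coordinates (hypothetical field names)."""
    return row.get("lon") is not None and row.get("lat") is not None


def merge_with_fallback(rows_2019, rows_2010):
    """Prefer 2019 rows that have coordinates; fill the gaps from 2010.

    Rows are matched on the hypothetical 'address_key' field. Each output
    row gets a 'source_year' field recording which dataset it came from.
    """
    merged = {}
    for row in rows_2019:
        if has_coords(row):
            merged[row["address_key"]] = {**row, "source_year": 2019}
    for row in rows_2010:
        key = row["address_key"]
        # Only fall back to 2010 where 2019 gave no usable row.
        if key not in merged and has_coords(row):
            merged[key] = {**row, "source_year": 2010}
    return list(merged.values())
```

Recording `source_year` on every row would make it possible to check afterwards how much of each state still depends on the 2010 data.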
Thank you for going through the work of building this and submitting a PR. We're in the process of updating to use 2022 census data, so I'm going to close for now.
Contributes to #2303. This is a work in progress; I'm opening it for visibility.
The updated script (based on @astoff's work) unpacks and parses all states. Please check the README for run instructions.
The CSV output has coordinates but is not complete yet: municipality and state still need to be added. I also intend to improve logging.
Here is sample output for Sergipe state:
se.csv.zip
Current OA file is:
https://data.openaddresses.io/cache/uploads/astoff/a4ac0f/br-28.zip