Update br cnefe #5482

Closed
wants to merge 12 commits into from
vgeorge commented Feb 17, 2021

Contributes to #2303. This is a work in progress; I'm opening it for visibility.

The updated script (based on @astoff's work) unpacks and parses all states. Please check the README for run instructions.

The CSV output has coordinates but is not complete yet: I still need to add municipality and state. I also intend to improve logging.

Here is sample output for the state of Sergipe:

se.csv.zip

The current OA file is:

https://data.openaddresses.io/cache/uploads/astoff/a4ac0f/br-28.zip

vgeorge commented Feb 21, 2021

Updates:

  • Uses the statewide address file instead of the smaller ones, since the smaller files don't reference 2019 street faces
  • Loads midpoints from both the 2010 and 2019 shapefiles, because the address files reference faces from both versions
  • Creates log files for missing shapefiles, faces and admin areas, for each state
  • Adds admin areas to addresses
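
As a rough sketch of the lookup described in the list above (not the code in this PR): assume each vintage's face midpoints are already loaded into a plain `{face_id: (lon, lat)}` dict and each address row is a dict with a `face_id` key. The function names, the `face_id` field, and the choice to let 2019 win when both vintages have the same ID are all assumptions made for illustration.

```python
def build_midpoint_index(midpoints_2010, midpoints_2019):
    """Combine two {face_id: (lon, lat)} mappings into one lookup table.

    Assumption: when a face ID exists in both vintages, the 2019 midpoint wins.
    """
    index = dict(midpoints_2010)
    index.update(midpoints_2019)
    return index


def locate_addresses(addresses, index):
    """Attach coordinates to address rows and collect unknown face IDs.

    Returns (located_rows, missing_face_ids); the second value is the kind of
    list that could be written to a per-state log file.
    """
    located = []
    missing = set()
    for row in addresses:
        point = index.get(row["face_id"])
        if point is None:
            # Face ID is absent from both shapefiles: record it and move on.
            missing.add(row["face_id"])
            continue
        lon, lat = point
        located.append({**row, "lon": lon, "lat": lat})
    return located, sorted(missing)
```

Keeping the missing IDs as a set avoids logging the same face once per address that references it.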

The script can't parse six states (MG 31, PR 41, RJ 33, RS 43, SP 35, BA 29), probably because of the size of the address zip files. These are the largest states; I'll look into that.

This is almost ready for a review. In my previous comment I attached a results file for Sergipe. Is this the best way to send sample data for review?

Another question: is it possible to host the result files at data.openaddresses.io? I don't have a place to host them.
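
If the failures are indeed caused by archive size, one common way to keep memory use flat is to stream a member of the zip line by line instead of extracting or reading it whole. The sketch below uses only the standard library; the function name, the member name passed in, and the `latin-1` default encoding are assumptions, not details taken from this PR.

```python
import io
import zipfile


def iter_zip_lines(zip_source, member, encoding="latin-1"):
    """Yield text lines from one member of a zip archive without extracting it.

    zip_source may be a path or a binary file-like object.
    The default encoding is an assumption; adjust it to the actual files.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # TextIOWrapper decodes the compressed stream incrementally.
            text = io.TextIOWrapper(raw, encoding=encoding)
            for line in text:
                yield line.rstrip("\r\n")
```

Because this is a generator, a caller can parse each line as it arrives and never hold the full state file in memory.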

vgeorge commented Mar 22, 2021

Status update: fixed the issue with large files. All states were parsed and are available here:

https://www.dropbox.com/sh/q3qhiut0fy3hwr9/AAC1oTK4WmZPaDO02xa0czd1a?dl=0

The next step is to compare them with the current version; I'll post updates here. As for hosting the new source files, I should be able to keep them in my Dropbox account.

vgeorge commented Apr 26, 2021

Status update: I did a visual inspection of the state capitals and noticed large areas without addresses. This is São Paulo:

Screenshot 2021-04-26 at 08 50 21

This seems to be happening because many addresses in the 2019 files don't have a corresponding face centroid: their face IDs are not present in either the 2019 or the 2010 shapefiles. In the script output I see 250k invalid face IDs for São Paulo, and other states also have a good portion of missing faces. I'll look around a little bit more, but at the moment this seems to be a problem with the source data. In the 2010 data these areas are not missing, so one possible solution would be to merge 2019 with 2010, using the latter as a fallback where the 2019 addresses are broken.
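
A minimal sketch of that fallback idea, under stated assumptions: address rows are dicts, a 2019 row counts as "broken" when it has no `lon`/`lat`, and the fallback is decided per area using a hypothetical `sector_id` key. None of these names come from the script in this PR.

```python
def merge_with_fallback(rows_2019, rows_2010, key="sector_id"):
    """Prefer 2019 rows that have coordinates; use 2010 rows for other areas.

    The grouping key is hypothetical: any area identifier shared by both
    vintages would work the same way.
    """
    # Keep only 2019 rows that were successfully located.
    good_2019 = [
        row for row in rows_2019
        if row.get("lon") is not None and row.get("lat") is not None
    ]
    # Areas that already have at least one located 2019 address.
    covered = {row[key] for row in good_2019}
    # Everywhere else, fall back to the 2010 addresses.
    fallback = [row for row in rows_2010 if row[key] not in covered]
    return good_2019 + fallback
```

One caveat of deciding per area: a sector that is only partly covered in 2019 keeps just its 2019 rows, so a finer key (or a per-face comparison) may be needed if partial gaps are common.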

Next steps:

cc @Playzinho @willemarcel

data-please bot commented Sep 13, 2023

iandees commented May 28, 2024

Thank you for going through the work of building this and submitting a PR.

We're in the process of updating to use 2022 census data, so I'm going to close for now.

iandees closed this May 28, 2024