Structured data for street addresses #21

Closed
paulschreiber opened this issue May 5, 2021 · 3 comments · May be fixed by #22

Comments

@paulschreiber

OSEP #6: Structured data for street addresses

Author(s) @paulschreiber
Implementer(s) TODO
Status Draft
Draft PR(s) #18
Approval PR(s)
Created TODO
Updated TODO

Abstract

Open States currently provides a single field for address data. It would be beneficial to have separate street, city, state and zip code fields.

Within the current address field, the information available, spacing and delimiters vary by state, making parsing difficult.

Specification

Data Model Changes

  • Under contact details, for both district and capitol offices, add the fields street, city, state and zip
  • For backwards compatibility, the existing address field will be preserved (temporarily).
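
To make the proposed shape concrete, here is a minimal sketch of one office's contact details with the four new fields next to the existing address field. The `Office` class, the `classification` field and the sample address are illustrative assumptions, not the actual Open States schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class Office:
    """Hypothetical contact-details record for a single office."""

    classification: str  # assumed values: "district" or "capitol"
    address: str = ""    # existing free-form field, kept temporarily
    street: str = ""     # new structured fields proposed above
    city: str = ""
    state: str = ""
    zip: str = ""


# Made-up example data, for illustration only.
office = Office(
    classification="capitol",
    address="123 Main St, Springfield, IL 62701",
    street="123 Main St",
    city="Springfield",
    state="IL",
    zip="62701",
)

record = asdict(office)  # plain dict, ready to serialize
```

Keeping `address` alongside the new fields mirrors the backwards-compatibility bullet: existing consumers keep reading one string while new consumers read the parts.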

Scraping

Data availability and format vary by state:

  • some states provide the data in separate fields
  • some states do not provide the state name
  • some states use state names and state abbreviations inconsistently

Observed cases can easily be parsed with regular expressions.
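
As a rough sketch of what such parsing might look like, the pattern below handles one assumed layout, `street, city[, state] zip`, and normalizes full state names to abbreviations. The pattern, the `parse_address` helper and the two-entry `STATE_ABBREVIATIONS` table are hypothetical; real scrapers would need per-state variations, and this pattern does not handle streets that contain commas.

```python
import re

# Hypothetical lookup table; a real one would cover every state.
STATE_ABBREVIATIONS = {"illinois": "IL", "new york": "NY"}

ADDRESS_RE = re.compile(
    r"""
    ^\s*(?P<street>[^,]+?)\s*,\s*            # street: everything before the first comma
    (?P<city>[^,]+?)\s*                      # city
    (?:,\s*(?P<state>[A-Za-z][A-Za-z .]*?))? # optional state name or abbreviation
    \s+(?P<zip>\d{5}(?:-\d{4})?)\s*$         # ZIP or ZIP+4
    """,
    re.VERBOSE,
)


def parse_address(raw, default_state=""):
    """Split a one-line address into parts, or return None if it does not match."""
    match = ADDRESS_RE.match(raw)
    if match is None:
        return None
    parts = match.groupdict()
    # Some states omit the state name, so fall back to the scraper's own state.
    state = parts["state"] or default_state
    # Map full names to abbreviations; leave anything else unchanged.
    parts["state"] = STATE_ABBREVIATIONS.get(state.lower(), state)
    return parts


with_abbreviation = parse_address("123 Main St, Springfield, IL 62701")
with_full_name = parse_address("123 Main St, Springfield, Illinois 62701")
without_state = parse_address("123 Main St, Springfield 62701", default_state="IL")
```

All three sample inputs yield the same four parts. Returning `None` on a mismatch lets a scraper fall back to the unparsed address instead of guessing.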

Rationale

This would allow for easier searching, sorting, geocoding and other data uses.

Drawbacks

  • In the short term, duplicate data is stored.

Implementation Plan

I will assist with updating scrapers to output the structured data.

The address field will be a composite of the structured data (using format strings).
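
One way the composite could be built is sketched below. The `ADDRESS_FORMAT` template and the `compose_address` helper are assumptions for illustration; the proposal does not specify the exact format.

```python
# Hypothetical template for the legacy single-line address.
ADDRESS_FORMAT = "{street}, {city}, {state} {zip}"


def compose_address(parts):
    """Rebuild the legacy address string from the structured fields."""
    return ADDRESS_FORMAT.format(**parts)


legacy = compose_address(
    {"street": "123 Main St", "city": "Springfield", "state": "IL", "zip": "62701"}
)
```

A fixed template like this assumes all four parts are present. An empty part would leave stray delimiters in the output, so a real implementation would likely need to skip missing components.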

Several team members, and hopefully some community members, will help to contribute updated committee scrapers.

Copyright

This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license.

@jamesturk
Member

Thanks for this!

Initial thoughts:

The rationale could use a bit more work. This is maybe only the second or third request for this in many years, so I think clearer examples of where the current irregularities cause problems would be helpful, especially as we discuss alternatives prior to accepting this.

For example, one alternative we discussed was keeping the address field but using formal delimiters. If the proposal is deciding against that, I think it should be discussed in the rationale.

I'd add to Drawbacks:

  • this complicates scrapers, which will now be responsible for splitting addresses in most cases (a hard problem)
    • we could consider an alternative where this work is done post-scrape
  • merging 7000+ addresses will be a big undertaking. Updating the scrapers alone doesn't handle that.
  • currently all fields are atomic, which makes merging/linting easier, but now we need to be sure to treat these new fields as a unit

And one small copy/paste typo in implementation plan, "committee scrapers".

@jamesturk
Member

Also, if you could submit this as a PR, that'd let us have the discussion on the PR itself and have updates/etc. which would be useful I think.

paulschreiber added a commit to paulschreiber/enhancement-proposals that referenced this issue May 17, 2021
@jamesturk
Member

closing in favor of #22
