Extracting data from regulatory agency databases #6

Open
3 tasks
waldoj opened this issue Jun 10, 2014 · 4 comments

Comments

@waldoj
Member

waldoj commented Jun 10, 2014

Local, state, and (to a lesser extent) federal agencies rely on proprietary records management software in order to track regulatory compliance. A state agriculture agency that inspects agriculture facilities has records management software to track each inspection of every slaughterhouse, E. coli test, and scale. A transportation agency that inspects bridges and roads has records management software to track every bridge inspection, every pothole, etc. A lot of this software is really quite bad—Windows 3.1 software dragged along to compatibility with the software of 2014—although some is modern and decent.

It is my understanding that very few of these programs store or can export their data in any open formats. Export As → CSV isn't a thing. That makes it impossible for agencies to publish open data, even with a legal mandate to do so.

(Note that, personally, I know very little about this world of software. I have no experience with it.)

I think that the following things need to be done:

  • survey this software world and produce a report of who the major vendors are for each type of regulatory agency and what their data storage and export functionality is like right now
  • determine how to go about persuading these vendors to add the functionality to export data in an open format (anecdotally, these vendors are not interested in doing this for its own sake)
  • for those vendors who cannot be persuaded to support open formats, create programs that can extract the data and transform it into an open format
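
To make the third item concrete, here is a minimal sketch of what such an extraction program might look like. It assumes the vendor's data turns out to sit in a database that a standard driver can read; SQLite stands in for that database here, and the `inspections` table and its columns are invented for illustration. Real products may well use formats that need reverse engineering first.

```python
import csv
import io
import sqlite3


def export_tables_to_csv(conn):
    """Return {table_name: csv_text} for every user table in a SQLite database."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    tables = [row[0] for row in cur.fetchall()]

    exports = {}
    for table in tables:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cur.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cur.description]

        buf = io.StringIO(newline="")
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(cur.fetchall())
        exports[table] = buf.getvalue()
    return exports


def build_demo_db():
    """Build a hypothetical stand-in for a vendor database (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inspections (facility TEXT, inspected_on TEXT, result TEXT)"
    )
    conn.executemany(
        "INSERT INTO inspections VALUES (?, ?, ?)",
        [
            ("Scale 14", "2014-05-02", "pass"),
            ("Scale 15", "2014-05-03", "fail"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for name, text in export_tables_to_csv(build_demo_db()).items():
        print(f"--- {name}.csv ---")
        print(text)
```

The function returns CSV text rather than writing files so that a caller can decide where the output goes (disk, a data portal upload, and so on).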
@scuerda

scuerda commented Aug 15, 2014

This is on point, and it dovetails with issue #5. Easy-to-deploy data publishing platforms will require ETL interfaces that can consume "non-open" formats. An additional consideration is that for datasets that contain potentially private information (e.g. education data governed by FERPA), a key challenge for state/local agencies is applying aggregation / redaction in order to meet their legal obligations. Ideally, a solution to this problem can handle this task as well.
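
As a rough illustration of the aggregation / redaction step, here is a sketch of small-cell suppression: records are counted per group, and any count below a threshold is withheld. The function name, the `school` field, and the threshold of 10 are all assumptions for illustration, not a legal standard; the actual rule would come from each agency's own policy. This sketch also does not handle complementary suppression, where a published total lets a reader back out a withheld cell.

```python
from collections import Counter


def aggregate_with_suppression(records, group_key, min_cell_size=10):
    """Count records per group, replacing small counts with None.

    min_cell_size is a placeholder threshold, not a legal requirement.
    """
    counts = Counter(record[group_key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical record-level data: one dict per student.
    students = [{"school": "A"}] * 12 + [{"school": "B"}] * 3
    print(aggregate_with_suppression(students, "school"))
```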

@waldoj
Member Author

waldoj commented Aug 15, 2014

Great point re: PII, @scuerda. I've opened #12 to that end.

@scuerda

scuerda commented Aug 15, 2014

Excellent. These are issues that we are discussing in Connecticut. I'll be sure to loop y'all in on any interim solutions / strategies we come up with.

@waldoj
Member Author

waldoj commented Aug 15, 2014

Please do! Or if there's any other way that @opendata can be useful, please let me know. It's our job. :)


2 participants