Pull meta descriptions for each dataset #24

Open
derekeder opened this issue Feb 13, 2013 · 2 comments

@derekeder
Collaborator

The City does a pretty good job of documenting what each of these datasets is. Can these descriptions be pulled from the API?

Example: https://data.cityofchicago.org/Buildings/Building-Footprints/w2v3-isjw

Also, they usually link to a document that defines all the data fields and what they mean. Can we somehow either link to or read this in?

Example: https://data.cityofchicago.org/api/assets/003C600C-3A66-4605-8E7E-2477AAE95E16
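A rough sketch of how the description might be pulled, assuming the portal serves per-dataset metadata as JSON at `/api/views/<id>.json` with a top-level `description` key. That endpoint shape and key name are assumptions to verify against the portal, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlsplit
from urllib.request import urlopen


def dataset_id_from_path(path: str) -> str:
    """Return the last path segment, e.g. 'w2v3-isjw'."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def metadata_url(page_url: str) -> str:
    """Build the (assumed) metadata URL for a dataset page URL."""
    parts = urlsplit(page_url)
    dataset_id = dataset_id_from_path(parts.path)
    # Assumption: metadata lives at /api/views/<id>.json on the same host.
    return f"{parts.scheme}://{parts.netloc}/api/views/{dataset_id}.json"


def fetch_description(page_url: str) -> str:
    """Download the metadata and return its description, or '' if absent."""
    with urlopen(metadata_url(page_url)) as resp:
        meta = json.load(resp)
    # Assumption: the free-form text is stored under 'description'.
    return meta.get("description", "")
```

For the Building Footprints page linked above, `metadata_url` would produce a URL on the same host ending in `/api/views/w2v3-isjw.json`. `fetch_description` needs network access and is untested against the live portal.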

@mccc
Collaborator

mccc commented Feb 13, 2013

It would be cool if we could automatically pull those PDFs, but they seem to usually be linked in free-form 'Description' text fields. It might just involve some super-simple scanning for bit.ly links, if that's the city's standard practice over hundreds of datasets.
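The "super-simple scanning" could look something like this. It is only a sketch: it assumes the description text is already in hand as a plain string, and `find_bitly_links` is a made-up helper name, not anything in the project.

```python
import re
from typing import List

# Matches bit.ly short links with or without a scheme. The character class
# excludes '.', so a sentence-ending period is not swallowed.
BITLY_RE = re.compile(r"(?:https?://)?bit\.ly/[A-Za-z0-9_-]+", re.IGNORECASE)


def find_bitly_links(description: str) -> List[str]:
    """Return every bit.ly link found in a free-form description string."""
    return BITLY_RE.findall(description or "")
```

Calling `find_bitly_links` on each dataset's description would give a list of candidate data-dictionary links per dataset. It finds nothing for datasets that link their documentation some other way.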

Re: scraping that document — if only there was a schema for the data dictionary for my schema... [I think this is what Heidegger called "the hermeneutic circle"]

@danxoneil

Unfortunately, a scan for bit.ly links would not be comprehensive. Here's an example: https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Gonorrhea-cases-for-femal/cgjw-mn43?.

This is a dataset currently included in this project.

Perhaps just follow and slurp all of the links in each metadata field, then associate them with that dataset, and sort it out by hand later?
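A minimal sketch of the link-gathering half of that idea, assuming each dataset's metadata has already been loaded as a nested dict keyed by dataset ID. The function names and the input shape are hypothetical. Downloading the linked documents is left out.

```python
import re
from typing import Any, Dict, Iterator, List

# Loose URL pattern: stop at whitespace, quotes, or closing brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere inside nested dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def links_in_metadata(meta: Dict[str, Any]) -> List[str]:
    """Collect unique URLs from all metadata fields, in first-seen order."""
    found: List[str] = []
    for text in iter_strings(meta):
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;:!?")  # drop trailing sentence punctuation
            if url not in found:
                found.append(url)
    return found


def collect_links(datasets: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each dataset ID to the links found in its metadata."""
    return {ds_id: links_in_metadata(meta) for ds_id, meta in datasets.items()}
```

The result is a plain mapping from dataset ID to candidate links, which is the sort of thing that can be dumped to a spreadsheet and sorted out by hand. Stripping trailing punctuation is a heuristic and will also remove a trailing `?` that was really part of a URL.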

[Attached image: Screen Shot 2013-02-14 at 11 46 15 PM]
