Scraper candidate: City of Chicago Landmarks database #8

Open
derekeder opened this issue Jan 11, 2013 · 2 comments

@derekeder

The City of Chicago has an online tool for looking up historical landmarks. These should be pretty easy to scrape.

A couple hundred historical landmarks with descriptions and images:
http://webapps.cityofchicago.org/landmarksweb/web/listings.htm

A database of 17,000 Chicago buildings including address, architect, type, color code, major tenant (probably outdated), and PIN.

Selecting a blank value for Architect will return the whole list (I think):
http://webapps.cityofchicago.org/landmarksweb/search/home.htm
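If the search results come back as an HTML table, turning them into records could look roughly like this. This is only a sketch: the column order in `FIELDS`, the plain `<tr>`/`<td>` markup, and the sample HTML are all assumptions, since nobody has inspected the real page yet. It does not fetch anything; it only parses a string of HTML.

```python
from html.parser import HTMLParser

# Hypothetical column order; the real results page may differ.
FIELDS = ["address", "architect", "type", "color_code", "major_tenant", "pin"]


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells end up empty
                self.rows.append(self._row)
            self._row = None


def parse_buildings(html):
    """Return one dict per table row that has exactly len(FIELDS) cells."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(FIELDS, row)) for row in parser.rows if len(row) == len(FIELDS)]


# Made-up sample markup, not copied from the City's site.
SAMPLE = """
<table>
  <tr><th>Address</th><th>Architect</th><th>Type</th><th>Color</th><th>Tenant</th><th>PIN</th></tr>
  <tr><td>123 N Example St</td><td>Jane Doe</td><td>Residential</td><td>Orange</td><td></td><td>00-00-000-000-0000</td></tr>
</table>
"""
records = parse_buildings(SAMPLE)
```

Rows with an unexpected number of cells are skipped rather than guessed at, so a layout change on the site would show up as missing records instead of misaligned ones.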

This may have already been released as this dataset:
https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks-Shapefiles/2h6e-2yk6
https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks/tdab-kixi
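One way to check whether the scraped listings are already covered by the portal datasets would be to compare landmark names from both sources. A rough sketch, assuming both sides can be reduced to a list of names (the sample names below are placeholders, not real landmarks):

```python
import re


def normalize(name):
    """Lowercase and collapse punctuation/whitespace so near-identical names match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def compare(scraped, portal):
    """Report which names appear in only one source and which appear in both."""
    a = {normalize(n): n for n in scraped}
    b = {normalize(n): n for n in portal}
    return {
        "only_scraped": sorted(a[k] for k in a.keys() - b.keys()),
        "only_portal": sorted(b[k] for k in b.keys() - a.keys()),
        "both": sorted(a[k] for k in a.keys() & b.keys()),
    }


# Placeholder names for illustration only.
scraped_names = ["Example House", "Sample  Building "]
portal_names = ["example house", "Placeholder Tower"]
diff = compare(scraped_names, portal_names)
```

Name matching this loose will miss renamed or abbreviated landmarks, so anything in the "only" lists would still need a manual look.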

@jpvelez

jpvelez commented Jan 12, 2013

The datasets are different for some reason. We should figure out which researchers at DHED know about these.

Juan-Pablo Velez
312-218-5448

@danxoneil

Yes, these are different datasets.

The "historical landmarks" first referenced above are the same items as in the datasets linked at the bottom of the original post: https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks-Shapefiles/2h6e-2yk6
https://data.cityofchicago.org/Historic-Preservation/Individual-Landmarks/tdab-kixi

These are "a list of individual Chicago Landmarks designated by City Council upon recommendation of the Commission on Chicago Landmarks". In other words, they went through a formal process for designation and made the final cut.

The data from this process lives in the scrapable PDFs of monthly meeting minutes published by the Commission, five years of which are published here. This is a good candidate for scraping: well-formed addresses, each with large blocks of descriptive narrative associated with it. It is very rich information that can inform decision-making in the future. I will add that in a separate issue.

Though there may be internal documents of the Commission that are more structured than these meeting minutes, it's not likely that the City would ever go back and attempt to turn these PDFs into publishable datasets on the data portal. There are far more worthwhile dataset candidates than this one.

However, turning these meeting minutes into structured data might be a good project for a non-programmer to get involved in edifice. A tool like http://tabula.nerdpower.org/ wouldn't really work, because the minutes aren't tabular to begin with.

At EveryBlock, we had a custom tool for doing this (pull in text, highlight proposed addresses and the blocks of text associated with them, allow a human to confirm/fix, and publish). See screenshot. It seems like it would be a good thing to do that in this project. Anyone want to make that?

[Screenshot: 7916353320_501ac54f0b_b]
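The propose-then-confirm workflow described above could be prototyped along these lines. This is a minimal sketch and not how the EveryBlock tool worked: the address pattern, the function names, and the sample text are all assumptions, and it expects the PDF text to have been extracted already.

```python
import re

# Hypothetical pattern for Chicago-style addresses such as "100 N. Example St."
# or "2500-02 W. Sample Ave."; real minutes will almost certainly need tuning.
ADDRESS_RE = re.compile(
    r"\b\d{1,5}(?:-\d{1,5})?\s+"              # house number or range
    r"[NSEW]\.?\s+"                            # direction prefix
    r"(?:[A-Z][A-Za-z']*\s+){1,3}"             # street name, one to three words
    r"(?:St|Ave|Blvd|Rd|Dr|Pl|Ct|Pkwy)\b\.?"   # street type
)


def propose_blocks(text):
    """Pair each candidate address with the text that follows it, up to the next address."""
    matches = list(ADDRESS_RE.finditer(text))
    proposals = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        narrative = " ".join(text[m.end():end].split())
        proposals.append(
            {"address": m.group(0), "narrative": narrative, "confirmed": False}
        )
    return proposals


def confirm(proposal, address=None, narrative=None):
    """Return a confirmed copy of a proposal, with any human corrections applied."""
    fixed = dict(proposal, confirmed=True)
    if address is not None:
        fixed["address"] = address
    if narrative is not None:
        fixed["narrative"] = narrative
    return fixed


# Made-up text in the rough shape of meeting minutes.
SAMPLE_MINUTES = """
100 N. Example St.
The Commission reviewed a proposal for a new storefront.

2500-02 W. Sample Ave.
Staff recommended approval of the rooftop addition.
"""
proposals = propose_blocks(SAMPLE_MINUTES)
published = [confirm(p) for p in proposals]
```

Nothing is marked confirmed until a person has looked at it, which keeps regex mistakes (for example an address mentioned in passing inside another item's narrative) out of the published data.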

The middle item referenced above (the database of 17,000 buildings) consists of items from the "inventory of architecturally and historically significant structures". This is a completely separate dataset, and super-useful to this project. Added that as #26 (could someone with access please add the "scraper" label to that issue?).
