SocialScienceDataLab/building-infrastructure-for-data-driven-research

Building Infrastructure for Data-Driven Research

  • Speaker: Philipp Zumstein
  • Venue: Social Science Data Lab, MZES, Mannheim
  • Date: March 15th, 2017, at 12 noon
  • Location: MZES, A-231

Abstract

Most methods for data-driven research (including Big Data, Data Science, and Digital Humanities) work primarily on text data or numbers. However, a lot of information is available only in printed books or newspapers. This information first has to be digitized and then processed further to extract the text or data. The main focus of the talk is optical character recognition (OCR). We will look at the general OCR workflow, discuss some OCR software, and see how these tools can be used in practice. Building such an infrastructure, or even just performing these initial steps, may require a significant amount of time and resources, and can be a project in its own right. The Mannheim University Library runs several infrastructure projects in this area, which are briefly mentioned.

Keywords

Slides

Links

Feedback, Questions, Discussion

Feel free to ask questions here by opening a new issue, and we can continue the discussion there.