stanford_pipeline

Program to run scraped news stories through Stanford's CoreNLP program.

The program pulls stories that were added to the database within the past day and have not yet been parsed by CoreNLP. Once parsed, the parse trees are written back to the database. The program is currently set to process the first six sentences of each story.
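
As a rough illustration of the selection logic described above, the sketch below builds a MongoDB-style filter for stories added in the last day that have no parse yet, and trims a story to its first six sentences. The field names `date_added` and `parsed_sents` and both function names are assumptions made for this example; they are not taken from the pipeline's code.

```python
from datetime import datetime, timedelta, timezone


def build_unparsed_query(now=None, window_hours=24):
    """Return a MongoDB-style filter for recent stories that lack a parse.

    The field names below are hypothetical placeholders.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=window_hours)
    return {
        "date_added": {"$gte": cutoff},       # added within the window
        "parsed_sents": {"$exists": False},   # no parse stored yet
    }


def first_sentences(sentences, limit=6):
    """Keep only the first `limit` sentences of a story."""
    return list(sentences)[:limit]
```

A filter like this could be passed to a collection's `find()` call, with each returned story trimmed by `first_sentences` before it is sent to CoreNLP.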

This program makes extensive use of Brendan O'Connor's wrapper for CoreNLP. The current install comes from my (John Beieler) fork. The config file for CoreNLP makes use of the shift-reduce parser introduced in CoreNLP 3.4.

CoreNLP Setup

This pipeline depends on having CoreNLP 3.4 with the shift-reduce parser. Download CoreNLP and the shift-reduce parser models like this:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2014-06-16.zip
unzip stanford-corenlp-full-2014-06-16.zip
mv stanford-corenlp-full-2014-06-16 stanford-corenlp
cd stanford-corenlp
wget http://nlp.stanford.edu/software/stanford-srparser-2014-07-01-models.jar

If errors persist, try changing the CoreNLP path in default_config.ini from the home-relative path ~/stanford-corenlp to the full absolute path (e.g. /home/ahalterman/stanford-corenlp).

Configuration

The default_config.ini file has several options that can be changed, including the MongoDB database and collection of stories to process and whether all unparsed stories should be processed or just the stories added in the last day.
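
To make the kinds of options concrete, here is a minimal sketch of reading such a file with Python's standard `configparser`. The section names (`Mongo`, `StanfordNLP`), the option names, and the example values are all assumptions for illustration; check default_config.ini itself for the real ones. The sketch also expands `~` to an absolute path, which is one way to sidestep the path problem mentioned in the setup section.

```python
import configparser
import os

# Hypothetical config contents; the real default_config.ini may differ.
EXAMPLE_INI = """
[Mongo]
db = event_scrape
collection = stories
range = today

[StanfordNLP]
stanford_dir = ~/stanford-corenlp
"""


def load_settings(text):
    """Parse INI text and return the settings as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "db": parser.get("Mongo", "db"),
        "collection": parser.get("Mongo", "collection"),
        # "today" = stories from the last day, "all" = every unparsed story
        "range": parser.get("Mongo", "range", fallback="today"),
        # Expand "~" so downstream code receives an absolute path
        "corenlp_dir": os.path.expanduser(
            parser.get("StanfordNLP", "stanford_dir")
        ),
    }


settings = load_settings(EXAMPLE_INI)
```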

Usage

python process.py

Up to a minute of [Errno 111] Connection refused messages are normal during startup.
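
One plausible way to picture these messages: a client repeatedly tries to connect while the CoreNLP server is still loading its models, and each failed attempt is refused until the server's port opens. The sketch below shows that polling pattern with the standard library only. It is not the wrapper's actual code, and `wait_for_port` with its parameters is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP port accepts connections or `timeout` seconds pass.

    Returns True once a connection succeeds, False if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A refused connection raises ConnectionRefusedError (an OSError)
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```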
