Commit 104d2a7: Python 3.6 and Keras 2.0 update of scripts
Nils Reimers committed on Jul 11, 2017 (1 parent: 68718f9)
Showing 53 changed files with 1,905,742 additions and 0 deletions.
# Deep Learning for NLP - July 2017

This Git repository accompanies the seminar on Deep Learning for Natural Language Processing.

In contrast to other seminars, this seminar focuses on the **usage of deep learning methods**. As programming infrastructure we use Python in combination with [Keras](https://keras.io). The published code can be used with Python 2.7 or Python 3.6, Keras 2.0.5, and a Theano (0.9.0) or TensorFlow (1.2.1) backend. You should ensure that you have the frameworks installed in the right versions (note: they change quickly).

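A quick way to check which versions are installed is to print them from Python (a minimal sketch; import `theano` instead if you use the Theano backend):

```python
import keras
import tensorflow as tf   # or: import theano

print("Keras:", keras.__version__)        # expected: 2.0.5
print("TensorFlow:", tf.__version__)      # expected: 1.2.1
```
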
This seminar is structured into 4 sessions:

1. Feed-Forward Networks for Sequence Classification (e.g. POS, NER, Chunking)
2. Convolutional Neural Networks for Sentence / Text Classification (e.g. sentiment classification)
3. Convolutional Neural Networks for Relation Extraction (e.g. semantic relation extraction)
4. Long Short-Term Memory (LSTM) Networks for Sequence Classification

The seminar is inspired by an engineering mindset: the beautiful math and complexity of the topic is sometimes neglected in order to provide an easy-to-understand and easy-to-use approach **to using Deep Learning for NLP tasks** (we use what works without providing a full background on every aspect).

At the end of the seminar you should understand the most important aspects of deep learning for NLP and be able to program and train your own deep neural networks.

In case of questions, feel free to approach [Nils Reimers](https://www.ukp.tu-darmstadt.de/people/doctoral-researchers/nils-reimers/).

# Setting up the Development Environment

The code in this folder was developed for Python 2.7 (and Python 3.6) and Keras 2.0.5. As backend for Keras you can use Theano 0.9.0 or TensorFlow 1.2.1.

You can set up a virtual environment (see the Virtual Environment section below) and install the required packages in the following way:
```
pip install -r requirements.txt
```

Alternatively, it should be sufficient to just install Keras in version 2.0.5 and TensorFlow:
```
pip install Keras==2.0.5 TensorFlow==1.2.1
```

## Virtual Environment
It can be useful to run Python in a virtual environment for this seminar.

Create a virtualenv in the following way:
```
virtualenv .env
source .env/bin/activate
```
If you operate in the virtual environment, you can run pip to install the needed packages in the following way:
```
.env/bin/pip install -r requirements.txt
```

## Docker
The folder `docker` contains a Dockerfile that bundles an environment needed to run the experiments in this folder. It installs Python 3.6, Keras 2.0.5 and TensorFlow 1.2.1.

First, you need to build the Docker image:
```
docker build ./docker -t dl4nlp
```

Then, from this folder, you can start a container and mount the files in this folder into it:
```
docker run -it -v ${PWD}:/usr/src/app dl4nlp bash
```

This starts a bash shell inside the dl4nlp container, where Python is installed. Through the mount, you can modify the files in this folder and run them inside the Docker container.

## Recommended Readings on Deep Learning
The following is a short list of good introductions to different aspects of deep learning.
* 2009, Yoshua Bengio, [Learning Deep Architectures for AI](http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.p)
* 2013, Richard Socher and Christopher Manning, [Deep Learning for Natural Language Processing (slides and recording from NAACL 2013)](http://nlp.stanford.edu/courses/NAACL2013/)
* 2015, Yoshua Bengio et al., [Deep Learning - MIT Press book in preparation](http://www.iro.umontreal.ca/~bengioy/dlbook/)
* 2015, Richard Socher, [CS224d: Deep Learning for Natural Language Processing](http://cs224d.stanford.edu/syllabus.html)
* 2015, Yoav Goldberg, [A Primer on Neural Network Models for Natural Language Processing](http://u.cs.biu.ac.il/~yogo/nnlp.pdf)

## Theory 1 - Introduction to Deep Learning
**Slides:** [pdf](./1_Theory_Introduction.pdf)

The first theory lesson covers the fundamentals of deep learning.

## Theory 2 - Introduction to Word Embeddings
**Slides:** [pdf](./3_Theory_Word_Embeddings.pdf)

## Theory 3 - Introduction to Deep Learning Frameworks
**Slides:** [pdf](./3_Theory_Frameworks.pdf)

This lesson gives an overview of deep learning frameworks. Hint: Use [Keras](http://keras.io) and have a look at Theano and TensorFlow.

## Code Session 1 - SENNA Architecture for Sequence Classification
**Slides:** [pdf](./Session%201%20-%20SENNA/SENNA.pdf)

**Code:** See folder [Session 1 - SENNA](./Session%201%20-%20SENNA)

The first code session is about the SENNA architecture ([Collobert et al., 2011, NLP (almost) from scratch](https://arxiv.org/abs/1103.0398)). In the folder you can find Python code for the preprocessing as well as Keras code to train and evaluate a deep learning model. The folder contains an example for Part-of-Speech tagging, which requires the English word embeddings from either [Levy et al.](https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/) or from [Komninos et al.](https://www.cs.york.ac.uk/nlp/extvec/).

In this folder you can also find an example for German NER, based on the [GermEval 2014 dataset](https://sites.google.com/site/germeval2014ner/). To run the German NER code, you need the [word embeddings for German from our website](https://www.ukp.tu-darmstadt.de/research/ukp-in-challenges/germeval-2014/).

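The SENNA architecture classifies each token from a fixed-size window of surrounding words rather than from the whole sentence. Below is a minimal sketch of that windowing step; the function, the `word2Idx` mapping and the `PADDING`/`UNKNOWN` entries are illustrative assumptions (the casing features used in the German NER example are omitted), and the actual preprocessing scripts are in the session folder.

```python
def create_windows(sentence, word2Idx, window_size=2):
    """Map each token of a sentence to the word indices of its surrounding window."""
    padded = ['PADDING'] * window_size + sentence + ['PADDING'] * window_size
    windows = []
    for i in range(window_size, len(padded) - window_size):
        window = padded[i - window_size:i + window_size + 1]
        windows.append([word2Idx.get(w, word2Idx['UNKNOWN']) for w in window])
    return windows  # one row of 2*window_size+1 word indices per token
```
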
**Recommended Readings:**
* [CS224d - Lecture 2](https://www.youtube.com/watch?v=T8tQZChniMk)
* [CS224d - Lecture 3](https://www.youtube.com/watch?v=T1j2Q9_FgTM)

## Theory 4 - Introduction to Convolutional Neural Networks
**Slides:** [pdf](./4_Theory_Convolutional_NN.pdf)

This is an introduction to Convolutional Neural Networks.

**Recommended Readings:**
* [CS224d - Lecture 13](https://www.youtube.com/watch?v=EevTPpQvxiU)
* [Kim, 2014, Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882)

## Code Session 2 - Convolutional Neural Networks for Text Classification
**Slides:** [pdf](./Session%202%20-%20Sentence%20CNN/Sentence_CNN.pdf)

**Code:** See folder [Session 2 - Sentence CNN](./Session%202%20-%20Sentence%20CNN)

This is a Keras implementation of [Kim, 2014, Convolutional Neural Networks for Sentence Classification](http://arxiv.org/abs/1408.5882). We use the same preprocessing as provided by Kim in his [GitHub repository](https://github.com/yoonkim/CNN_sentence) but implement the rest using Keras.

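The core of the Kim (2014) model is a set of parallel convolutions with different filter widths over the word embeddings, each followed by max-over-time pooling. A minimal Keras sketch of this idea (layer sizes and hyperparameters are illustrative assumptions, not the values used in the session code):

```python
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Dense, concatenate

max_len, vocab_size, emb_dim, n_classes = 100, 20000, 300, 2   # illustrative sizes

tokens = Input(shape=(max_len,), dtype='int32')
emb = Embedding(vocab_size, emb_dim)(tokens)

pooled = []
for kernel_size in (3, 4, 5):                                  # one convolution per filter width
    conv = Conv1D(filters=100, kernel_size=kernel_size, activation='relu')(emb)
    pooled.append(GlobalMaxPooling1D()(conv))                  # max-over-time pooling

merged = Dropout(0.5)(concatenate(pooled))
predictions = Dense(n_classes, activation='softmax')(merged)

model = Model(inputs=tokens, outputs=predictions)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```
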
## Code Session 3 - Convolutional Neural Networks for Relation Extraction
**Slides:** [pdf](./Session%203%20-%20Relation%20CNN/Relation_CNN.pdf)

**Code:** See folder [Session 3 - Relation CNN](./Session%203%20-%20Relation%20CNN)

This is an implementation for relation extraction. We use the [SemEval 2010 - Task 8](https://docs.google.com/document/d/1QO_CnmvNRnYwNWu1-QCAeR5ToQYkXUqFeAJbdEhsq7w/preview) dataset on semantic relations. We model the task as a pairwise classification task.

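A common way to feed the entity pair into such a CNN, following Zeng et al. (2014), is to extend each word embedding with position embeddings that encode the token's distance to the two marked entities. A minimal Keras sketch of this idea (all sizes are illustrative assumptions; see the session folder for the actual model):

```python
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Dense, concatenate

max_len, vocab_size, emb_dim = 100, 20000, 300      # illustrative sizes
n_distances, dist_dim, n_relations = 120, 50, 19    # illustrative; 19 labels in SemEval 2010 Task 8

words_in = Input(shape=(max_len,), dtype='int32')
dist1_in = Input(shape=(max_len,), dtype='int32')   # distance of each token to the first entity
dist2_in = Input(shape=(max_len,), dtype='int32')   # distance of each token to the second entity

words = Embedding(vocab_size, emb_dim)(words_in)
dist1 = Embedding(n_distances, dist_dim)(dist1_in)
dist2 = Embedding(n_distances, dist_dim)(dist2_in)

x = concatenate([words, dist1, dist2])              # token representation = word + position features
x = Conv1D(filters=100, kernel_size=3, activation='tanh')(x)
x = GlobalMaxPooling1D()(x)
x = Dropout(0.25)(x)
predictions = Dense(n_relations, activation='softmax')(x)

model = Model(inputs=[words_in, dist1_in, dist2_in], outputs=predictions)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```
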
**Recommended Readings:**
* [Zeng et al., 2014, Relation Classification via Convolutional Deep Neural Network](http://www.aclweb.org/anthology/C14-1220)
* [dos Santos et al., 2015, Classifying Relations by Ranking with Convolutional Neural Networks](https://arxiv.org/abs/1504.06580)

## Theory 5 - Introduction to LSTM
**This section has not yet been ported to the July 2017 seminar. Please have a look at the November 2016 version.**

**Slides:** [pdf](https://github.com/UKPLab/deeplearning4nlp-tutorial/raw/master/2016-11_Seminar/4_Theory_Recurrent%20Neural%20Networks.pdf)

**Code:** See folder [Session 4 - LSTM Sequence Classification](https://github.com/UKPLab/deeplearning4nlp-tutorial/tree/master/2016-11_Seminar/Session%204%20-%20LSTM%20Sequence%20Classification)

LSTMs are a powerful model and became very popular in 2015 / 2016.

**Recommended Readings:**
* [RNN Effectiveness](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
* [RNN Effectiveness - Video](https://skillsmatter.com/skillscasts/6611-visualizing-and-understanding-recurrent-networks)
* [Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [CS224d - Lecture 7](https://www.youtube.com/watch?v=rFVYTydGLr4)

## Code Session 4 - LSTM for Sequence Classification
**Slides:** [pdf](https://github.com/UKPLab/deeplearning4nlp-tutorial/raw/master/2016-11_Seminar/Session%204%20-%20LSTM%20Sequence%20Classification/LSTM%20for%20Sequence%20Classification.pdf)

The folder contains a Keras implementation to perform sequence classification using an LSTM. We use the [GermEval 2014 dataset](https://sites.google.com/site/germeval2014ner/) for German NER, but you can easily adapt the code to any other sequence classification problem (POS, NER, Chunking, etc.). Check the slides for more information.

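A typical Keras setup for this kind of token-level tagging is an embedding layer, a bidirectional LSTM that returns one output per token, and a time-distributed softmax. A minimal sketch (sizes and hyperparameters are illustrative assumptions, not taken from the session code):

```python
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Bidirectional, TimeDistributed, Dense

max_len, vocab_size, emb_dim, n_labels = 100, 50000, 100, 25   # illustrative sizes

tokens = Input(shape=(max_len,), dtype='int32')
emb = Embedding(vocab_size, emb_dim)(tokens)
states = Bidirectional(LSTM(100, return_sequences=True))(emb)   # one output vector per token
predictions = TimeDistributed(Dense(n_labels, activation='softmax'))(states)

model = Model(inputs=tokens, outputs=predictions)
model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam')
```
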
File: 2017-07_Seminar/Session 1 - SENNA/code for NER/BIOF1Validation.py (62 additions)
""" | ||
Computes the F1 score on BIO tagged data | ||
@author: Nils Reimers | ||
""" | ||
|
||
|
||
#Method to compute the accruarcy. Call predict_labels to get the labels for the dataset | ||
def compute_f1(predictions, dataset_y, idx2Label): | ||
|
||
|
||
label_y = [idx2Label[element] for element in dataset_y] | ||
pred_labels = [idx2Label[element] for element in predictions] | ||
|
||
|
||
|
||
prec = compute_precision(pred_labels, label_y) | ||
rec = compute_precision(label_y, pred_labels) | ||
|
||
f1 = 0 | ||
if (rec+prec) > 0: | ||
f1 = 2.0 * prec * rec / (prec + rec); | ||
|
||
return prec, rec, f1 | ||
|
||
|
||
def compute_precision(guessed, correct): | ||
correctCount = 0 | ||
count = 0 | ||
|
||
idx = 0 | ||
while idx < len(guessed): | ||
if guessed[idx][0] == 'B': #A new chunk starts | ||
count += 1 | ||
|
||
if guessed[idx] == correct[idx]: | ||
idx += 1 | ||
correctlyFound = True | ||
|
||
while idx < len(guessed) and guessed[idx][0] == 'I': #Scan until it no longer starts with I | ||
if guessed[idx] != correct[idx]: | ||
correctlyFound = False | ||
|
||
idx += 1 | ||
|
||
if idx < len(guessed): | ||
if correct[idx][0] == 'I': #The chunk in correct was longer | ||
correctlyFound = False | ||
|
||
|
||
if correctlyFound: | ||
correctCount += 1 | ||
else: | ||
idx += 1 | ||
else: | ||
idx += 1 | ||
|
||
precision = 0 | ||
if count > 0: | ||
precision = float(correctCount) / count | ||
|
||
return precision |
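
A hypothetical toy example of how `compute_f1` is called (the label set and values are made up for illustration; in the training script below, the predictions come from the Keras model):

```python
idx2Label = {0: 'O', 1: 'B-PER', 2: 'I-PER'}
gold = [1, 2, 0, 1]   # B-PER I-PER O B-PER  -> two gold chunks
pred = [1, 2, 0, 0]   # B-PER I-PER O O      -> one predicted chunk, matching the first gold chunk
prec, rec, f1 = compute_f1(pred, gold, idx2Label)
print(prec, rec, f1)  # 1.0 0.5 0.666...
```
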
Next changed file in the commit (127 additions):
# -*- coding: utf-8 -*-
"""
This is an example for performing sequence tagging with Keras.
We use the GermEval 2014 NER dataset (German) and implement the SENNA architecture (Collobert et al., NLP (almost) from scratch).
The code can easily be changed to any other sequence tagging task.

Performance after 10 epochs (GermEval 2014 German NER):
    Development F1-score: 70.3%
    Test F1-score: 69.9%

Code was written & tested with:
    - Python 2.7 & Python 3.6
    - Theano 0.9.0 and TensorFlow 1.2.1
    - Keras 2.0.5

@author: Nils Reimers, www.deeplearning4nlp.com
"""
from __future__ import print_function
import numpy as np
import time
import gzip

import sys
if (sys.version_info > (3, 0)):
    import pickle as pkl
else:  # Python 2.7 imports
    import cPickle as pkl


import keras
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Activation, Flatten, concatenate
from keras.layers import Embedding

import BIOF1Validation


numHiddenUnits = 100


# Load the pre-computed word and casing embeddings plus the label mapping
f = gzip.open('pkl/embeddings.pkl.gz', 'rb')
embeddings = pkl.load(f)
f.close()

label2Idx = embeddings['label2Idx']
wordEmbeddings = embeddings['wordEmbeddings']
caseEmbeddings = embeddings['caseEmbeddings']

# Inverse label mapping
idx2Label = {v: k for k, v in label2Idx.items()}

# Load the windowed train/dev/test matrices created by the preprocessing script
f = gzip.open('pkl/data.pkl.gz', 'rb')
train_tokens, train_case, train_y = pkl.load(f)
dev_tokens, dev_case, dev_y = pkl.load(f)
test_tokens, test_case, test_y = pkl.load(f)
f.close()

#####################################
#
# Create the Network
#
#####################################

# Input size (window length) and output size (number of labels)
n_in = train_tokens.shape[1]
n_out = len(label2Idx)

words_input = Input(shape=(n_in,), dtype='int32', name='words_input')
words = Embedding(input_dim=wordEmbeddings.shape[0], output_dim=wordEmbeddings.shape[1], input_length=n_in, weights=[wordEmbeddings], trainable=False)(words_input)
words = Flatten()(words)

casing_input = Input(shape=(n_in,), dtype='int32', name='casing_input')
casing = Embedding(input_dim=caseEmbeddings.shape[0], output_dim=caseEmbeddings.shape[1], input_length=n_in, weights=[caseEmbeddings], trainable=False)(casing_input)
casing = Flatten()(casing)

output = concatenate([words, casing])
output = Dense(units=numHiddenUnits, activation='tanh')(output)
output = Dense(units=n_out, activation='softmax')(output)

# Create our model and compile it using the Nadam optimizer with categorical cross-entropy for sparse y-labels
model = Model(inputs=[words_input, casing_input], outputs=[output])
model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam')
model.summary()

print(train_tokens.shape[0], ' train samples')
print(train_tokens.shape[1], ' train dimension')
print(test_tokens.shape[0], ' test samples')


##################################
#
# Training of the Network
#
##################################

number_of_epochs = 10
minibatch_size = 128
print("%d epochs" % number_of_epochs)


def predict_classes(prediction):
    return prediction.argmax(axis=-1)


for epoch in range(number_of_epochs):
    print("\n------------- Epoch %d ------------" % (epoch+1))
    model.fit([train_tokens, train_case], train_y, epochs=1, batch_size=minibatch_size, verbose=True, shuffle=True)

    # Compute precision, recall, F1 on dev & test data
    pre_dev, rec_dev, f1_dev = BIOF1Validation.compute_f1(predict_classes(model.predict([dev_tokens, dev_case])), dev_y, idx2Label)
    pre_test, rec_test, f1_test = BIOF1Validation.compute_f1(predict_classes(model.predict([test_tokens, test_case])), test_y, idx2Label)

    print("%d. epoch: F1 on dev: %f, F1 on test: %f" % (epoch+1, f1_dev, f1_test))
Next changed file in the commit (52 additions):
# NER using the SENNA Architecture

This is a simple Named Entity Recognizer for German based on the SENNA architecture as presented by Collobert et al. in the paper 'Natural Language Processing (almost) from Scratch'.

We use the data from the GermEval-2014 contest (https://sites.google.com/site/germeval2014ner/data).

The code was developed and tested with:
- Python 2.7
- Theano 0.8.2
- Keras 1.1.1

# 1. Word Embeddings
Good word embeddings are a critical feature for nearly every NLP system. For English, there are three pre-trained word embeddings we can use:
- Word2Vec: https://code.google.com/p/word2vec/
- GloVe: http://nlp.stanford.edu/projects/glove/
- Levy Word2Vec on Dependencies: https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/

For German, you can use the word embeddings we trained for the GermEval-2014 contest:
https://www.ukp.tu-darmstadt.de/research/ukp-in-challenges/germeval-2014/

# 2. Reducing the size of the embedding matrix
The full embedding matrix can become quite large: the unzipped file for the German word embeddings with a minimum word count of 5 has a size of 3.3 GB. Reading it and storing it in memory would cost quite some time.

Most of the word embeddings will not be needed during training and evaluation. So it is a nice trick to first extract only the word embeddings we are going to need for our neural network. The provided CreateWordList.py reads in the dataset and extracts all words from our train, dev and test files.

After that, we can execute CreateSubCorpus.py, which extracts from the large .vocab file only the word embeddings we will actually need.

The reduced embeddings file can be found at embeddings/GermEval.vocab.gz

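A minimal sketch of this filtering idea (the file names and the one-word-plus-vector-per-line format are assumptions for illustration; the actual logic lives in CreateWordList.py and CreateSubCorpus.py):

```python
import gzip

# Words that occur in train/dev/test (e.g. the output of the word-list step).
needed = set(line.strip() for line in open('words.txt', encoding='utf-8'))

# Copy only the embedding rows for those words into a much smaller file.
with gzip.open('full_embeddings.vocab.gz', 'rt', encoding='utf-8') as f_in, \
     gzip.open('GermEval.vocab.gz', 'wt', encoding='utf-8') as f_out:
    for line in f_in:
        word = line.split(' ', 1)[0]
        if word in needed:
            f_out.write(line)
```
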
# 3. Create a NER
Most of the code deals with reading in the dataset, creating X- and Y-matrices for our neural network and evaluating the final result.

- BIOF1Validation.py: Provides methods to compute the F1-score on BIO encoded data
- GermEvalReader.py: Reads in the tsv-data from the GermEval task and outputs them as matrices

# 4. Hints
Updating the word embedding layer takes significant time and is not necessary, as we have pre-trained word embeddings. You can disable the update of the embeddings by setting the trainable weights for this layer to an empty set.

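The hint about setting the trainable weights to an empty set refers to the older Keras 1 code. In the Keras 2.0.5 training script in this commit, the same effect is achieved by constructing the embedding layer with `trainable=False`:

```python
words = Embedding(input_dim=wordEmbeddings.shape[0], output_dim=wordEmbeddings.shape[1],
                  input_length=n_in, weights=[wordEmbeddings], trainable=False)(words_input)
```
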
# 5. Performance and Runtime
On my computer, the Keras implementation with a window size of 2 runs at about 10 seconds per epoch.

Adding the casing information increases the runtime to about 12 seconds per epoch.

The performance after 10 epochs is:
- Without case information: F1 on dev: 0.690786, F1 on test: 0.684517
- With case information: F1 on dev: 0.720520, F1 on test: 0.705799

Our system for the GermEval-2014 competition achieved a score of F1=75.1%:
https://www.ukp.tu-darmstadt.de/fileadmin/user_upload/Group_UKP/publikationen/2014/2014_GermEval_Nested_Named_Entity_Recognition_with_Neural_Networks.pdf