This is my tensorflow implementation of the Dynamic Coattention Network applied to question answering for the SQuAD database (tested with tensorflow version 1.1 and 1.2). The network gets a Wikipedia article and a question as inputs and should predict a segment (or span) of the article that answers the question.

The data in the data/squad folder was downloaded and preprocessed via the starter code from assignment 4 of the Stanford Course CS224n: Natural Language Processing with Deep Learning.

If you just want to have a look at the DCN implementation check out DCN_model.py, it is only around 200 lines long.

To implement the model I had to explore some tensorflow functions like tf.gather_nd and tf.map_fn. I did my experiments with these functions on toy data in this notebook in the Experimentation_Notebooks folder.

The best result so far is 48% EM (exact match) and 64% F1 score on the validation set. Training was started via

python code/train.py --rnn_state_size=150

Note:

You will need the tqdm package to run the code
Right now the project is on ice, due to the high costs for training on AWS instances. I might continue the project once I get a proper graphics card.

TODO:

The hyperparameter search is not finished (e.g.: How much can using 300 dimensional word vectors improve performance compared to 100 dimensional word vectors?)
Check influence of LSTM vs GRU

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Files

README.md

Latest commit

History

README.md

File metadata and controls