This is my TensorFlow implementation of the Dynamic Coattention Network (DCN) applied to question answering on the SQuAD dataset (tested with TensorFlow 1.1 and 1.2). The network takes a Wikipedia article and a question as inputs and predicts a segment (or span) of the article that answers the question.
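To make the output format concrete, here is a minimal sketch of turning per-token scores into an answer span. It assumes the model produces one start score and one end score per article token and picks the pair (start, end) with start <= end that maximizes their sum, optionally capping the span length. The DCN itself refines its start/end estimates with an iterative dynamic pointing decoder, which this sketch does not reproduce; the function `best_span`, the `max_len` parameter and the toy scores are hypothetical.

```python
from typing import List, Optional, Tuple


def best_span(start_scores: List[float],
              end_scores: List[float],
              max_len: Optional[int] = None) -> Tuple[int, int]:
    """Return inclusive token indices (start, end) that maximize
    start_scores[start] + end_scores[end], subject to start <= end and,
    if max_len is given, end - start + 1 <= max_len."""
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    if max_len is not None and max_len < 1:
        raise ValueError("max_len must be at least 1")
    n = len(start_scores)
    best, best_score = (0, 0), float("-inf")
    for s in range(n):
        # Only end positions at or after the start are valid answers.
        stop = n if max_len is None else min(n, s + max_len)
        for e in range(s, stop):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Hypothetical scores for a 5-token "article".
tokens = ["the", "cat", "sat", "on", "mats"]
start = [0.1, 2.0, 0.3, 0.2, 0.1]
end = [0.0, 0.5, 1.5, 0.1, 1.8]

s, e = best_span(start, end)
print(s, e, " ".join(tokens[s:e + 1]))  # 1 4 cat sat on mats
s, e = best_span(start, end, max_len=2)
print(s, e, " ".join(tokens[s:e + 1]))  # 1 2 cat sat
```

The double loop is quadratic in the article length (or linear times `max_len` when a cap is set), which is fine for a sketch; the constraint start <= end is what keeps the prediction a valid contiguous segment.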

The data in the data/squad folder was downloaded and preprocessed with the starter code from assignment 4 of the Stanford course CS224n: Natural Language Processing with Deep Learning.

If you just want to look at the DCN implementation, check out DCN_model.py; it is only around 200 lines long.

To implement the model I had to explore some TensorFlow functions such as tf.gather_nd and tf.map_fn. I experimented with these functions on toy data in a notebook in the Experimentation_Notebooks folder.
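As an illustration of what tf.gather_nd does, the sketch below re-implements its basic semantics (no `batch_dims`) on nested Python lists, so it runs without TensorFlow. The usage shows a typical pattern in span-prediction models: picking, for every example in a batch, the encoding vector at a predicted position. In TensorFlow this roughly corresponds to `tf.gather_nd(U, tf.stack([tf.range(batch_size), start_idx], axis=1))`. This is an assumption about a common use, not a claim about how DCN_model.py calls it; the names `gather_nd`, `U` and `start_idx` here are hypothetical.

```python
from typing import Any


def gather_nd(params: Any, indices: Any) -> Any:
    """Pure-Python sketch of tf.gather_nd semantics (batch_dims=0) on nested lists.

    Each innermost list of ints in `indices` is an index vector into `params`;
    the result replaces every such vector with the slice of `params` it selects.
    Plain list indexing is used, so negative indices wrap around here instead
    of being rejected as out of range.
    """
    if indices and all(isinstance(i, int) for i in indices):
        # Innermost index vector: walk down one nesting level per entry.
        out = params
        for i in indices:
            out = out[i]
        return out
    # Otherwise recurse, preserving the leading shape of `indices`.
    return [gather_nd(params, sub) for sub in indices]


# Toy "encodings" with shape [batch=2, seq_len=3, hidden=2].
U = [
    [[0, 1], [2, 3], [4, 5]],
    [[6, 7], [8, 9], [10, 11]],
]
start_idx = [2, 0]  # hypothetical predicted start position per example

# Pair each batch index with that example's position.
indices = [[b, i] for b, i in enumerate(start_idx)]  # [[0, 2], [1, 0]]
print(gather_nd(U, indices))      # [[4, 5], [6, 7]]
print(gather_nd(U, [[1, 2, 1]]))  # [11]
```

The length of each index vector decides how deep the lookup goes: two entries select a whole hidden vector, three entries select a single scalar.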

The best result so far is 48% EM (exact match) and a 64% F1 score on the validation set. Training was started with:

python code/train.py --rnn_state_size=150
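SQuAD results are usually reported with the two metrics of the official SQuAD v1.1 evaluation script. The sketch below re-implements that logic for illustration: answers are lowercased and stripped of punctuation, the articles a/an/the and extra whitespace; EM checks for an identical normalized string, and F1 measures token overlap; each question is scored against its best-matching reference answer. It is not the evaluation code used in this repository, and the function names and toy answers are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Dict, List

PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection: a shared token counts min(count_pred, count_gold) times.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: List[str], ground_truths: List[List[str]]) -> Dict[str, float]:
    """Average EM and F1 in percent, taking the best reference answer per question."""
    if not predictions:
        raise ValueError("no predictions to evaluate")
    em = f1 = 0.0
    for pred, golds in zip(predictions, ground_truths):
        em += max(exact_match(pred, g) for g in golds)
        f1 += max(f1_score(pred, g) for g in golds)
    n = len(predictions)
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}


print(exact_match("The Eiffel Tower!", "eiffel tower"))  # 1.0
print(f1_score("in Paris, France", "Paris"))             # 0.5
print(evaluate(["in Paris, France", "1889"], [["Paris"], ["1887", "1889"]]))
# {'exact_match': 50.0, 'f1': 75.0}
```

Because F1 gives partial credit for overlapping tokens while EM does not, F1 is always at least as high as EM, which is consistent with the gap between the two numbers above.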

Note:

  • You will need the tqdm package to run the code.
  • The project is currently on hold due to the high cost of training on AWS instances. I might continue it once I get a proper graphics card.

TODO:

  • The hyperparameter search is not finished (e.g., how much do 300-dimensional word vectors improve performance compared to 100-dimensional ones?)
  • Check the influence of LSTM vs. GRU cells