This repo contains the implementation of our paper.
The original code was cloned from this repo, which is a PyTorch implementation of the paper Position-aware Attention and Supervised Data Improve Slot Filling.
The TACRED dataset: Details on the TAC Relation Extraction Dataset can be found on this dataset website.
- Python 3 (tested on 3.6.2)
- PyTorch (tested on 1.0.0)
- tqdm, bert-as-service, and possibly a few other packages
- unzip, wget (for downloading only)
First, download and unzip GloVe vectors from the Stanford website, with:
chmod +x download.sh; ./download.sh
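To sanity-check the downloaded vectors, a small parser can help. The following is a minimal sketch, not the loader this repo uses: it relies only on the GloVe text format (one token per line, followed by space-separated floats), and the function name load_glove, the dim argument, and the optional vocab filter are hypothetical names introduced here for illustration.

```python
def load_glove(lines, dim, vocab=None):
    """Parse GloVe-style text lines into a {token: [float, ...]} dict.

    `lines` is any iterable of strings (an open file works).
    `dim` is the vector dimension; it is an assumption the caller must supply.
    `vocab`, if given, restricts loading to those tokens.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # Take the last `dim` fields as the vector so that tokens which
        # themselves contain spaces are still parsed correctly.
        word = " ".join(parts[:-dim])
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(v) for v in parts[-dim:]]
    return vectors


# Tiny in-memory example; a real call would pass an open file instead.
sample = ["the 0.1 0.2 0.3", "cat 0.4 0.5 0.6"]
print(load_glove(sample, dim=3, vocab={"cat"}))
```

With a real file, pass the open file handle as lines and set dim to match the vectors that download.sh fetched.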
Then tokenize the data for use with BERT or SciBERT:
python data/data_tok.py
Train using the commands in cmdcheat.txt. You need two terminal windows open, or two separate tmux sessions. Run the corresponding bert-as-service command and then run the python train.py command, both with the appropriate flags listed in cmdcheat.txt.
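Because training depends on a server started in another terminal, it can be convenient to confirm that something is listening before launching train.py. The sketch below is an illustration only and is not part of this repo: the helper name port_is_open is hypothetical, and the port 5555 is assumed to be the bert-as-service default; if the command in cmdcheat.txt sets a different port, use that value instead.

```python
import socket


def port_is_open(host="localhost", port=5555, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    The default port is an assumption; adjust it to match the
    flags used when starting the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolved hosts.
        return False


if port_is_open():
    print("A server is reachable; it should be safe to start training.")
else:
    print("Nothing is listening; start the bert-as-service command first.")
```

This only checks that the port accepts connections. It does not verify that the server has finished loading its model.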
Model checkpoints and logs will be saved to ./saved_models/00.
Run evaluation on the test set with:
python eval.py saved_models/00 --dataset test
This will use best_model.pt by default. Use --model checkpoint_epoch_10.pt to specify a model checkpoint file. Add --out saved_models/out/test1.pkl to write model probability output to files (for ensemble, etc.).
You will need bert-as-service running for the test phase as well.
Please see the example script ensemble.sh.
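For readers who want a feel for what ensembling the probability files might involve, here is a minimal sketch of simple probability averaging. It is not a description of what ensemble.sh does. It assumes each .pkl file written with --out holds a list with one probability vector per test example, in the same example order across files; that layout, and the names load_probs and average_ensemble, are assumptions made for illustration.

```python
import pickle


def load_probs(path):
    """Load one model's probability output (assumed: list of per-example vectors)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def average_ensemble(prob_sets):
    """Average class probabilities across models and return predicted class indices.

    `prob_sets` is a list with one entry per model; each entry is a list of
    per-example probability vectors, aligned by example.
    """
    n_models = len(prob_sets)
    predictions = []
    for per_model in zip(*prob_sets):  # vectors from every model for one example
        avg = [sum(col) / n_models for col in zip(*per_model)]
        predictions.append(max(range(len(avg)), key=avg.__getitem__))
    return predictions


# Toy example with two models, two examples, and two classes.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.2, 0.8], [0.3, 0.7]]
print(average_ensemble([model_a, model_b]))  # → [0, 1]
```

In practice the inputs would come from calls such as load_probs("saved_models/out/test1.pkl"), one per trained model, and the predicted indices would still need to be mapped back to relation labels.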
All work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.