Evaluating consistency of Question-Answering Models

This repository contains code for creating implications and evaluating the consistency of question-answering models, as described in the following paper:

Are Red Roses Red? Evaluating Consistency of Question-Answering Models
Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh
Association for Computational Linguistics (ACL), 2019

Installation

Clone this repository and cd to the folder:

git clone [email protected]:marcotcr/qa_consistency.git
cd qa_consistency

Create and activate a virtual environment, e.g.:

virtualenv -p python3.6 env
source env/bin/activate

Run the following, replacing [gpu] with [cpu] if you don't have a gpu.

pip install cython numpy
pip install benepar[gpu]
pip install -e .
cd qa_consistency
git clone https://github.com/kelvinguu/qanli.git
cd ..
python -c "import benepar;benepar.download('benepar_en_small')"
python -m spacy download en_core_web_sm

Generating implications:

VQA

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
import qa_consistency.implication
gen = qa_consistency.implication.ImplicationsVQA()
gen.implications('How many birds?', '3')

[('Are there 3 birds ?', 'yes', 'yeseqcount'),
('Are there 4 birds ?', 'no', 'n+1'),
('Are there any birds ?', 'yes', 'ans>0 implies some')]

SQuAD

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
import qa_consistency.implication
gen = qa_consistency.implication.ImplicationsSquad()
passage = 'Kublai originally named his eldest son, Zhenjin, as the Crown Prince, \
but he died before Kublai in 1285.'
gen.implications('When did Zhenjin die?', '1285', passage)

[('Who died in 1285?', 'Zhenjin', 'subj')]

Evaluating the consistency of models

VQA

Download and extract precomputed implications here. Create a folder for the consistency dataset (CONSISTENCY_FOLDER). Output your model predictions into a json file (PRED_FILE) in the VQA format. Then run:

import qa_consistency.dataset_utils
all_imps = pickle.load(open('vqa_imps.pkl', 'rb'))
vqa = qa_consistency.dataset_utils.load_vqa(vqa_path, 'validation')
# Uncomment the line below if you want vqa v2
# vqa = qa_consistency.dataset_utils.load_vqav2(vqa_path, 'validation')
qa_consistency.dataset_utils.generate_implication_vqa(vqa, PRED_FILE, all_imps, CONSISTENCY_FOLDER)

This will write CONSISTENCY_FOLDER/{questions,annotations}.json. At this point you should run your model on these files, and generate a new prediction file (CONSISTENCY_PRED_FILE), and then run:

stats = qa_consistency.dataset_utils.evaluate_consistency_vqa(CONSISTENCY_FOLDER, CONSISTENCY_PREDS_FILE)
print('Consistency by implication type:')
print()
for x, v in stats.items():
    if x == 'all':
        continue
    print('%s : %.1f' % (x, 100* v))
print()
print('Avg  : %.1f' % (100 * stats['all']))

SQuAD

Download and extract precomputed implications here. Let SQUAD_PATH be a pointer to the original squad dev set json (dev-v1.1.json), PRED_FILE be the predictions json on the dev set from your model in the SQuAD official format (dictionary of id : answer). Run:

import qa_consistency.dataset_utils
all_imps = pickle.load(open('squad_imps.pkl', 'rb'))
qa_consistency.dataset_utils.generate_implication_squad(
SQUAD_PATH, PRED_FILE, all_imps, NEW_SQUAD_JSON)

This will generate a new dataset in the SQuAD format in the NEW_SQUAD_JSON path. At this point you should run your model on this file, and generate a new prediction file (CONSISTENCY_PRED_FILE), and then run:

stats = qa_consistency.dataset_utils.evaluate_consistency_squad(NEW_SQUAD_JSON, CONSISTENCY_PRED_FILE)
print('Consistency by implication type:')
print()
for x, v in stats.items():
    if x == 'all':
        continue
    print('%s : %.1f' % (x, 100* v))
print()
print('Avg  : %.1f' % (100 * stats['all']))

Notebooks where we bring it all together

Code of Conduct

Microsoft Open Source Code of Conduct

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
notebooks		notebooks
precomputed		precomputed
qa_consistency		qa_consistency
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
__init__.py		__init__.py
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Evaluating consistency of Question-Answering Models

Installation

Generating implications:

VQA

SQuAD

Evaluating the consistency of models

VQA

SQuAD

Notebooks where we bring it all together

Code of Conduct

About

Releases

Packages

Contributors 2

Languages

License

marcotcr/qa_consistency

Folders and files

Latest commit

History

Repository files navigation

Evaluating consistency of Question-Answering Models

Installation

Generating implications:

VQA

SQuAD

Evaluating the consistency of models

VQA

SQuAD

Notebooks where we bring it all together

Code of Conduct

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages