Long-form question answering aims to automatically search for relevant answers among the many responses provided for a given question (Answer Selection), and to search for related existing questions so that their answers can be reused (Question Retrieval).
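Both sub-tasks reduce to ranking a set of candidates by their relevance to a query. As a minimal sketch, assuming toy question and candidate strings and plain TF-IDF features from scikit-learn (real systems on the benchmarks below use much stronger matching models), Answer Selection can be illustrated as:

```python
# Minimal sketch: answer selection as TF-IDF cosine ranking.
# The question/candidates are toy examples; benchmark systems
# typically use neural matching models instead of raw TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "How do I file an insurance claim after a car accident?"
candidates = [
    "Call your insurer, document the damage, and submit the claim form.",
    "The stock market closed higher today.",
    "Most insurers let you file a claim online or through an agent.",
]

vectorizer = TfidfVectorizer()
# Fit on the question plus candidates so they share one vocabulary.
matrix = vectorizer.fit_transform([question] + candidates)
scores = cosine_similarity(matrix[0], matrix[1:]).ravel()

# Rank candidates by descending similarity to the question.
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(f"{rank}. ({scores[idx]:.3f}) {candidates[idx]}")
```

Question Retrieval is the same computation with previously answered questions as the candidate set. Commonly used benchmarks for both tasks are summarized below.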
Dataset | Domain | #Questions | #Answers
---|---|---|---
TRECQA | Open-domain | 1,229 | 53,417
WikiQA | Open-domain | 3,047 | 29,258
InsuranceQA | Insurance | 12,889 | 21,325
FiQA | Financial | 6,648 | 57,641
Yahoo! Answers | Open-domain | 50,112 | 253,440
SemEval-2015 Task 3 | Open-domain | 2,600 | 16,541
SemEval-2016 Task 3 | Open-domain | 4,879 | 36,198
SemEval-2017 Task 3 | Open-domain | 4,879 | 36,198
- TRECQA was created by Wang et al. from TREC QA track 8-13 data, with candidate answers automatically selected from each question's document pool using a combination of overlapping non-stop word counts and pattern matching (a sketch of the overlap heuristic appears after this list). It is one of the most widely used benchmarks for answer sentence selection.
- WikiQA is a publicly available set of question and sentence pairs, collected and annotated by Microsoft Research for research on open-domain question answering.
- InsuranceQA is a non-factoid QA dataset from the insurance domain. A question may have multiple correct answers, and questions are typically much shorter than the answers: on average, questions are 7 tokens long and answers 95 tokens. Each question in the development and test sets comes with a pool of 500 candidate answers to be ranked (ranking metrics for such setups are sketched after this list).
- FiQA is a non-factoid QA dataset from the financial domain, released for the WWW 2018 challenge. It was built by crawling StackExchange, Reddit, and StockTwits; part of the questions are opinionated, targeting mined opinions and their respective entities, aspects, sentiment polarity, and opinion holders.
- Yahoo! Answers is a website where people post questions and answers, all of which are public to any web user willing to browse or download them. The corpus here is a snapshot of Yahoo! Answers as of 10/25/2007 and serves as a benchmark for community-based question answering; its answers are considerably longer than those in TRECQA and WikiQA.
- SemEval-2015 Task 3 consists of two sub-tasks. In Subtask A, given a question (short title + extended description) and several community answers, the goal is to classify each answer as definitely relevant (good), potentially useful (potential), or bad/irrelevant (bad, dialogue, non-English, other). In Subtask B, given a YES/NO question (short title + extended description) and a list of community answers, the goal is to decide whether the global answer to the question should be yes, no, or unsure.
- SemEval-2016 Task 3 consists of two sub-tasks: Question-Comment Similarity and Question-Question Similarity. In the Question-Comment Similarity task, given a question from a question-comment thread, the goal is to rank the comments by their relevance to the question. In the Question-Question Similarity task, given a new question, the goal is to rerank all similar questions retrieved by a search engine.
- SemEval-2017 Task 3 contains two sub-tasks: Question Similarity and Relevance Classification. In Question Similarity, given a new question and a set of related questions from the collection, the goal is to rank the related questions by their similarity to the original question. In Relevance Classification, given a question-answer thread, the goal is to rank the answer posts by their relevance to the question.
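The TRECQA entry above mentions pre-selecting candidate sentences by overlapping non-stop word counts. Below is a minimal sketch of one plausible reading of that heuristic; the stopword list, tokenizer, and threshold are illustrative assumptions, not the exact configuration used by Wang et al.

```python
import re

# Sketch of candidate pre-selection by non-stop-word overlap, as
# described for TRECQA. The stopword list, tokenizer, and threshold
# are illustrative assumptions, not Wang et al.'s exact settings.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in",
             "on", "by", "and", "or", "what", "who", "when", "how"}

def content_words(text):
    """Lowercased alphanumeric tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}

def overlap(question, sentence):
    """Number of non-stop words shared by question and sentence."""
    return len(content_words(question) & content_words(sentence))

question = "Who wrote the play Hamlet?"
pool = [
    "Hamlet is a tragedy written by William Shakespeare.",
    "Shakespeare also wrote Macbeth and King Lear.",
    "The weather in London was rainy that week.",
]
# Keep sentences sharing at least one content word with the question.
candidates = [s for s in pool if overlap(question, s) >= 1]
print(candidates)  # the weather sentence is filtered out
```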
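Answer-selection results on benchmarks such as TRECQA and WikiQA are conventionally reported as Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR). As a self-contained sketch (the relevance labels below are invented), each inner list gives the 0/1 labels of one question's candidates in the order the system ranked them:

```python
# Sketch of MAP and MRR over ranked candidate lists.
# Each inner list holds 0/1 relevance labels in system rank order;
# the example labels are invented.

def average_precision(labels):
    """Precision averaged over the ranks of the relevant items."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def reciprocal_rank(labels):
    """1 / rank of the first relevant item (0 if none)."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

ranked_labels = [
    [1, 0, 1, 0],  # question 1: relevant answers at ranks 1 and 3
    [0, 0, 1, 0],  # question 2: first relevant answer at rank 3
]
map_score = sum(average_precision(l) for l in ranked_labels) / len(ranked_labels)
mrr_score = sum(reciprocal_rank(l) for l in ranked_labels) / len(ranked_labels)
print(f"MAP = {map_score:.3f}, MRR = {mrr_score:.3f}")
```

MAP rewards placing every relevant answer near the top, while MRR looks only at the position of the first relevant one.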