Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets
This repository provides the means to download the newly created Few-Shot-150T Corpus (FS150T-Corpus), introduced in the paper "Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets".
Abstract: Topic-Dependent Argument Mining (TDAM), that is extracting and classifying argument components for a specific topic from large document sources, is an inherently difficult task for machine learning models and humans alike, as large TDAM datasets are rare and recognition of argument components requires expert knowledge. The task becomes even more difficult if it also involves stance detection of retrieved arguments. In this work, we investigate the effect of TDAM dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90% can still yield 95% of the maximum performance. This gain is consistent across three TDAM tasks on three different datasets.
Contact person: Benjamin Schiller
UKP Lab | TU Darmstadt | summetix
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
For licensing reasons, we cannot offer the full dataset files for direct download. The dataset files in this repository therefore do not contain the sentences, which have to be retrieved from Common Crawl WARC files. To download the sentences, follow these instructions:
First, create a virtual environment and install the requirements:
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Next, use src/main.py to complete the datasets:
python src/main.py -i dataset/test_full_no_sents.tsv -o dataset/test_full.tsv
python src/main.py -i dataset/dev_full_no_sents.tsv -o dataset/dev_full.tsv
python src/main.py -i dataset/train_full_no_sents.tsv -o dataset/train_full.tsv
The code uses hashes to verify that each retrieved sentence is correct and prints a message if one is not. At the end, it also checks whether all sentences were retrieved.
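To illustrate the idea behind the hash check, here is a minimal sketch. It is not the code in src/main.py: the hash algorithm (SHA-256), the whitespace stripping, and the function names `sentence_matches` and `find_mismatches` are assumptions made for this example only.

```python
import hashlib


def sentence_matches(sentence: str, expected_hash: str) -> bool:
    """Compare the SHA-256 hex digest of a sentence with a stored digest.

    Assumption: SHA-256 over the UTF-8 encoded, whitespace-stripped text.
    The repository may use a different algorithm or normalization.
    """
    digest = hashlib.sha256(sentence.strip().encode("utf-8")).hexdigest()
    return digest == expected_hash


def find_mismatches(rows):
    """Return the indices of (sentence, expected_hash) pairs that do not match."""
    return [
        index
        for index, (sentence, expected_hash) in enumerate(rows)
        if not sentence_matches(sentence, expected_hash)
    ]
```

A sentence that was retrieved from a wrong offset, or that changed in any way, produces a different digest and is reported as a mismatch.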
Retrieving all sentences can take up to 4 hours, and the retrieval process may get interrupted. A checkpoint file is therefore saved every 500 rows (e.g. test_full.chkpt500.tsv). After an interruption, pass this file as the input file (-i) to resume the process from the checkpoint.
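If several checkpoint files have accumulated, a small helper can pick the most recent one. The sketch below assumes only the naming pattern from the example above (`<name>.chkpt<rows>.tsv`); the helper names `checkpoint_rows` and `latest_checkpoint` are hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# Assumption: checkpoint files end in ".chkpt<rows>.tsv", as in test_full.chkpt500.tsv.
CHECKPOINT_PATTERN = re.compile(r"\.chkpt(\d+)\.tsv$")


def checkpoint_rows(path):
    """Return the row count encoded in a checkpoint file name, or None if absent."""
    match = CHECKPOINT_PATTERN.search(Path(path).name)
    return int(match.group(1)) if match else None


def latest_checkpoint(paths):
    """Return the path with the highest row count, or None if no checkpoint exists."""
    candidates = [(checkpoint_rows(p), p) for p in paths]
    candidates = [(rows, p) for rows, p in candidates if rows is not None]
    if not candidates:
        return None
    # Compare row counts numerically so that chkpt1500 ranks above chkpt500.
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can then be passed to src/main.py via -i. The directory in which checkpoint files are written is not stated here, so adjust the list of paths to wherever they appear on your system.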
The code was tested with Python 3.12. In case you cannot retrieve the dataset, please request it from us at https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/4353.
Download the other corpora used in the paper at:
Please use the following citation:
@inproceedings{schiller-etal-2024-diversity,
title = "Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets",
author = "Schiller, Benjamin and
Daxenberger, Johannes and
Waldis, Andreas and
Gurevych, Iryna",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.608",
pages = "10870--10887",
abstract = "Topic-Dependent Argument Mining (TDAM), that is extracting and classifying argument components for a specific topic from large document sources, is an inherently difficult task for machine learning models and humans alike, as large TDAM datasets are rare and recognition of argument components requires expert knowledge. The task becomes even more difficult if it also involves stance detection of retrieved arguments. In this work, we investigate the effect of TDAM dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90{\%} can still yield 95{\%} of the maximum performance. This gain is consistent across three TDAM tasks on three different datasets. We also publish a new dataset and code for future benchmarking.",
}
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.