*(Figure: Task Example)*

# Missci: Reconstructing the Fallacies in Misrepresented Science (ACL 2024)


Abstract: Health-related misinformation on social networks can lead to poor decision-making and real-world dangers. Such misinformation often misrepresents scientific publications and cites them as "proof" to gain perceived credibility. To effectively counter such claims automatically, a system must explain how the claim was falsely derived from the cited publication. Current methods for automated fact-checking or fallacy detection neglect to assess the (mis)used evidence in relation to misinformation claims, which is required to detect the mismatch between them. To address this gap, we introduce Missci, a novel argumentation theoretical model for fallacious reasoning together with a new dataset for real-world misinformation detection that misrepresents biomedical publications. Unlike previous fallacy detection datasets, Missci (i) focuses on implicit fallacies between the relevant content of the cited publication and the inaccurate claim, and (ii) requires models to verbalize the fallacious reasoning in addition to classifying it. We present Missci as a dataset to test the critical reasoning abilities of large language models (LLMs), which are required to reconstruct real-world fallacious arguments, in a zero-shot setting. We evaluate two representative LLMs and the impact of different levels of detail about the fallacy classes provided to the LLM via prompts. Our experiments and human evaluation show promising results for GPT 4, while also demonstrating the difficulty of this task.

Contact person: Max Glockner

UKP Lab | TU Darmstadt

This repository contains Missci, a novel dataset of reconstructed fallacious arguments that misrepresent scientific publications. We provide all code necessary to reproduce and evaluate our results and to use LLMs for reconstructing the fallacious arguments. Don't hesitate to send us an e-mail or report an issue if you have further questions.

## Setup

Follow these instructions to recreate the Python environment used for all our experiments. All experiments ran on A100 GPUs.

We use Python 3.10. To create a Python environment with all necessary dependencies, run:

```bash
python -m venv missci
source missci/bin/activate
pip install -r requirements.txt
```

For Llama 2 / GPT 4 prompting, edit the `llm-config.json` file:

```json
{
  "gpt-4": {
    "AZURE_OPENAI_ENDPOINT": "<endpoint string>",
    "OPENAI_API_KEY": "<api key>"
  },
  "llama2": {
    "directory": "<llama2 directory>"
  }
}
```
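For reference, here is a minimal sketch of how such a config could be read in your own scripts; the key names mirror the JSON above, but the `load_llm_config` helper is hypothetical and not part of the repository:

```python
import json
import os


def load_llm_config(path: str = "llm-config.json") -> dict:
    """Read llm-config.json and expose the GPT 4 credentials as environment
    variables, as typically expected by Azure OpenAI clients."""
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    os.environ["AZURE_OPENAI_ENDPOINT"] = config["gpt-4"]["AZURE_OPENAI_ENDPOINT"]
    os.environ["OPENAI_API_KEY"] = config["gpt-4"]["OPENAI_API_KEY"]
    return config


# Example: locate the local Llama 2 checkpoints.
llama2_dir = load_llm_config()["llama2"]["directory"]
```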

## Structure

## How to use

### Argument Reconstruction (Baselines)

Run `run-argument-reconstruction.py` to re-create the results for argument reconstruction with LLMs or with the random baseline.

To run the baselines, run:

```bash
python run-argument-reconstruction.py eval-random claim
python run-argument-reconstruction.py eval-random p0
```

Each baseline randomly selects a fallacy class and predicts the "claim" or "p0" as the fallacious premise. If not specified otherwise, each baseline is run five times with the seeds [1, 2, 3, 4, 5]. The predictions and evaluations will be stored in the `generate-classify` directory.
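For intuition, a minimal sketch of what the random baseline computes per instance (the fallacy inventory and field names are placeholders here, not the repository's code):

```python
import random

# Placeholder inventory; the actual fallacy classes are defined in the Missci paper.
FALLACY_CLASSES = ["Fallacy A", "Fallacy B", "Fallacy C"]


def random_baseline(instance: dict, premise_mode: str, seed: int) -> dict:
    """Pick a random fallacy class and reuse the claim (or the accurate
    premise p0) as the predicted fallacious premise."""
    rng = random.Random(seed)
    premise = instance["claim"] if premise_mode == "claim" else instance["p0"]
    return {"fallacy": rng.choice(FALLACY_CLASSES), "fallacious_premise": premise}


# The baseline is repeated once per seed (default seeds: 1-5).
predictions = [
    random_baseline({"claim": "...", "p0": "..."}, "claim", seed)
    for seed in [1, 2, 3, 4, 5]
]
```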

### Argument Reconstruction (LLM)

Prompts for argument reconstruction via LLMs are located in the `gen_cls` directory. To prompt Llama 2 or GPT 4 to reconstruct fallacious arguments, run the `run-argument-reconstruction.py` script:

```bash
python run-argument-reconstruction.py llama <prompt-template> <model-size> [<seed>] [--dev]
python run-argument-reconstruction.py gpt4 <prompt-template> [--dev] [--overwrite]
```

To parse and evaluate the LLM output, use:

```bash
python run-argument-reconstruction.py parse-llm-output <file> <k> [--dev]
```

Arguments:

| Name | Description | Example |
| --- | --- | --- |
| `<prompt-template>` | Path to the prompt template (relative to the `prompt_templates` directory) | `gen_cls/p4-connect-D.txt` |
| `<model-size>` | Model size for Llama 2 | One of `70b`, `13b`, `7b` |
| `<seed>` | Optional random seed (default: 1) | `42` |
| `<file>` | Name (not path) of the file containing the raw LLM outputs for evaluation | `missci_gen_cls--p4-connect-D_70b__test.jsonl` |
| `<k>` | For evaluation, consider the top k results | `1` |
| `--dev` | If set, only instances of the validation set are used (otherwise test instances) | `--dev` |
| `--overwrite` | If set, existing GPT 4 predictions are not re-used but re-generated | `--overwrite` |

The LLM output will be stored in the `generate-classify-raw` directory. The evaluation results and predictions will be stored in the `generate-classify` directory.

Example:

To run the LLMs using the Definition prompt template, run:

```bash
python run-argument-reconstruction.py llama gen_cls/p4-connect-D.txt 70b
python run-argument-reconstruction.py gpt4 gen_cls/p4-connect-D.txt
```

To evaluate the Llama 2 output, run:

```bash
python run-argument-reconstruction.py parse-llm-output missci_gen_cls--p4-connect-D_70b__test.jsonl 1
```
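For intuition on the `<k>` argument: during evaluation, the top k generated candidates are considered. A simplified sketch of such an @k check for the predicted fallacy class (illustrative only; the repository's evaluation also scores the generated premise texts):

```python
def hit_at_k(gold_classes: set, ranked_predictions: list, k: int) -> bool:
    """True if any of the top-k predicted fallacy classes matches a gold class."""
    return any(pred in gold_classes for pred in ranked_predictions[:k])


hit_at_k({"Fallacy A"}, ["Fallacy B", "Fallacy A"], k=1)  # False
hit_at_k({"Fallacy A"}, ["Fallacy B", "Fallacy A"], k=2)  # True
```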

### Consistency

To measure the LLM consistency by prompting the LLMs to re-classify the fallacy over their generated fallacious premises, use the `run-get-consistency.py` script:

```bash
python run-get-consistency.py llama <file> <prompt-template> <prefix> [--dev]
python run-get-consistency.py gpt4 <file> <prompt-template> <prefix> [--dev] [--overwrite]
```

Arguments:

| Name | Description | Example |
| --- | --- | --- |
| `<file>` | Path to the input file within the `predictions/generate-classify` directory | `missci_gen_cls--p4-connect-D_70b__testk-1.jsonl` |
| `<prompt-template>` | Path to the prompt template (relative to the `prompt_templates` directory) | `cls_with_premise/classify-D.txt` |
| `<prefix>` | Prefix to be used when storing the results to avoid naming conflicts | `_p4-D` |
| `--dev` | If set, only instances of the validation set are used (otherwise test instances) | `--dev` |
| `--overwrite` | If set, existing GPT 4 predictions are not re-used but re-generated | `--overwrite` |

Example:

To assess the consistency of Llama 2 using the Definition prompt template, run:

```bash
python run-get-consistency.py llama missci_gen_cls--p4-connect-D_70b__testk-1.jsonl cls_with_premise/classify-D.txt _p4-D
```

To parse and evaluate the resulting outputs, run:

```bash
python run-get-consistency.py consistency-parse missci_p4-D_cls_with_premise--classify-D_70b__test.jsonl
```
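Conceptually, consistency asks whether the LLM assigns the same fallacy class when it is shown its own generated premise again. A minimal sketch of this comparison (illustrative only; the repository's aggregation may differ):

```python
def consistency(original_classes: list, reclassified_classes: list) -> float:
    """Fraction of instances where the fallacy class predicted during argument
    reconstruction agrees with the class predicted when re-classifying the
    LLM's own generated fallacious premise."""
    pairs = list(zip(original_classes, reclassified_classes))
    return sum(a == b for a, b in pairs) / len(pairs)


consistency(["Fallacy A", "Fallacy B"], ["Fallacy A", "Fallacy C"])  # 0.5
```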

### Fallacy classification (over gold premises)

To prompt LLMs to classify the fallacies over the provided gold fallacious premises, run the `run-fallacy-classification-with-gold-premise.py` script:

```bash
python run-fallacy-classification-with-gold-premise.py llama <prompt-template> <model-size> [<seed>] [--dev]
python run-fallacy-classification-with-gold-premise.py gpt4 <prompt-template> [--dev] [--overwrite]
```

A list of available prompts is provided in the `cls_with_premise` directory. Parsing and evaluation are shown in the example below.

Arguments:

| Name | Description | Example |
| --- | --- | --- |
| `<prompt-template>` | Path to the prompt template (relative to the `prompt_templates` directory) | `cls_with_premise/classify-D.txt` |
| `<model-size>` | Model size for Llama 2 | One of `70b`, `13b`, `7b` |
| `<seed>` | Optional random seed (default: 1) | `42` |
| `--dev` | If set, only instances of the validation set are used (otherwise test instances) | `--dev` |
| `--overwrite` | If set, existing GPT 4 predictions are not re-used but re-generated | `--overwrite` |

Example:

To run Llama 2 using the Definition prompt template, run:

```bash
python run-fallacy-classification-with-gold-premise.py llama cls_with_premise/classify-D.txt 70b
```

To parse and evaluate the results, run:

```bash
python run-fallacy-classification-with-gold-premise.py parse-llm-output missci_cls_with_premise--classify-D_70b__test.jsonl
```
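The `parse-llm-output` step maps the model's free-text answer back onto the known fallacy classes before scoring. A minimal sketch of such a mapping (illustrative only; the class names are placeholders and the repository's parser is more elaborate):

```python
def extract_fallacy_class(llm_answer: str, known_classes: list) -> str | None:
    """Return the first known fallacy class mentioned in the raw LLM answer,
    or None if the answer cannot be mapped to any class."""
    lowered = llm_answer.lower()
    for fallacy in known_classes:
        if fallacy.lower() in lowered:
            return fallacy
    return None


extract_fallacy_class(
    "The argument commits Fallacy A because ...", ["Fallacy A", "Fallacy B"]
)  # "Fallacy A"
```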

### Fallacy classification (without premise)

To prompt LLMs to classify the fallacies without fallacious premises, run the `run-fallacy-classification-without-premise.py` script:

```bash
python run-fallacy-classification-without-premise.py llama <prompt-template> <model-size> [<seed>] [--dev]
python run-fallacy-classification-without-premise.py gpt4 <prompt-template> [--dev] [--overwrite]
```

A list of available prompts is provided in the `cls_without_premise` directory.

Arguments:

| Name | Description | Example |
| --- | --- | --- |
| `<prompt-template>` | Path to the prompt template (relative to the `prompt_templates` directory) | `cls_without_premise/p4-connect-cls-D.txt` |
| `<model-size>` | Model size for Llama 2 | One of `70b`, `13b`, `7b` |
| `<seed>` | Optional random seed (default: 1) | `42` |
| `--dev` | If set, only instances of the validation set are used (otherwise test instances) | `--dev` |
| `--overwrite` | If set, existing GPT 4 predictions are not re-used but re-generated | `--overwrite` |

Example:

To run Llama 2 using the Definition prompt template, run:

```bash
python run-fallacy-classification-without-premise.py llama cls_without_premise/p4-connect-cls-D.txt 70b
```

To parse and evaluate the results, run:

```bash
python run-fallacy-classification-without-premise.py parse-llm-output missci_cls_without_premise--p4-connect-cls-D_70b__test.jsonl
```

## Citation

When using our dataset or code, please cite us with:

```bibtex
@inproceedings{glockner-etal-2024-missci,
    title = "Missci: Reconstructing Fallacies in Misrepresented Science",
    author = "Glockner, Max  and
      Hou, Yufang  and
      Nakov, Preslav  and
      Gurevych, Iryna",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.240",
    doi = "10.18653/v1/2024.acl-long.240",
    pages = "4372--4405"
}
```

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.