A Benchmark Dataset for Multimodal Scientific Fact Checking

IIT-DM/Fin-Fact

Fin-Fact - Multimodal Financial Fact-Checking Dataset

Overview

Welcome to the Fin-Fact repository! Fin-Fact is a comprehensive dataset designed specifically for financial fact-checking and explanation generation. This README provides an overview of the dataset, how to use it, and other relevant information. Click here to access the paper.

Dataset Description

  • Name: Fin-Fact
  • Purpose: Fact-checking and explanation generation in the financial domain.
  • Labels: Each entry includes the following fields: Claim, Author, Posted Date, Sci-digest, Justification, Evidence, Evidence href, Image href, Image Caption, Visualisation Bias Label, Issues, and Claim Label.
  • Size: The dataset consists of 3562 claims spanning multiple financial sectors.
  • Additional Features: The dataset goes beyond textual claims and incorporates visual elements, including images and their captions.

Dataset Usage

Fin-Fact is a valuable resource for researchers, data scientists, and fact-checkers in the financial domain. Here's how you can use it:

  1. Download the Dataset: You can download the Fin-Fact dataset here or via the Hugging Face Hub. You can also load the dataset by using the following code:

from datasets import load_dataset
dataset = load_dataset("amanrangapur/Fin-Fact")

  2. Exploratory Data Analysis: Perform exploratory data analysis to understand the dataset's structure, distribution, and any potential biases.

  3. Natural Language Processing (NLP) Tasks: Utilize the dataset for various NLP tasks such as fact-checking, claim verification, and explanation generation.

  4. Fact Checking Experiments: Train and evaluate machine learning models, including text and image analysis, using the dataset to enhance the accuracy of fact-checking systems.
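
The exploratory step above can be sketched as follows. This is a minimal illustration rather than code from the repository: the key name `label` and the toy records are assumptions made for the example, so check the real column names (for instance via the `column_names` attribute of the loaded dataset) before adapting it.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def label_distribution(records: Iterable[Mapping[str, object]],
                       label_key: str = "label") -> Dict[str, float]:
    """Return the share of records per label value, most frequent first."""
    # Records without the key are counted under "missing".
    counts = Counter(str(r.get(label_key, "missing")) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


# Toy records for illustration only; these are NOT taken from Fin-Fact.
# The key names "claim" and "label" are hypothetical.
toy_records = [
    {"claim": "Example claim A", "label": "true"},
    {"claim": "Example claim B", "label": "false"},
    {"claim": "Example claim C", "label": "false"},
    {"claim": "Example claim D"},  # record with no label
]

distribution = label_distribution(toy_records)
for label, share in distribution.items():
    print(f"{label}: {share:.2f}")
```

With the real data, the same function could be applied to the rows of one split of the loaded dataset, passing whichever column actually holds the claim label as `label_key`. A heavily skewed distribution is worth knowing about before training or evaluating a classifier.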

Dependencies

We recommend that you create an Anaconda environment:

conda create --name finfact python=3.6 conda-build

Then, install Python requirements:

pip install -r requirements.txt

Run models for paper metrics

We provide scripts that let you easily run our dataset on existing state-of-the-art models and re-create the metrics published in the paper. You should be able to reproduce our results from the paper by following these instructions. Please post an issue if you're unable to do this.

To run existing ANLI models for fact checking, run:

  1. BART
python anli.py --model_name 'ynie/bart-large-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
  2. RoBERTa
python anli.py --model_name 'ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
  3. ELECTRA
python anli.py --model_name 'ynie/electra-large-discriminator-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
  4. ALBERT
python anli.py --model_name 'ynie/albert-xxlarge-v2-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
  5. XLNet
python anli.py --model_name 'ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli' --data_file finfact.json --threshold 0.5
  6. GPT-2
python gpt2_nli.py --model_name 'fractalego/fact-checking' --data_file finfact.json
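
The `--threshold 0.5` flag suggests that the scripts turn NLI scores into a verdict using a cutoff. How `anli.py` does this is not described in this README, so the following is only a sketch of one plausible rule. It assumes the model yields probabilities for entailment, neutral, and contradiction; the function name and the verdict names `true`, `false`, and `nei` are hypothetical.

```python
from typing import Dict


def nli_to_verdict(probs: Dict[str, float], threshold: float = 0.5) -> str:
    """Map NLI class probabilities to a fact-checking verdict.

    Assumed rule (not taken from anli.py): return "true" if the entailment
    probability reaches the threshold and is at least the contradiction
    probability, "false" if the contradiction probability reaches the
    threshold, and "nei" (not enough information) otherwise.
    """
    entail = probs.get("entailment", 0.0)
    contra = probs.get("contradiction", 0.0)
    if entail >= threshold and entail >= contra:
        return "true"
    if contra >= threshold:
        return "false"
    return "nei"


# Made-up probabilities for illustration only.
examples = [
    {"entailment": 0.81, "neutral": 0.12, "contradiction": 0.07},
    {"entailment": 0.10, "neutral": 0.20, "contradiction": 0.70},
    {"entailment": 0.30, "neutral": 0.45, "contradiction": 0.25},
]
verdicts = [nli_to_verdict(p) for p in examples]
print(verdicts)
```

Under this rule, raising the threshold sends more claims to `nei`, trading coverage for precision on the `true` and `false` verdicts.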

Contribution

We welcome contributions from the community to help improve Fin-Fact. If you have suggestions, bug reports, or want to contribute code or data, please check our CONTRIBUTING.md file for guidelines.

License

Fin-Fact is released under the MIT License. Please review the license before using the dataset.

Contact

For questions, feedback, or inquiries related to Fin-Fact, please contact [email protected].

We hope you find Fin-Fact valuable for your research and fact-checking endeavors. Happy fact-checking!