ProVe (Provenance Verification for Wikidata claims)

Overview

ProVe is a system designed to automatically verify claims and references in Wikidata. It extracts claims from Wikidata entities, fetches the referenced URLs, processes the HTML content, and uses NLP models to determine whether the claims are supported by the referenced content.

System Architecture

ProVe consists of several key components:

Data Collection and Processing:
- WikidataParser: Extracts claims and URLs from Wikidata based on QID (item identifier)
- HTMLFetcher: Collects HTML content from reference URLs
- HTMLSentenceProcessor: Converts HTML to sentences for analysis

Evidence Selection and Verification:
- EvidenceSelector: Selects relevant sentences as evidence
- ClaimEntailmentChecker: Verifies entailment relationship between claims and evidence

NLP Models:
- TextualEntailmentModule: Checks textual entailment relationships
- SentenceRetrievalModule: Retrieves relevant sentences
- VerbModule: Handles verbalization processing

Data Storage:
- MongoDB: Stores HTML content, entailment results, parser statistics, and status information
- SQLite: Stores verification results for API access

Service Structure:
- ProVe_main_service.py: Main service logic
- ProVe_main_process.py: Entity processing logic
- background_processing.py: Background processing tasks
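
To illustrate the SQLite side of the storage layer, the following is a minimal sketch of how verification results could be written and read back for an API. The table name, column names, label value, and helper functions (open_results_db, save_result, results_for) are assumptions made for illustration and do not reflect the project's actual schema.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_results_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a results database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS results (
            qid      TEXT NOT NULL,   -- Wikidata item, e.g. 'Q76'
            claim_id TEXT NOT NULL,   -- identifier of the verified claim
            url      TEXT NOT NULL,   -- reference URL that was checked
            label    TEXT NOT NULL,   -- verification outcome (assumed free text)
            score    REAL,            -- model confidence, if available
            PRIMARY KEY (qid, claim_id, url)
        )
        """
    )
    return conn


def save_result(conn: sqlite3.Connection, qid: str, claim_id: str,
                url: str, label: str, score: Optional[float]) -> None:
    """Insert a result, replacing any earlier result for the same claim and URL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?)",
            (qid, claim_id, url, label, score),
        )


def results_for(conn: sqlite3.Connection, qid: str) -> List[Tuple]:
    """Return all stored results for one item, ordered by claim."""
    cursor = conn.execute(
        "SELECT claim_id, url, label, score FROM results "
        "WHERE qid = ? ORDER BY claim_id",
        (qid,),
    )
    return cursor.fetchall()


conn = open_results_db()
save_result(conn, "Q76", "claim-1", "https://example.org/a", "SUPPORTS", 0.9)
print(results_for(conn, "Q76"))
```

Keying the table on (qid, claim_id, url) means that re-running verification for an item overwrites stale rows instead of accumulating duplicates.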

Setup Instructions

1. Install Dependencies

pip install -r requirements.txt

2. Download NLP Models

The 'base' folder contains the NLP models that ProVe requires, including pre-trained and fine-tuned BERT and T5 models and related parsers.

Download from:

https://emckclac-my.sharepoint.com/:f:/r/personal/k2369089_kcl_ac_uk/Documents/base?csf=1&web=1&e=TBo3nE

Place the downloaded 'base' folder in the project root directory.

3. Configure the System

Review and modify the config.yaml file to adjust database settings, HTML fetching parameters, and evidence selection thresholds.

Usage

Processing a Single Entity

from ProVe_main_process import initialize_models, process_entity

# Initialize models
models = initialize_models()

# Process entity by QID
qid = 'Q76'  # Example: Barack Obama
html_df, entailment_results, parser_stats = process_entity(qid, models)
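
When several entities need to be processed in one run, a small wrapper can keep one failing item from stopping the rest. The sketch below is not part of ProVe: process_many is a hypothetical helper, and fake_process_entity is a stand-in used only so the example runs without the models. In practice process_entity and the result of initialize_models() would be passed in instead.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def process_many(
    qids: Iterable[str],
    process_fn: Callable[[str, Any], Any],
    models: Any,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run process_fn for each QID and collect results and errors separately."""
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for qid in qids:
        try:
            results[qid] = process_fn(qid, models)
        except Exception as exc:  # record the error and continue with the next item
            failures[qid] = repr(exc)
    return results, failures


def fake_process_entity(qid: str, models: Any) -> Tuple[str, str, str]:
    """Stand-in for process_entity; returns placeholders in the same 3-tuple shape."""
    if not qid.startswith("Q"):
        raise ValueError(f"not a QID: {qid}")
    return ("html_df", "entailment_results", "parser_stats")


results, failures = process_many(["Q76", "bad-id"], fake_process_entity, models=None)
print(sorted(results), sorted(failures))
```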

Running the Service

The main service can be started by running:

python ProVe_main_service.py

This will start the MongoDB handler and schedule background processing tasks.

Background Processing

The system can automatically process:

- Top viewed Wikidata items
- Items from a pagepile list
- Random QIDs
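
As a rough illustration of how these sources might be combined into one work queue, the sketch below draws random QIDs and merges several sources while dropping duplicates. The function names, the upper bound on item numbers, and the idea that earlier sources take priority are all assumptions; the actual logic in background_processing.py may differ. Note that a randomly drawn number is not guaranteed to correspond to an existing Wikidata item.

```python
import random
from typing import Iterable, List, Optional


def random_qids(count: int, max_id: int = 100_000_000,
                seed: Optional[int] = None) -> List[str]:
    """Draw `count` distinct candidate QIDs between Q1 and Q<max_id>.

    max_id is an arbitrary illustrative bound, not a Wikidata constant.
    """
    rng = random.Random(seed)
    return [f"Q{n}" for n in rng.sample(range(1, max_id + 1), count)]


def build_queue(*sources: Iterable[str]) -> List[str]:
    """Merge sources in the order given, keeping the first occurrence of each QID."""
    seen = set()
    queue: List[str] = []
    for source in sources:
        for qid in source:
            if qid not in seen:
                seen.add(qid)
                queue.append(qid)
    return queue


top_viewed = ["Q76", "Q42"]   # placeholder data
pagepile = ["Q42", "Q64"]     # placeholder data
queue = build_queue(top_viewed, pagepile, random_qids(3, seed=0))
print(queue)
```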

Configuration

The config.yaml file contains important settings:

- Database configurations
- Algorithm version
- HTML fetching parameters (batch size, delay, timeout)
- Text processing settings
- Evidence selection parameters
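
The sketch below shows one way the HTML fetching parameters could be read and validated once config.yaml has been loaded into a dictionary (for example with a YAML library). The section name html_fetching, the key names, and the default values are assumptions for illustration; check config.yaml for the real names.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class FetchSettings:
    # All defaults below are hypothetical, not values taken from ProVe.
    batch_size: int = 20    # URLs fetched per batch
    delay: float = 1.0      # seconds to wait between requests
    timeout: float = 30.0   # seconds before a request is abandoned


def parse_fetch_settings(config: Mapping[str, Any]) -> FetchSettings:
    """Build FetchSettings from a loaded config, falling back to defaults."""
    section = config.get("html_fetching") or {}
    known = {
        name: section[name]
        for name in ("batch_size", "delay", "timeout")
        if name in section
    }
    settings = FetchSettings(**known)
    if settings.batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if settings.delay < 0 or settings.timeout <= 0:
        raise ValueError("delay must be non-negative and timeout must be positive")
    return settings


loaded = {"html_fetching": {"batch_size": 10, "timeout": 15}}
print(parse_fetch_settings(loaded))
```

Validating at load time surfaces a bad value when the service starts rather than partway through a fetch batch.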

Data Flow

1. A Wikidata QID is provided to the system
2. The system extracts claims and reference URLs from the entity
3. HTML content is fetched from the reference URLs
4. The HTML is processed into sentences
5. Relevant sentences are selected as evidence
6. NLP models verify if the evidence supports the claims
7. Results are stored in the database
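
The middle of this flow (HTML to sentences, then evidence selection) can be sketched with the standard library alone. This is a toy stand-in, not ProVe's implementation: the function names are hypothetical, sentence splitting uses a simple regular expression, and evidence is ranked by word overlap where ProVe uses trained sentence retrieval models. The final entailment step is omitted because it requires a trained model.

```python
import re
from html.parser import HTMLParser
from typing import List, Set, Tuple


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_sentences(html: str) -> List[str]:
    """Strip markup and split the remaining text after ., ! or ?."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_evidence(claim: str, sentences: List[str],
                    top_k: int = 2) -> List[Tuple[float, str]]:
    """Rank sentences by the fraction of claim words they contain (toy scoring)."""
    claim_tokens = _tokens(claim)
    scored = []
    for sentence in sentences:
        overlap = len(claim_tokens & _tokens(sentence))
        score = overlap / len(claim_tokens) if claim_tokens else 0.0
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


page = (
    "<html><body><p>Barack Obama was born in Honolulu. "
    "He served as president.</p><script>var x = 1;</script></body></html>"
)
sentences = html_to_sentences(page)
evidence = select_evidence("Barack Obama place of birth Honolulu", sentences)
print(evidence)
```

In this toy version the top-ranked sentence would then be passed, together with the verbalized claim, to an entailment model for the verification step.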
