Tathya is a comprehensive fact-checking system designed to verify claims by autonomously gathering and analyzing evidence from multiple sources. The name "Tathya" (तथ्य) comes from Sanskrit, meaning "truth" or "reality" - perfectly embodying the system's purpose of discovering factual accuracy through a rigorous, agent-driven process. It uses a sophisticated agent powered by LLMs and LangChain to dynamically select tools, conduct research, and synthesize findings, ultimately delivering a verdict with a confidence score and detailed explanation.
- 🤖 Agentic Workflow: Employs an AI agent to manage the entire fact-checking process, from claim analysis to final synthesis.
- 🛠️ Dynamic Tool Selection: The agent intelligently chooses the best tools (Search Engines, Wikidata, News APIs, Web Scrapers) based on the claim and intermediate findings.
- 🔍 Multi-source Evidence Collection: Gathers information from diverse sources like Tavily, Google Search (via Gemini), DuckDuckGo, Wikidata, and NewsAPI.
- 🧩 Claim Decomposition: Automatically breaks down complex claims into simpler, verifiable sub-questions using LLMs.
- 📊 Confidence Scoring: Provides a numerical confidence score (0.0-1.0) alongside the final verdict (TRUE, FALSE, PARTIALLY TRUE/MIXTURE, UNCERTAIN).
- 📝 Detailed Explanation: Offers a comprehensive summary explaining the agent's reasoning, citing the evidence gathered.
- 🔗 Source Attribution: Transparently lists all sources consulted and the tools used to access them.
- 🖥️ Modern Dark Mode Interface: Clean, user-friendly Streamlit interface with dark mode support.
- 🪜 Multi-step Verification Process: Shows the user the agent's step-by-step reasoning and evidence gathering process.
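As a concrete illustration of the claim-decomposition feature, here is a minimal sketch. The prompt wording and the generic `llm` callable are assumptions for illustration, not the project's actual internals:

```python
# Minimal sketch of LLM-driven claim decomposition. The prompt text and the
# `llm` callable are illustrative assumptions; the real system wires this
# through its own tool and model configuration.
DECOMPOSE_PROMPT = (
    "Break the following claim into the smallest set of independently "
    "verifiable sub-questions, one per line.\n\nClaim: {claim}"
)

def decompose_claim(claim: str, llm) -> list[str]:
    """`llm` is any callable mapping a prompt string to a completion string."""
    reply = llm(DECOMPOSE_PROMPT.format(claim=claim))
    return [line.lstrip("-* ").strip() for line in reply.splitlines() if line.strip()]

# For example, "Did the James Webb Space Telescope launch before 2022?"
# might decompose into:
#   1. When did the James Webb Space Telescope launch?
#   2. Is that date before January 1, 2022?
```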
Tathya leverages an agentic architecture, orchestrated using principles often found in frameworks like LangGraph. Instead of a fixed pipeline, a central Fact-Checking Agent dynamically plans and executes tasks using a suite of available tools:
- Core Agent: An LLM-based agent responsible for:
  - Understanding the claim.
  - Planning the verification strategy.
  - Selecting and invoking appropriate tools.
  - Analyzing tool outputs (evidence).
  - Synthesizing findings into a final verdict and explanation.
- Tool Suite: Functions the agent can call upon:
  - `claim_decomposition_tool`: Breaks down complex claims.
  - `tavily_search`, `gemini_google_search_tool`, `duckduckgo_search`: General web search tools.
  - `news_search`: Queries NewsAPI for recent articles.
  - `wikidata_entity_search`: Retrieves structured data from Wikidata.
  - `scrape_webpages_tool`: Extracts content from specific URLs identified during search.
  - (Other potential tools)
- State Manager: Maintains the context of the investigation, including the original claim, gathered evidence, agent's thoughts, and past actions.
- REST API: Exposes the agent's fact-checking capabilities.
- Streamlit UI: Provides the user interface for interaction and result presentation.
The diagram below represents a high-level overview of the components the agent interacts with, rather than a strict linear pipeline.
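For orientation, a "tool" in this architecture is essentially a documented function the agent can choose to invoke. Below is a minimal sketch using LangChain's `@tool` decorator for two of the tools named above; the stub bodies and the registration details are illustrative assumptions, not the project's actual implementations:

```python
from langchain_core.tools import tool

@tool
def wikidata_entity_search(query: str) -> str:
    """Retrieve structured facts about an entity from Wikidata."""
    # Stub for illustration; a real implementation would call the Wikidata API.
    return f"(stub) Wikidata results for: {query}"

@tool
def news_search(query: str) -> str:
    """Query NewsAPI for recent articles matching the query."""
    # Stub for illustration; a real implementation would call NewsAPI.
    return f"(stub) News results for: {query}"

# The agent receives the tool list and chooses among them at each step,
# guided by each tool's docstring.
TOOLS = [wikidata_entity_search, news_search]
```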
The fact-checking process is driven by the agent's autonomous reasoning:
- User Input: A user submits a factual claim via the Streamlit UI.
- API Request: The frontend sends the claim to the backend API, initiating the agent.
- Phase 1: Initial Analysis & First Search:
  - The agent analyzes the claim. If complex, it uses the `claim_decomposition_tool` to break it down.
  - It plans and executes an initial broad search using a tool like `tavily_search` or `gemini_google_search_tool`.
  - The agent evaluates the initial results for relevance and credibility.
- Phase 2: Deep Investigation:
  - Based on the initial findings, the agent plans its next step.
  - It iteratively selects and uses tools (`duckduckgo_search`, `news_search`, `wikidata_entity_search`, `scrape_webpages_tool`, etc.) to gather more specific evidence, analyze contradictions, or explore different angles.
  - After each tool call, the agent analyzes the new evidence and refines its plan. This continues until sufficient evidence (typically from at least 3 distinct sources) is gathered.
- Phase 3: Final Synthesis:
  - Once the agent determines it has enough high-quality evidence, it concludes the investigation.
  - It synthesizes all gathered information, determines the final verdict (TRUE, FALSE, etc.), calculates a confidence score, and writes a detailed explanation justifying the conclusion, referencing key evidence.
- Presentation: The final verdict, confidence score, explanation, step-by-step agent trace (intermediate thoughts and actions), and list of sources are presented to the user in the Streamlit interface.
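Stripped of framework specifics, the three phases reduce to a plan-act-observe loop over shared state. Here is a rough sketch, with the planner and synthesizer abstracted as callables; all names below are illustrative, not the project's API:

```python
from dataclasses import dataclass, field

@dataclass
class InvestigationState:
    """Context kept by the state manager: claim, evidence, and the agent trace."""
    claim: str
    evidence: list = field(default_factory=list)
    steps: list = field(default_factory=list)

def run_fact_check(claim, tools, plan_next_step, synthesize, max_steps=10):
    state = InvestigationState(claim=claim)
    for _ in range(max_steps):
        decision = plan_next_step(state)       # LLM picks a tool or finishes
        if decision["action"] == "finish":
            break
        observation = tools[decision["action"]](decision["input"])
        state.evidence.append(observation)     # Phases 1-2: gather and refine
        state.steps.append({**decision, "observation": observation})
    return synthesize(state)                   # Phase 3: verdict, confidence, explanation
```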
- Python 3.8+
- Required API keys stored securely (e.g., in a `.env` file):
  - OpenAI API key (or Azure OpenAI endpoint details)
  - Google AI (Gemini) API key
  - Tavily API key
  - NewsAPI key
- Clone the repository:

  ```bash
  git clone https://github.com/Kaos599/tathya-fact-checking-system.git
  cd tathya-fact-checking-system
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables by creating a `.env` file in the root directory. Ensure you have the necessary keys for the tools you intend the agent to use (a sample sketch follows these steps).

- Start the backend API server:

  ```bash
  # Navigate to the API directory if your structure requires it
  # cd fact_check_system/api
  uvicorn fact_check_system.api.main:app --reload --host 0.0.0.0 --port 8000

  # Or, if using Flask/another framework, adjust the command accordingly
  # python fact_check_system/api/main.py
  ```

  The API will typically be available at `http://127.0.0.1:8000`. Check the console output.

- Start the Streamlit frontend in a separate terminal:

  ```bash
  streamlit run app.py
  ```

  The app will usually be available at `http://localhost:8501`.
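The `.env` file from the setup step above might look like the sketch below. The variable names are assumptions for illustration; use whatever names the code actually reads:

```env
OPENAI_API_KEY=your-openai-key      # or Azure OpenAI endpoint details
GOOGLE_API_KEY=your-gemini-key
TAVILY_API_KEY=your-tavily-key
NEWS_API_KEY=your-newsapi-key
```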
Challenge the agent with various claims:
- "Does India have the largest population as of mid-2024?"
- "Is the boiling point of water always 100 degrees Celsius?"
- "Did the James Webb Space Telescope launch before 2022?"
- "Elon Musk is the CEO of Neuralink."
- "Which team won the last FIFA World Cup?"
The system provides a REST API endpoint to trigger the fact-checking agent:
```http
POST /check
Content-Type: application/json

{
  "claim": "Your claim text here",
  "language": "en"   // Optional; defaults may apply
}
```
Example Response:
```json
{
  "claim": "Your claim text here",
  "verdict": "PARTIALLY TRUE/MIXTURE",   // Or TRUE, FALSE, UNCERTAIN
  "confidence_score": 0.75,
  "explanation": "Detailed explanation generated by the agent, summarizing the evidence and reasoning...",
  "intermediate_steps": [   // Optional: may include the agent's thought process
    { "thought": "Initial thought...", "action": "ToolX", "input": "...", "observation": "..." }
    // ... more steps
  ],
  "sources": [
    {
      "url": "https://example.com/source1",
      "title": "Source Title 1",
      "snippet": "Relevant excerpt from source 1...",
      "tool_used": "tavily_search"
    },
    {
      "url": "https://newssite.com/article",
      "title": "Recent News Article",
      "snippet": "Latest developments...",
      "tool_used": "news_search"
    }
    // ... other sources
  ]
}
```
(Note: The exact response structure might vary based on implementation details, especially regarding intermediate steps.)
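A minimal client call against a locally running server might look like this. It is a sketch only, assuming the default host and port from the installation steps:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/check",
    json={"claim": "Did the James Webb Space Telescope launch before 2022?"},
    timeout=120,  # agentic runs can take a while
)
resp.raise_for_status()
result = resp.json()
print(result["verdict"], result["confidence_score"])
print(result["explanation"])
```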
Contributions are welcome! If you have suggestions, bug reports, or want to add new tools or features, please feel free to:
- Open an issue to discuss the change.
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a Pull Request.
This project is licensed under the MIT License; see the `LICENSE` file for details.