A curated list of papers about hallucination in multi-modal large language models (MLLMs)
This section collects survey papers on hallucination in MLLMs.
- A Survey on Hallucination in Large Vision-Language Models [paper] (arXiv 2024/02)
- Hallucination of Multimodal Large Language Models: A Survey [paper] (arXiv 2024/04)
This section collects benchmark papers for evaluating hallucination in MLLMs.
- Evaluating Object Hallucination in Large Vision-Language Models [paper] [code] (EMNLP 2023)
- HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models [paper] [code] (CVPR 2024)
- Aligning Large Multimodal Models with Factually Augmented RLHF [paper] [code] (arXiv 2023/09)
- An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation [paper] [code] (arXiv 2023/11)
- Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges [paper] [code] (arXiv 2023/11)
- Hallucination Benchmark in Medical Visual Question Answering [paper] (arXiv 2024/01)
- The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs [paper] [code] (arXiv 2024/02)
- Unified Hallucination Detection for Multimodal Large Language Models [paper] [code] (arXiv 2024/02)
- Visual Hallucinations of Multi-modal Large Language Models [paper] [code] (arXiv 2024/02)
- Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models [paper] (arXiv 2024/02)
- PhD: A Prompted Visual Hallucination Evaluation Dataset [paper] [code] (arXiv 2024/03)
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models [paper] [code] (arXiv 2024/04)
- THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models [paper] (arXiv 2024/05)
- Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models [paper] [code] (arXiv 2024/06)
- HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning [paper] [code] (arXiv 2024/07)
- Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs [paper] [code] (arXiv 2024/08)
- VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models [paper] [code] (arXiv 2024/06)
- Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models [paper] (arXiv 2024/08)
- Understanding Multimodal Hallucination with Parameter-Free Representation Alignment (Pfram) [paper] [code] (arXiv 2024/09)
- Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data [paper] (arXiv 2024/09)
- Explore the Hallucination on Low-level Perception for MLLMs [paper] (arXiv 2024/09)
- ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models [paper] (arXiv 2024/09)
- FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs [paper] [code] (arXiv 2024/09)
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs [paper] [code] (arXiv 2024/09)
- AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models [paper] [code] (arXiv 2024/10)
- Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models [paper] [code] (arXiv 2024/10)
- LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [paper] [code] (arXiv 2024/10)
- The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio [paper] [code] (arXiv 2024/10)
- AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models [paper] [code] (arXiv 2024/10)
- Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models [paper] [code] (arXiv 2024/11)
- H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models [paper] (arXiv 2024/11)
- VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation [paper] [code] (arXiv 2024/11)
- ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models [paper] (arXiv 2024/11)
This section collects papers on mitigating hallucination in MLLMs.
- Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning [paper] [code] (ICLR 2024)
- Analyzing and Mitigating Object Hallucination in Large Vision-Language Models [paper] [code] (ICLR 2024)
- VIGC: Visual Instruction Generation and Correction [paper] [code] (AAAI 2024)
- OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation [paper] [code] (CVPR 2024)
- Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding [paper] [code] (CVPR 2024)
- Hallucination Augmented Contrastive Learning for Multimodal Large Language Model [paper] (CVPR 2024)
- RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback [paper] [code] (CVPR 2024)
- Detecting and Preventing Hallucinations in Large Vision Language Models [paper] (arXiv 2023/08)
- Evaluation and Analysis of Hallucination in Large Vision-Language Models [paper] [code] (arXiv 2023/08)
- CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning [paper] (arXiv 2023/09)
- Evaluation and Mitigation of Agnosia in Multimodal Large Language Models [paper] (arXiv 2023/09)
- Aligning Large Multimodal Models with Factually Augmented RLHF [paper] [code] (arXiv 2023/09)
- HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption [paper] (arXiv 2023/10)
- Woodpecker: Hallucination Correction for Multimodal Large Language Models [paper] [code] (arXiv 2023/10)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [paper] [code] (arXiv 2023/11)
- VOLCANO: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision [paper] [code] (arXiv 2023/11)
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization [paper] (arXiv 2023/11)
- Mitigating Hallucination in Visual Language Models with Visual Supervision [paper] (arXiv 2023/11)
- Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites [paper] [code] (arXiv 2023/12)
- MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations [paper] [code] (arXiv 2023/12)
- Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models [paper] (arXiv 2024/01)
- On the Audio Hallucinations in Large Audio-Video Language Models [paper] (arXiv 2024/01)
- Skip \n: A simple method to reduce hallucination in Large Vision-Language Models [paper] (arXiv 2024/02)
- Unified Hallucination Detection for Multimodal Large Language Models [paper] [code] (arXiv 2024/02)
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance [paper] (arXiv 2024/02)
- EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models [paper] (arXiv 2024/02)
- Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models [paper] [code] (arXiv 2024/02)
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective [paper] [code] (arXiv 2024/02)
- Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding [paper] (arXiv 2024/02)
- IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding [paper] (arXiv 2024/02)
- HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding [paper] [code] (arXiv 2024/03)
- Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective [paper] (arXiv 2024/03)
- Debiasing Large Visual Language Models [paper] (arXiv 2024/03)
- AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models [paper] (arXiv 2024/03)
- What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models [paper] (arXiv 2024/03)
- Multi-Modal Hallucination Control by Visual Information Grounding [paper] (arXiv 2024/03)
- Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination [paper] [code] (arXiv 2024/03)
- Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art [paper] (arXiv 2024/03)
- Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning [paper] (arXiv 2024/03)
- Visual Hallucination: Definition, Quantification, and Prescriptive Remediations [paper] (arXiv 2024/03)
- Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models [paper] (arXiv 2024/03)
- Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding [paper] (arXiv 2024/03)
- Automated Multi-level Preference for MLLMs [paper] (arXiv 2024/05)
- CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models [paper] (arXiv 2024/05)
- VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap [paper] (arXiv 2024/05)
- Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization [paper] (arXiv 2024/05)
- Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning [paper] (arXiv 2024/05)
- RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs [paper] (arXiv 2024/05)
- MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification [paper] (arXiv 2024/05)
- Mitigating Object Hallucination via Data Augmented Contrastive Tuning [paper] (arXiv 2024/05)
- NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models [paper] [code] (arXiv 2024/06)
- CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models [paper] [code] (arXiv 2024/06)
- Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models [paper] (arXiv 2024/06)
- Detecting and Evaluating Medical Hallucinations in Large Vision Language Models [paper] (arXiv 2024/06)
- AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models [paper] (arXiv 2024/06)
- Hallucination Mitigation Prompts Long-term Video Understanding [paper] [code] (arXiv 2024/06)
- Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning? [paper] (arXiv 2024/06)
- Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? [paper] (arXiv 2024/06)
- VGA: Vision GUI Assistant - Minimizing Hallucinations through Image-Centric Fine-Tuning [paper] (arXiv 2024/06)
- AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention [paper] [code] (arXiv 2024/06)
- Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models [paper] [code] (arXiv 2024/06)
- Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification [paper] (arXiv 2024/06)
- Multi-Object Hallucination in Vision-Language Models [paper] [code] (arXiv 2024/07)
- Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models [paper] [code] (arXiv 2024/07)
- BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models [paper] [code] (arXiv 2024/07)
- Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate [paper] [code] (arXiv 2024/07)
- Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs [paper] [code] (arXiv 2024/08)
- Mitigating Multilingual Hallucination in Large Vision-Language Models [paper] [code] (arXiv 2024/08)
- Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation [paper] (arXiv 2024/08)
- Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models [paper] [code] (arXiv 2024/08)
- Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD) [paper] (arXiv 2024/08)
- Reference-free Hallucination Detection for Large Vision-Language Models [paper] (arXiv 2024/08)
- Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models [paper] (arXiv 2024/08)
- CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs [paper] (arXiv 2024/08)
- ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models [paper] [code] (arXiv 2024/08)
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning [paper] (arXiv 2024/08)
- Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding [paper] (arXiv 2024/09)
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs [paper] [code] (arXiv 2024/09)
- A Unified Hallucination Mitigation Framework for Large Vision-Language Models [paper] (arXiv 2024/09)
- HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding [paper] [code] (arXiv 2024/09)
- Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations [paper] [code] (arXiv 2024/10)
- Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models [paper] [code] (arXiv 2024/10)
- Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models [paper] [code] (arXiv 2024/10)
- Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality [paper] [code] (arXiv 2024/10)
- DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination [paper] (arXiv 2024/10)
- From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models [paper] (arXiv 2024/10)
- Data-augmented phrase-level alignment for mitigating object hallucination [paper] (arXiv 2024/10)
- Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs [paper] [code] (arXiv 2024/10)
- Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions [paper] (arXiv 2024/10)
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation [paper] [code] (arXiv 2024/10)
- Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding [paper] [code] (arXiv 2024/10)
- Mitigating Object Hallucination via Concentric Causal Attention [paper] [code] (arXiv 2024/10)
- Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning [paper] (arXiv 2024/10)
- V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization [paper] [code] (arXiv 2024/11)
- Mitigating Hallucination in Multimodal Large Language Model via Hallucination-targeted Direct Preference Optimization [paper] (arXiv 2024/11)
- Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs [paper] (arXiv 2024/11)
- Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination [paper] [code] (arXiv 2024/11)
- CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs [paper] (arXiv 2024/11)