VLDBench: Vision Language Models Disinformation Detection Benchmark

1Vector Institute for Artificial Intelligence 2University of Central Florida 3The University of Texas at Austin 4Sheridan College 5York University 6University of Groningen

Disinformation in the age of generative AI is no longer confined to isolated claims or doctored images; it increasingly manifests as complex, multimodal narratives that mimic credible journalism. VLDBench addresses this challenge by offering the first large-scale, human-verified benchmark for disinformation detection spanning both text-only and image-text formats. Built in alignment with global AI governance standards, it serves as a resource for evaluating model reliability, robustness, and interpretability across real-world scenarios.

Figure: VLDBench diagram.

To address this challenge, we introduce VLDBench; its framework is illustrated below.

VLDBench Framework

Figure: The VLDBench framework comprises five stages: (1) Define Task – formalizing the detection objective; (2) Data Pipeline – curating and preprocessing real-world multimodal news content; (3) Annotation Pipeline – generating labels via human- and LLM-assisted review; (4) Human Review – validating annotations through expert oversight; and (5) Benchmarking – evaluating models for accuracy, reasoning, and risk mitigation across fine-tuning, zero-shot, and robustness scenarios.



Abstract

The rise of AI-generated content has amplified the challenge of detecting multimodal disinformation: online posts and articles that combine images and text with fabricated information specifically designed to deceive. While prior AI safety benchmarks focus on bias and toxicity, multimodal disinformation detection remains underexplored. To address this challenge, we present the Vision-Language Disinformation Detection Benchmark (VLDBench), the first comprehensive benchmark for detecting disinformation across both unimodal (text-only) and multimodal (text and image) content, comprising 31,000 news article-image pairs spanning 13 distinct categories for robust evaluation. VLDBench features a rigorous semi-automated data curation pipeline, with 22 domain experts dedicating 300+ hours to annotating all 31K samples and achieving strong inter-annotator agreement (Cohen's κ = 0.78). We extensively evaluate state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs), demonstrating that integrating textual and visual cues in multimodal news posts improves disinformation detection accuracy by 5–35% compared to unimodal models. Developed in alignment with AI governance frameworks such as the EU AI Act, NIST guidelines, and the MIT AI Risk Repository 2024, VLDBench is intended to serve as a benchmark for detecting disinformation in online multimodal content.
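As a side note on the agreement statistic above, Cohen's κ for two annotators can be computed as in the following minimal sketch (using scikit-learn; the label arrays are purely illustrative, not the actual VLDBench annotations):

```python
# Minimal sketch: Cohen's kappa between two annotators
# (illustrative labels only; not the actual VLDBench annotations).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["disinfo", "credible", "disinfo", "credible", "disinfo", "credible"]
annotator_b = ["disinfo", "credible", "credible", "credible", "disinfo", "credible"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa = {kappa:.2f}")  # agreement corrected for chance
```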


Main contributions:
  1. VLDBench: We present VLDBench, the largest human-verified benchmark for disinformation detection in both unimodal and multimodal settings. It contains 31.3K news articles with paired images, sourced from 58 outlets and spanning 13 categories, all collected under rigorous ethical guidelines.
  2. Task coverage: VLDBench offers 62K labeled instances supporting two evaluation formats: (i) binary classification for text or text-image pairs and (ii) open-ended multimodal reasoning.
  3. Expert annotation quality: Twenty-two domain experts devoted more than 500 hours to annotation, yielding substantial agreement (Cohen’s κ = 0.78) and ensuring high data reliability.
  4. Comprehensive benchmarking: We evaluate nineteen state-of-the-art open-source models (ten vision–language models and nine language models) on VLDBench, exposing systematic performance gaps and model-specific failure modes that inform responsible-AI governance and risk monitoring.

VLDBench Dataset Overview

VLDBench is a comprehensive multimodal classification benchmark for disinformation detection in news articles. We assigned each article to one of 13 news categories by providing its image-text pair to GPT-4o.
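As a hedged illustration of this categorization step, the sketch below sends an article-image pair to a GPT-4o chat completion endpoint; the prompt wording and the `categorize` helper are our own assumptions, not the exact pipeline used for VLDBench.

```python
# Hypothetical sketch of GPT-4o-based category assignment for an article-image pair.
import base64
from openai import OpenAI

CATEGORIES = [
    "Politics", "National", "Business & Finance", "International",
    "Local/Regional", "Entertainment", "Opinion/Editorial", "Health",
    "Other", "Sports", "Technology", "Weather & Environment", "Science",
]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def categorize(article_text: str, image_path: str) -> str:
    """Ask GPT-4o to pick exactly one category for a news article and its image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Assign this news article to exactly one category from "
                         f"{CATEGORIES}. Reply with the category name only.\n\n{article_text}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```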

Table: Comparison of VLDBench with contemporary datasets. Annotation: 👤 = manual; 👤,⚙️ = hybrid (human + AI). Access: ✔️ = open-source; blank = request required. Real: ✔️ = real-world data; blank = synthetic data. *Multiple includes Politics, National, Business & Finance, International, Local/Regional, Entertainment, Opinion/Editorial, Health, Other, Sports, Technology, Weather & Environment, and Science.

Data Annotation Pipeline

Figure: Summary statistics for VLDBench. Each article is annotated twice, once as text-only and once as text + image, yielding 62,678 labelled instances.


Figure: Disinformation trends across news categories, generated by GPT-4o from disinformation narratives and confidence levels.



LLMs and VLMs used for Benchmarking

| Language-Only LLMs | Vision-Language Models (VLMs) |
|---|---|
| Phi-3-mini-128k-instruct | Phi-3-Vision-128k-Instruct |
| Vicuna-7B-v1.5 | LLaVA-v1.5-Vicuna7B |
| Mistral-7B-Instruct-v0.3 | LLaVA-v1.6-Mistral-7B |
| Qwen2-7B-Instruct | Qwen2-VL-7B-Instruct |
| InternLM2-7B | InternVL2-8B |
| DeepSeek-V2-Lite-Chat | Deepseek-VL2-small |
| GLM-4-9B-chat | GLM-4V-9B |
| LLaMA-3.1-8B-Instruct | LLaMA-3.2-11B-Vision |
| LLaMA-3.2-1B-Instruct | Deepseek Janus-Pro-7B |
| | Pixtral |
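To illustrate how a VLM from this list can be queried zero-shot on an article-image pair, here is a minimal sketch using the Hugging Face transformers API with the public llava-hf/llava-1.5-7b-hf checkpoint; the prompt wording and answer parsing are our own assumptions, not the exact VLDBench evaluation protocol.

```python
# Hypothetical zero-shot classification sketch with LLaVA-1.5 (llava-hf/llava-1.5-7b-hf).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def classify(article_text: str, image_path: str) -> str:
    """Ask the VLM whether an article-image pair contains disinformation."""
    image = Image.open(image_path).convert("RGB")
    prompt = (
        "USER: <image>\n"
        f"News article: {article_text}\n"
        "Does this article-image pair contain disinformation? Answer Yes or No.\n"
        "ASSISTANT:"
    )
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    output = model.generate(**inputs, max_new_tokens=10)
    answer = processor.decode(output[0], skip_special_tokens=True).split("ASSISTANT:")[-1]
    return "disinformation" if "yes" in answer.lower() else "credible"
```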


Experimental results on VLDBench

Instruction Fine-Tuning on VLDBench Improves Detection Performance

Figure: Comparison of zero-shot vs. instruction-fine-tuned (IFT) performance, with 95% confidence intervals computed from three independent runs.
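For reference, a minimal sketch of how such a 95% interval can be computed from three runs (via the t-distribution with n-1 degrees of freedom; the accuracy values below are hypothetical):

```python
# Sketch: mean and 95% confidence interval from three independent runs
# (the accuracy values are made up for illustration).
import numpy as np
from scipy import stats

accuracies = np.array([0.741, 0.756, 0.748])  # hypothetical accuracies from 3 runs
mean = accuracies.mean()
sem = stats.sem(accuracies)                   # standard error of the mean
half_width = sem * stats.t.ppf(0.975, df=len(accuracies) - 1)
print(f"{mean:.3f} ± {half_width:.3f} (95% CI)")
```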

Adversarial Robustness: Combined Modality is More Vulnerable

Figure: Textual Perturbations. We describe three controlled text perturbations (Synonym Substitution, Misspelling, and Negation) and analyse how each distorts meaning. Our evaluation shows that negation most often flips factual statements into disinformation, driving the largest drop in model accuracy.
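A minimal sketch of what such text perturbations can look like in code; the toy synonym lexicon and rules below are our own illustrations, not the benchmark's exact perturbation procedure:

```python
# Illustrative text perturbations (synonym substitution, misspelling, negation);
# the word lists and rules are hypothetical simplifications.
import random

SYNONYMS = {"said": "stated", "report": "account", "large": "huge"}  # toy lexicon

def synonym_substitution(text: str) -> str:
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())

def misspell(text: str, rate: float = 0.1) -> str:
    def scramble(word: str) -> str:
        if len(word) > 3 and random.random() < rate:
            chars = list(word)
            i = random.randrange(1, len(chars) - 2)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # swap two inner letters
            return "".join(chars)
        return word
    return " ".join(scramble(w) for w in text.split())

def negate(text: str) -> str:
    # Naive negation: "is" -> "is not", "was" -> "was not"
    return text.replace(" is ", " is not ").replace(" was ", " was not ")
```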

Figure: Visual Perturbations. We introduce five image attacks: Gaussian Blur, Additive Noise, Resizing, Cross-Modal Mismatch (C-M), and Both-Modality (B-M). The cross-modal and combined attacks cause the greatest misclassification in the multimodal setting.
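Similarly, a sketch of the three single-image corruptions (Gaussian blur, additive noise, resizing) using PIL and NumPy; the parameter values are assumptions, not the benchmark's exact settings:

```python
# Illustrative image perturbations; radius, sigma, and scale are placeholder values.
import numpy as np
from PIL import Image, ImageFilter

def gaussian_blur(img: Image.Image, radius: float = 2.0) -> Image.Image:
    return img.filter(ImageFilter.GaussianBlur(radius))

def additive_noise(img: Image.Image, sigma: float = 15.0) -> Image.Image:
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def resize(img: Image.Image, scale: float = 0.5) -> Image.Image:
    w, h = img.size
    small = img.resize((max(1, int(w * scale)), max(1, int(h * scale))))
    return small.resize((w, h))  # downscale, then upscale back to the original size
```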

Human Evaluation Demonstrates Reliability and Reasoning Depth

Figure: Human evaluation results on a 500-sample test set. Models were tasked with classifying disinformation and justifying their predictions. PC = prediction correctness, RC = reasoning clarity (mean ± std.).

AI Risk Mitigation and Governance Alignment

Figure: Mapping of VLDBench components to risk mitigation strategies outlined in the MIT AI Risk Repository. Each pipeline component addresses specific risks related to privacy, misinformation, discrimination, robustness, and interpretability.

Conclusion

VLDBench addresses the urgent challenge of AI-era disinformation through responsible data stewardship, prioritizing human-centered design, ethical data sourcing, and governance-aligned evaluation (EU AI Act, NIST). Unlike existing benchmarks, it is the first to explicitly evaluate modern LLMs/VLMs on disinformation detection, with 62K multimodal samples spanning 13 topical categories (e.g., sports, politics). While compatible with traditional ML models, its design focuses on emerging multimodal threats. Some limitations warrant discussion: (1) reliance on pre-verified news sources risks sampling bias, (2) hybrid AI-human annotations may inherit annotator biases, and (3) the English-only corpus limits multilingual applicability. Future work should expand to adversarial cross-modal attacks (e.g., deepfake text-image contradictions) and low-resource languages. Despite these constraints, VLDBench establishes a foundational step toward systematic disinformation benchmarking, enabling researchers to stress-test models against real-world deception tactics while adhering to AI governance frameworks.


For additional details about VLDBench evaluation and experimental results, please refer to our main paper. Thank you!

BibTeX

@misc{raza2025vldbenchvisionlanguagemodels,
      title={VLDBench: Vision Language Models Disinformation Detection Benchmark},
      author={Shaina Raza and Ashmal Vayani and Aditya Jain and Aravind Narayanan and Vahid Reza Khazaie and Syed Raza Bashir and Elham Dolatabadi and Gias Uddin and Christos Emmanouilidis and Rizwan Qureshi and Mubarak Shah},
      year={2025},
      eprint={2502.11361},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.11361},
}