VLDBench
Evaluating Multimodal Disinformation with Regulatory Alignment

Shaina Raza¹, Ashmal Vayani², Aditya Jain³, Aravind Narayanan¹, Vahid Reza Khazaie¹ Syed Raza Bashir⁴, Elham Dolatabad⁵, Gias Uddin⁵, Christos Emmanouilidis⁶, Rizwan Qureshi², Mubarak Shah²

¹Vector Institute for Artificial Intelligence ²University of Central Florida ³The University of Texas at Austin ⁴Sheridan College ⁵York University ⁶University of Groningen

arXiv Dataset Code BibTex

Figure: VLDBench Framework system comprises five stages: (1) Define Task – formalizing the detection objective; (2) Data Pipeline – curating and preprocessing real-world multimodal news content; (3) Annotation Pipeline – generating labels via human and LLM-assisted review; (4) Human Review – validating annotations through expert oversight; and (5) Benchmarking – evaluating models for accuracy, reasoning, and risk mitigation across fine-tuning, zero-shot, and robustness scenarios.

Abstract

The rise of AI-generated content has amplified the challenge of detecting multimodal disinformation—i.e., online posts/articles that contain images and texts with fabricated information, is specially designed to deceive. While prior AI safety benchmarks focus on bias and toxicity, multimodal disinformation detection remains underexplored. To address this challenge, we present the Vision-Language Disinformation Detection Benchmark (VLDBench), the first comprehensive benchmark for detecting disinformation across both unimodal (text-only) and multimodal (text and image) content, comprising 31,000 news article-image pairs, spanning 13 distinct categories, for robust evaluation. VLDBench features a rigorous semi-automated data curation pipeline, with 22 domain experts dedicating 500+ hours to annotate the entire 31k samples, achieving a strong inter-annotator agreement (Cohen’s kappa = 0.78). We extensively evaluate state-of-the-art Large Language Models (LLMs) and Vision-Language Models (VLMs), demonstrating that integrating textual and visual cues in multimodal news posts improves disinformation detection accuracy by 5–35% compared to unimodal models. Developed in alignment with AI governance frameworks such as the EU AI Act, NIST guidelines, and the MIT AI Risk Repository 2024, VLDBench is expected to become a benchmark for detecting disinformation in online multi-modal contents.

Main contributions:

VLDBench: We present VLDBench, the largest human-verified benchmark for disinformation detection in both unimodal and multimodal settings. It contains 31.3K news articles with paired images, sourced from 58 outlets and spanning 13 categories, all collected under rigorous ethical guidelines.
Task coverage: VLDBench offers 62K labeled instances supporting two evaluation formats: (i) binary classification for text or text-image pairs and (ii) open-ended multimodal reasoning.
Expert annotation quality: Twenty-two domain experts devoted more than 500 hours to annotation, yielding substantial agreement (Cohen’s κ = 0.78) and ensuring high data reliability.
Comprehensive benchmarking: We evaluate nineteen state-of-the-art open-source models (ten vision–language models and nine language models) on VLDBench, exposing systematic performance gaps and model-specific failure modes that inform responsible-AI governance and risk monitoring.

VLDBench is a comprehensive classification multimodal benchmark for disinformation detection in news articles. We categorized our data into 13 unique news categories by providing image-text pairs to GPT-4o.

Table: Comparison of VLDBench with contemporary datasets. The Annotation (👤) means Manual and (👤,⚙️) indicates Hybrid (Human+AI). Access (✔️) refers to open-source and (❗) indicates Request required. The Real is defined as (✔️) if it’s real-world data and (❌) if it’s Synthetic data. ^*Multiple includes Politics, National, Business & Finance, International, Local/Regional, Entertainment, Opinion/Editorial, Health, Other, Sports, Technology, Weather & Environment, Science.

Data Annotation Pipeline

Figure: Summary statistics for VLDBench. Each article is annotated twice, once as text-only and once as text +image, yielding 62,678 labelled instances.

Figure: Disinformation Trends Across News Categories generated by GPT-4o based on disinformation narratives and confidence levels.

LLMs and VLMs used for Benchmarking

Language-Only LLMs	Vision-Language Models (VLMs)
Phi-3-mini-128k-instruct	Phi-3-Vision-128k-Instruct
Vicuna-7B-v1.5	LLaVA-v1.5-Vicuna7B
Mistral-7B-Instruct-v0.3	LLaVA-v1.6-Mistral-7B
Qwen2-7B-Instruct	Qwen2-VL-7B-Instruct
InternLM2-7B	InternVL2-8B
DeepSeek-V2-Lite-Chat	Deepseek-VL2-small
GLM-4-9B-chat	GLM-4V-9B
LLaMA-3.1-8B-Instruct	LLaMA-3.2-11B-Vision
LLaMA-3.2-1B-Instruct	Deepseek Janus-Pro-7B
	Pixtral

Experimental results on VLDBench

Instruction Fine-Tuning on VLDBench Improves Detection Performance

Figure: Comparison of zero-shot vs. instruction-fine-tuned (IFT) performance, with 95% confidence intervals computed from three independent runs.

Adversarial Robustness: Combined Modality is More Vulnerable

Figure: Textual Perturbations. We describe three controlled text perturbations—Synonym Substitution, Misspelling, and Negation—and analyse how each distorts meaning. Our evaluation shows that text negation flips factual statements most often into disinformation, driving the largest drop in model accuracy

Figure: Visual Perturbations. We introduce five image attacks—Gaussian Blur, Additive Noise, Resizing, Cross-Modal Mismatch (C-M), and Both-Modality (B-M)—and report that the cross-modal and combined attacks cause the greatest misclassification under multimodal setting.

Human Evaluation Demonstrates Reliability and Reasoning Depth

Figure: Human evaluation results on a 500-sample test set. Models were tasked with classifying disinformation and justifying their predictions. PC = prediction correctness, RC = reasoning clarity (mean ± std.).

AI Risk Mitigation and Governance Alignment

Figure: Mapping of VLDBench components to risk mitigation strategies outlined in the AI MIT Risk and Responsibility Repository. Each pipeline component addresses specific risks related to privacy, misinformation, discrimination, robustness, and interpretability.

Conclusion

VLDBench addresses the urgent challenge of AI-era disinformation through responsible data stewardship, prioritizing human-centered design, ethical data sourcing, and governance-aligned evaluation (EU AI Act, NIST). Unlike existing benchmarks, it is the first to explicitly evaluate modern LLMs/ VLMs on disinformation topic, with 62k multimodal samples spanning 13 topical categories (e.g., sports, politics). While compatible with traditional ML models, its design focuses on emerging multimodal threats. Some limitations need discussion : (1) reliance on pre-verified news sources risks sampling bias, (2) hybrid AI-human annotations may inherit annotator biases, and (3) the English-only corpus limits multilingual applicability. Future work should expand to adversarial cross-modal attacks (e.g., deepfake text-image contradictions) and low-resource languages. Despite these constraints, VLDBench establishes a foundational step toward systematic disinformation benchmarking, enabling researchers to stress-test models agains real-world deception tactics while adhering to AI governance frameworks.

For additional details about VLDBench evaluation and experimental results, please refer to our main paper. Thank you!

BibTeX

@misc{raza2025vldbenchvisionlanguagemodels,
        title={VLDBench: Vision Language Models Disinformation Detection Benchmark}, 
        author={Shaina Raza and Ashmal Vayani and Aditya Jain and Aravind Narayanan and Vahid Reza Khazaie and Syed Raza Bashir and Elham Dolatabadi and Gias Uddin and Christos Emmanouilidis and Rizwan Qureshi and Mubarak Shah},
        year={2025},
        eprint={2502.11361},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2502.11361}, 
  }

VLDBench Evaluating Multimodal Disinformation with Regulatory Alignment