Papers

Selected publications and preprints from the AIXpert project. Each entry links to arXiv where available.


AIXpert project papers

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Paper (AI Open, Elsevier 2026)

Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis.

A review of trust, risk, and security management (TRiSM) in LLM-based agentic and multi-agent systems.


Evaluating and Regulating Agentic AI: A Study of Benchmarks, Metrics and Regulation

Paper (Information Fusion, forthcoming; arXiv) · Code (GitHub) · Project

Authors: Azib Farooq, Shaina Raza, Nazmul Karim, Hasan Iqbal, Athanasios V. Vasilakos, Christos Emmanouilidis.

Survey of benchmarks, metrics, and governance for evaluating agentic AI in single- and multi-agent systems, toward trustworthy and auditable agents.


Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods

Paper (WCCI 2026, IJCNN; arXiv) · Code (GitHub) · Project

Authors: Shaina Raza, Rizwan Qureshi, Azib Farooq, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis.

Position paper on model immunization: supervised fine-tuning (SFT) with small doses of (false claim, correction) pairs alongside truthful data to supervise falsehoods directly. Across four open-weight model families, this yields gains of +12 points on TruthfulQA and +30 points on misinformation rejection, with negligible capability loss.


From Features to Actions: Explainability in Traditional and Agentic AI

Paper (arXiv) · Code (GitHub) · Project

Authors: Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza.

Compares attribution-based explanations (SHAP, LIME) with trace-based diagnostics across static and agentic settings. Attribution is stable for static prediction (Spearman ρ = 0.86) but fails to diagnose agentic failures; trace-grounded rubrics localize breakdowns (e.g., state-tracking inconsistency is 2.7× more frequent in failed runs, with −49% task success), motivating trajectory-level explainability for agentic systems.


SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding

Paper (arXiv) · Code (GitHub) · Dataset (Hugging Face) · Leaderboard

Authors: Ahmed Y. Radwan, Christos Emmanouilidis, Hina Tabassum, Deval Pandya, Shaina Raza.

SONIC-O1 is a fully human-verified, real-world audio-video benchmark with 4,958 annotations across 13 conversational domains. We evaluate multimodal models on video summarization, evidence-grounded QA, and temporal event localization, and release an extensible evaluation suite to support reproducible benchmarking and robustness analysis.


Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

Paper (ACL 2026 Findings; arXiv) · Code (GitHub) · Dataset (Hugging Face) · Project

Authors: Sindhuja Chaduvula, Ahmed Y. Radwan, Azib Farooq, Yani Ioannou, Shaina Raza.

A preference-learning method (F-DPO) that targets factuality directly, improving factuality scores while reducing hallucination rates across multiple open-weight LLMs.


Transparency in Agentic AI: A Survey of Interpretability, Explainability, and Governance

Paper (arXiv) · Project

Authors: Shaina Raza, Ahmed Y. Radwan, Sindhuja Chaduvula, Mahshid Alinoori, Christos Emmanouilidis.

Agentic AI systems, i.e., LLM-based agents with planning, memory, and tool use, introduce transparency challenges that are poorly served by explainability methods designed for single-step predictions. This article surveys and synthesizes interpretability and explainability techniques relevant to agentic behavior across the agent lifecycle, organized using a five-axis taxonomy: the cognitive objects being inspected, the assurance objectives being targeted, the mechanisms employed, the lifecycle stages covered, and the stakeholders served.


Bias in the Picture: Benchmarking VLMs with Social-Cue News Images and LLM-as-Judge Assessment

Paper (NeurIPS 2025 LLM-eval Workshop; arXiv) · Code (GitHub)

Authors: Aravind Narayanan, Vahid Reza Khazaie, Shaina Raza.

Benchmarks vision-language models on social-cue news images, using LLM-as-judge assessment to measure social bias.


Responsible Agentic Reasoning and AI Agents—A Critical Survey

Paper (arXiv)

Authors: Shaina Raza (Vector Institute), Ranjan Sapkota, Manoj Karkee (Cornell University), Christos Emmanouilidis (University of Groningen).

A critical survey of responsible agentic reasoning and AI agents.