Papers
Selected publications and preprints from the AIXpert project. Each entry links to arXiv where available.
AIXpert project papers
TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems
Paper (AI Open, Elsevier 2026)
Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis.
A review of trust, risk, and security management (TRiSM) in LLM-based agentic and multi-agent systems.
Evaluating and Regulating Agentic AI: A Study of Benchmarks, Metrics and Regulation
Paper (Information Fusion, forthcoming) · Code · Project
Authors: Azib Farooq, Shaina Raza, Nazmul Karim, Hasan Iqbal, Athanasios V. Vasilakos, Christos Emmanouilidis.
Survey of benchmarks, metrics, and governance for evaluating agentic AI in single- and multi-agent systems, toward trustworthy and auditable agents.
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
Paper (WCCI 2026, IJCNN) · Code · Project
Authors: Shaina Raza, Rizwan Qureshi, Azib Farooq, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis.
Position paper on model immunization: supervised fine-tuning with small doses of (false claim, correction) pairs mixed into truthful data, so that falsehoods are supervised directly rather than avoided. Across four open-weight model families, this yields +12 TruthfulQA points and +30 misinformation-rejection points with negligible capability loss.
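The immunization recipe can be illustrated with a minimal data-mixing sketch. The example pairs, the `dose` ratio, and the prompt template below are illustrative assumptions, not the paper's actual configuration:

```python
import random

# Illustrative (false claim, correction) pairs; the paper's actual data differs.
false_corrections = [
    ("The Great Wall of China is visible from the Moon.",
     "This claim is false: the wall is far too narrow to be seen from the Moon "
     "with the naked eye."),
    ("Humans use only 10% of their brains.",
     "This claim is false: imaging studies show activity across virtually the "
     "whole brain over the course of a day."),
]

truthful_examples = [
    {"prompt": "What is the capital of France?", "response": "Paris."},
    {"prompt": "How many legs does a spider have?", "response": "Eight."},
]

def build_immunization_set(truthful, corrections, dose=0.1, seed=0):
    """Mix a small 'dose' of correction pairs into the truthful SFT data."""
    rng = random.Random(seed)
    n_dose = max(1, int(dose * len(truthful)))
    dose_examples = [
        {"prompt": f"Is the following claim true? {claim}", "response": fix}
        for claim, fix in rng.sample(corrections, min(n_dose, len(corrections)))
    ]
    mixed = truthful + dose_examples
    rng.shuffle(mixed)
    return mixed

sft_data = build_immunization_set(truthful_examples, false_corrections)
```

The mixed dataset would then be passed to an ordinary SFT loop; the "vaccine" idea is carried entirely by the small fraction of correction pairs in the mixture.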
From Features to Actions: Explainability in Traditional and Agentic AI
Authors: Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza.
Compares attribution-based explanations (SHAP, LIME) with trace-based diagnostics across static and agentic settings. Attribution is stable for static prediction (Spearman ρ = 0.86) but fails to diagnose agentic failures; trace-grounded rubrics localize breakdowns (e.g., state-tracking inconsistency occurs 2.7× more often in failed runs, with −49% success), motivating trajectory-level explainability for agentic systems.
SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding
Paper · Code · Dataset · Leaderboard
Authors: Ahmed Y. Radwan, Christos Emmanouilidis, Hina Tabassum, Deval Pandya, Shaina Raza.
SONIC-O1, a fully human-verified real-world audio-video benchmark with 4,958 annotations across 13 conversational domains. We evaluate multimodal models on video summarization, evidence-grounded QA, and temporal event localization, and release an extensible evaluation suite to support reproducible benchmarking and robustness analysis.
Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning
Paper (ACL 2026 Findings) · Code · Dataset · Project
Authors: Sindhuja Chaduvula, Ahmed Y. Radwan, Azib Farooq, Yani Ioannou, Shaina Raza.
Preference-learning method (F-DPO) that targets factuality directly, improving factuality scores while reducing hallucination rates across multiple open-weight LLMs.
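Since F-DPO builds on preference learning, the underlying objective can be sketched with the standard DPO loss on a single preference pair. The paper's exact objective is not given here; the factuality-specific assumption in this sketch is only that "chosen" is the factual response and "rejected" the hallucinated one:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss on one preference pair.

    In a factuality-aware setup, 'chosen' would be the factual response and
    'rejected' the hallucinated one; the pair construction, not the loss form,
    is what targets factuality.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response than the reference model does.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): small when the policy already prefers 'chosen'.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy prefers the factual answer more than the reference
# does, so the loss falls below the neutral value -log(0.5).
loss = dpo_loss(logp_chosen=-1.0, logp_rejected=-3.0,
                ref_logp_chosen=-2.0, ref_logp_rejected=-2.5)
```

When the policy and reference agree exactly, the margin is zero and the loss equals −log(0.5) ≈ 0.693; training pushes it below that by widening the factual-over-hallucinated margin.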
Transparency in Agentic AI: A Survey of Interpretability, Explainability, and Governance
Authors: Shaina Raza, Ahmed Y. Radwan, Sindhuja Chaduvula, Mahshid Alinoori, Christos Emmanouilidis.
Agentic AI systems—LLM-based agents with planning, memory, and tool use—introduce transparency challenges that are poorly served by explainability methods designed for single-step predictions. This article surveys and synthesizes interpretability and explainability techniques relevant to agentic behavior across the agent lifecycle, organized using a five-axis taxonomy: cognitive objects being inspected, assurance objectives being targeted, mechanisms employed, lifecycle stages, and stakeholders served.
Bias in the Picture: Benchmarking VLMs with Social-Cue News Images and LLM-as-Judge Assessment
Paper (NeurIPS 2025 LLM-eval Workshop) · Code
Authors: Aravind Narayanan, Vahid Reza Khazaie, Shaina Raza.
Benchmarking vision-language models with social-cue news images and LLM-as-judge assessment.
Responsible Agentic Reasoning and AI Agents—A Critical Survey
Authors: Shaina Raza (Vector Institute), Ranjan Sapkota, Manoj Karkee (Cornell University), Christos Emmanouilidis (University of Groningen).
Critical survey of responsible agentic reasoning and AI agents.