Papers

Selected publications and preprints from the AIXpert project. Each entry links to arXiv where available.


AIXpert project papers

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Paper (AI Open, Elsevier 2026)

Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis.

A review of trust, risk, and security management (TRiSM) in LLM-based agentic and multi-agent systems.


Evaluating and Regulating Agentic AI: A Study of Benchmarks, Metrics and Regulation

Paper (Information Fusion, forthcoming; arXiv) · Code (GitHub) · Project

Authors: Azib Farooq, Shaina Raza, Nazmul Karim, Hasan Iqbal, Athanasios V. Vasilakos, Christos Emmanouilidis.

Survey of benchmarks, metrics, and governance for evaluating agentic AI in single- and multi-agent systems, toward trustworthy and auditable agents.


Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods

Paper (WCCI 2026, IJCNN; arXiv) · Code (GitHub) · Project

Authors: Shaina Raza, Rizwan Qureshi, Azib Farooq, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis.

Position paper on model immunization: supervised fine-tuning (SFT) with small doses of (false claim, correction) pairs alongside truthful data to supervise falsehoods directly. Across four open-weight model families, this yields gains of +12 points on TruthfulQA and +30 points on misinformation rejection, with negligible capability loss.


From Features to Actions: Explainability in Traditional and Agentic AI

Paper (arXiv) · Code (GitHub) · Project

Authors: Sindhuja Chaduvula, Jessee Ho, Kina Kim, Aravind Narayanan, Mahshid Alinoori, Muskan Garg, Dhanesh Ramachandram, Shaina Raza.

Compares attribution-based explanations (SHAP, LIME) with trace-based diagnostics across static and agentic settings. Attribution is stable for static prediction (Spearman ρ = 0.86) but fails to diagnose agentic failures; trace-grounded rubrics localize breakdowns (e.g., state-tracking inconsistency is 2.7× more frequent in failed runs, with −49% task success), motivating trajectory-level explainability for agentic systems.


SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding

Paper (arXiv) · Code (GitHub) · Dataset (Hugging Face) · Leaderboard

Authors: Ahmed Y. Radwan, Christos Emmanouilidis, Hina Tabassum, Deval Pandya, Shaina Raza.

SONIC-O1 is a fully human-verified, real-world audio-video benchmark with 4,958 annotations across 13 conversational domains. We evaluate multimodal models on video summarization, evidence-grounded QA, and temporal event localization, and release an extensible evaluation suite to support reproducible benchmarking and robustness analysis.


Reducing Hallucinations in LLMs via Factuality-Aware Preference Learning

Paper (ACL 2026 Findings; arXiv) · Code (GitHub) · Dataset (Hugging Face) · Project

Authors: Sindhuja Chaduvula, Ahmed Y. Radwan, Azib Farooq, Yani Ioannou, Shaina Raza.

A preference-learning method (F-DPO) that targets factuality directly, improving factuality scores while reducing hallucination rates across multiple open-weight LLMs.


Transparency in Agentic AI: A Survey of Interpretability, Explainability, and Governance

Paper (arXiv) · Project

Authors: Shaina Raza, Ahmed Y. Radwan, Sindhuja Chaduvula, Mahshid Alinoori, Christos Emmanouilidis.

Agentic AI systems, i.e., LLM-based agents with planning, memory, and tool use, introduce transparency challenges that are poorly served by explainability methods designed for single-step predictions. This article surveys and synthesizes interpretability and explainability techniques relevant to agentic behavior across the agent lifecycle, organized using a five-axis taxonomy: the cognitive objects being inspected, the assurance objectives being targeted, the mechanisms employed, the lifecycle stages covered, and the stakeholders served.


Bias in the Picture: Benchmarking VLMs with Social-Cue News Images and LLM-as-Judge Assessment

Paper (NeurIPS 2025 LLM-eval Workshop; arXiv) · Code (GitHub)

Authors: Aravind Narayanan, Vahid Reza Khazaie, Shaina Raza.

Benchmarks vision-language models on social-cue news images, using LLM-as-judge assessment to measure social bias.


Responsible Agentic Reasoning and AI Agents—A Critical Survey

Paper (arXiv)

Authors: Shaina Raza (Vector Institute), Ranjan Sapkota, Manoj Karkee (Cornell University), Christos Emmanouilidis (University of Groningen).

A critical survey of responsible agentic reasoning and AI agents.