Abstract
Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories, where success and failure are determined by sequences of decisions rather than a single output. While existing explanation approaches are effective for static predictions, it remains unclear how they translate to agentic settings where behaviour emerges over time. In this work, we bridge the gap between static and agentic explainability by contrasting attribution methods on static classification tasks with trace-based diagnostics on agentic benchmarks (TAU-bench Airline and AssistantBench). Our results show that attribution methods produce stable feature rankings in static settings (Spearman ρ = 0.86) but fail to diagnose execution-level failures in agentic trajectories. By contrast, trace-based evaluation consistently localizes behaviour breakdowns and reveals that state-tracking inconsistency is 2.7× more prevalent in failed runs and reduces success probability by 49%. These findings motivate a shift toward trajectory-level explainability for evaluating and diagnosing autonomous agent behaviour.
Agent execution loop enabling traceable minimal explanations.
Evaluation metrics for static and agentic settings
| Setting | Metric | Description | MEP Criteria | Custom |
|---|---|---|---|---|
| Static | Explanation Stability | Avg. Spearman rank correlation (ρ) across perturbed inputs or repeated runs | Reliability | ✓ |
| Agentic | Intent Alignment | Actions align with stated goals and task requirements | Grounding | ✓ |
| Agentic | Plan Adherence | Maintains coherent multi-step plans throughout execution | Grounding, Reliability | ✓ |
| Agentic | Tool Correctness | Invokes appropriate tools with valid parameters | Auditability | ✓ |
| Agentic | Tool-Choice Accuracy | Selects optimal tools for given sub-tasks | Grounding, Auditability | ✓ |
| Agentic | State Consistency | Maintains coherent internal state across steps | Reliability | ✓ |
| Agentic | Error Recovery | Detects and recovers from execution failures | Reliability | ✓ |
Explanation stability is adopted from prior XAI robustness work. Agentic metrics are custom rubric signals defined in the paper and operationalized using Docent.
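As a concrete illustration of the Explanation Stability metric above, the sketch below computes the average pairwise Spearman ρ over repeated attribution runs; the attribution values are illustrative placeholders, not results from the paper.

```python
# Minimal sketch of explanation stability: average pairwise Spearman rank
# correlation between feature attributions from perturbed inputs or repeated runs.
# The attribution values below are illustrative placeholders.
import numpy as np
from scipy.stats import spearmanr

def explanation_stability(attributions: np.ndarray) -> float:
    """Rows are runs, columns are features; returns the mean pairwise Spearman rho."""
    n_runs = attributions.shape[0]
    rhos = [
        spearmanr(attributions[i], attributions[j])[0]
        for i in range(n_runs)
        for j in range(i + 1, n_runs)
    ]
    return float(np.mean(rhos))

# Three repeated attribution runs over five features.
runs = np.array([
    [0.40, 0.25, 0.15, 0.12, 0.08],
    [0.38, 0.27, 0.14, 0.13, 0.08],
    [0.41, 0.22, 0.18, 0.11, 0.08],
])
print(f"explanation stability = {explanation_stability(runs):.2f}")
```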
Comparison of local (LIME) and global (SHAP) interpretability
LIME feature importance for a single instance. Blue bars indicate features that push the prediction toward non-IT and orange bars toward IT, illustrating how the prediction is formed.
SHAP beeswarm plot showing global feature importance. Terms such as "software" push predictions toward IT, while "accounting" pushes toward non-IT; ambiguous terms contribute neutrally.
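To make the local explanation concrete, here is a minimal sketch of producing a per-instance LIME explanation for an IT vs. non-IT text classifier; the toy corpus, pipeline, and label names are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a local LIME explanation for one instance of an IT / non-IT
# text classifier. The toy corpus and pipeline are illustrative assumptions.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "software update broke my laptop",
    "please install the new software build",
    "quarterly accounting report is due",
    "submit the invoice to accounting",
]
labels = [1, 1, 0, 0]  # 1 = IT, 0 = non-IT

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["non-IT", "IT"])
explanation = explainer.explain_instance(
    "the software crashed after the accounting sync",
    pipeline.predict_proba,
    num_features=5,
)
print(explanation.as_list())  # (term, weight) pairs for this single prediction
```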
Traditional attribution-based vs. trace-based agentic explainability
| Aspect | Traditional-XAI (SHAP/LIME) | Agentic-XAI (Docent) |
|---|---|---|
| Input representation | Aggregated feature vector | Full execution trace |
| Primary output | Feature importance | Rubric satisfaction / violation |
| Unit of explanation | Outcome prediction | Entire trajectory |
| Temporal reasoning | × | ✓ |
| Tool / state awareness | × | ✓ |
| Per-run failure localization | Limited (indirect) | Explicit (direct) |
| Explanation goal | Correlative (what matters) | Diagnostic (what went wrong) |
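To illustrate the "Per-run failure localization" row above, the sketch below shows a toy trace-level rubric check for state consistency; it is a simplified stand-in rather than Docent's actual API, and the step schema and tool names are assumptions.

```python
# Illustrative sketch (not Docent's API): a trace-level rubric check that flags
# state-tracking inconsistencies and localizes them to the offending step.
from dataclasses import dataclass, field

@dataclass
class Step:
    index: int
    tool: str
    args: dict
    stated_state: dict = field(default_factory=dict)  # what the agent claims to know

def check_state_consistency(trace: list[Step]) -> list[str]:
    """Flag steps whose tool arguments contradict state stated at an earlier step."""
    violations, known = [], {}
    for step in trace:
        for key, value in step.args.items():
            if key in known and known[key] != value:
                violations.append(
                    f"step {step.index}: arg {key}={value!r} contradicts "
                    f"earlier state {known[key]!r}"
                )
        known.update(step.stated_state)
    return violations

# Toy trace with a state-tracking inconsistency at step 1.
trace = [
    Step(0, "search_flights", {"origin": "YYZ"}, {"origin": "YYZ"}),
    Step(1, "book_flight", {"origin": "JFK"}),
]
print(check_state_consistency(trace))
```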
Rubric-level SHAP summary
To bridge static attribution methods with agentic behavior, we compress execution traces into rubric-level features and train a surrogate outcome predictor. The plot below visualizes how each behavioral dimension contributes to predicted task success across runs.
SHAP summary (beeswarm) plot for rubric-level features. Each point represents a run; the x-axis shows the SHAP value (contribution to predicted success), and color encodes feature value (low to high). Features are ordered by mean absolute SHAP value.
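A minimal sketch of this surrogate pipeline is shown below, assuming the rubric scores have already been extracted into a table with one row per run; the file name, feature names, and model choice are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the rubric-level surrogate: rubric scores per run -> predicted success,
# explained with SHAP. File name, feature names, and model choice are assumptions.
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rubric_features = [
    "intent_alignment", "plan_adherence", "tool_correctness",
    "tool_choice_accuracy", "state_consistency", "error_recovery",
]

# One row per agent run: rubric scores plus a binary 'success' outcome.
df = pd.read_csv("rubric_scores.csv")  # hypothetical file of rubric-scored runs
X, y = df[rubric_features], df["success"]

# Surrogate outcome predictor mapping rubric scores to task success.
surrogate = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP values show how each behavioural dimension shifts predicted success.
explainer = shap.TreeExplainer(surrogate)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)  # beeswarm ordered by mean |SHAP|
```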
BibTeX
@article{featurestoactions2026,
  title={From Features to Actions: Explainability in Traditional and Agentic AI Systems},
  author={Sindhuja Chaduvula and Jessee Ho and Kina Kim and Aravind Narayanan and Mahshid Alinoori and Muskan Garg and Dhanesh Ramachandram and Shaina Raza},
  journal={arXiv preprint arXiv:2602.06841},
  year={2026}
}
Acknowledgments
Resources used in preparing this research were provided, in part, by the Province of Ontario and the Government of Canada through CIFAR, as well as companies sponsoring the Vector Institute. This research was funded by the European Union’s Horizon Europe research and innovation programme under the AIXPERT project (Grant Agreement No. 101214389), which aims to develop an agentic, multi-layered, GenAI-powered framework for creating explainable, accountable, and transparent AI systems.