Abstract
Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories, where success and failure are determined by sequences of decisions rather than a single output. While existing explanation approaches are effective for static predictions, it remains unclear how they translate to agentic settings where behaviour emerges over time. In this work, we bridge the gap between static and agentic explainability by contrasting attribution methods on static classification tasks with trace-based diagnostics on agentic benchmarks (TAU-bench Airline and AssistantBench). Our results show that attribution methods produce stable feature rankings in static settings (Spearman ρ = 0.86) but fail to diagnose execution-level failures in agentic trajectories. By contrast, trace-based evaluation consistently localizes behaviour breakdowns and reveals that state-tracking inconsistency is 2.7× more prevalent in failed runs and reduces success probability by 49%. These findings motivate a shift toward trajectory-level explainability for evaluating and diagnosing autonomous agent behaviour.
Agent execution loop enabling traceable minimal explanations.
Evaluation metrics for static and agentic settings
| Setting | Metric | Description | MEP Criteria | Custom |
|---|---|---|---|---|
| Static | Explanation Stability | Avg. Spearman rank correlation (ρ) across perturbed inputs or repeated runs | Reliability | ✓ |
| Agentic | Intent Alignment | Actions align with stated goals and task requirements | Grounding | ✓ |
| Agentic | Plan Adherence | Maintains coherent multi-step plans throughout execution | Grounding, Reliability | ✓ |
| Agentic | Tool Correctness | Invokes appropriate tools with valid parameters | Auditability | ✓ |
| Agentic | Tool-Choice Accuracy | Selects optimal tools for given sub-tasks | Grounding, Auditability | ✓ |
| Agentic | State Consistency | Maintains coherent internal state across steps | Reliability | ✓ |
| Agentic | Error Recovery | Detects and recovers from execution failures | Reliability | ✓ |
Explanation stability is adopted from prior XAI robustness work. Agentic metrics are custom rubric signals defined in the paper and operationalized using Docent.
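As a concrete illustration of the Explanation Stability metric above, the sketch below computes the average pairwise Spearman ρ over repeated attribution runs; the attribution values are illustrative placeholders, not results from the paper.

```python
# Minimal sketch of explanation stability: average pairwise Spearman rank
# correlation between feature attributions from perturbed inputs or repeated runs.
# The attribution values below are illustrative placeholders.
import numpy as np
from scipy.stats import spearmanr

def explanation_stability(attributions: np.ndarray) -> float:
    """Rows are runs, columns are features; returns the mean pairwise Spearman rho."""
    n_runs = attributions.shape[0]
    rhos = [
        spearmanr(attributions[i], attributions[j])[0]
        for i in range(n_runs)
        for j in range(i + 1, n_runs)
    ]
    return float(np.mean(rhos))

# Three repeated attribution runs over five features.
runs = np.array([
    [0.40, 0.25, 0.15, 0.12, 0.08],
    [0.38, 0.27, 0.14, 0.13, 0.08],
    [0.41, 0.22, 0.18, 0.11, 0.08],
])
print(f"explanation stability = {explanation_stability(runs):.2f}")
```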
Comparison of local (LIME) and global (SHAP) interpretability
LIME feature importance for a single instance. Blue bars indicate features that push the prediction toward non-IT and orange bars toward IT, illustrating how the prediction is formed.
SHAP beeswarm plot showing global feature importance. Terms such as "software" push predictions toward IT, while "accounting" pushes toward non-IT; ambiguous terms contribute neutrally.
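To make the local explanation concrete, here is a minimal sketch of producing a per-instance LIME explanation for an IT vs. non-IT text classifier; the toy corpus, pipeline, and label names are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a local LIME explanation for one instance of an IT / non-IT
# text classifier. The toy corpus and pipeline are illustrative assumptions.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "software update broke my laptop",
    "please install the new software build",
    "quarterly accounting report is due",
    "submit the invoice to accounting",
]
labels = [1, 1, 0, 0]  # 1 = IT, 0 = non-IT

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["non-IT", "IT"])
explanation = explainer.explain_instance(
    "the software crashed after the accounting sync",
    pipeline.predict_proba,
    num_features=5,
)
print(explanation.as_list())  # (term, weight) pairs for this single prediction
```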
Traditional attribution-based vs. trace-based agentic explainability
| Aspect | Traditional-XAI (SHAP/LIME) | Agentic-XAI (Docent) |
|---|---|---|
| Input representation | Aggregated feature vector | Full execution trace |
| Primary output | Feature importance | Rubric satisfaction / violation |
| Unit of explanation | Outcome prediction | Entire trajectory |
| Temporal reasoning | × | ✓ |
| Tool / state awareness | × | ✓ |
| Per-run failure localization | Limited (indirect) | Explicit (direct) |
| Explanation goal | Correlative (what matters) | Diagnostic (what went wrong) |
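To illustrate the "Per-run failure localization" row above, the sketch below shows a toy trace-level rubric check for state consistency; it is a simplified stand-in rather than Docent's actual API, and the step schema and tool names are assumptions.

```python
# Illustrative sketch (not Docent's API): a trace-level rubric check that flags
# state-tracking inconsistencies and localizes them to the offending step.
from dataclasses import dataclass, field

@dataclass
class Step:
    index: int
    tool: str
    args: dict
    stated_state: dict = field(default_factory=dict)  # what the agent claims to know

def check_state_consistency(trace: list[Step]) -> list[str]:
    """Flag steps whose tool arguments contradict state stated at an earlier step."""
    violations, known = [], {}
    for step in trace:
        for key, value in step.args.items():
            if key in known and known[key] != value:
                violations.append(
                    f"step {step.index}: arg {key}={value!r} contradicts "
                    f"earlier state {known[key]!r}"
                )
        known.update(step.stated_state)
    return violations

# Toy trace with a state-tracking inconsistency at step 1.
trace = [
    Step(0, "search_flights", {"origin": "YYZ"}, {"origin": "YYZ"}),
    Step(1, "book_flight", {"origin": "JFK"}),
]
print(check_state_consistency(trace))
```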
Rubric-level SHAP summary
To bridge static attribution methods with agentic behavior, we compress execution traces into rubric-level features and train a surrogate outcome predictor. The plot below visualizes how each behavioral dimension contributes to predicted task success across runs.
SHAP summary (beeswarm) plot for rubric-level features. Each point represents a run; the x-axis shows the SHAP value (contribution to predicted success), and color encodes feature value (low to high). Features are ordered by mean absolute SHAP value.
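A minimal sketch of this surrogate pipeline is shown below, assuming the rubric scores have already been extracted into a table with one row per run; the file name, feature names, and model choice are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the rubric-level surrogate: rubric scores per run -> predicted success,
# explained with SHAP. File name, feature names, and model choice are assumptions.
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rubric_features = [
    "intent_alignment", "plan_adherence", "tool_correctness",
    "tool_choice_accuracy", "state_consistency", "error_recovery",
]

# One row per agent run: rubric scores plus a binary 'success' outcome.
df = pd.read_csv("rubric_scores.csv")  # hypothetical file of rubric-scored runs
X, y = df[rubric_features], df["success"]

# Surrogate outcome predictor mapping rubric scores to task success.
surrogate = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP values show how each behavioural dimension shifts predicted success.
explainer = shap.TreeExplainer(surrogate)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)  # beeswarm ordered by mean |SHAP|
```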
BibTeX
@article{featurestoactions2026,
  title={From Features to Actions: Explainability in Traditional and Agentic AI Systems},
  author={Sindhuja Chaduvula and Jessee Ho and Kina Kim and Aravind Narayanan and Mahshid Alinoori and Muskan Garg and Dhanesh Ramachandram and Shaina Raza},
  journal={arXiv preprint arXiv:2602.06841},
  year={2026}
}
Acknowledgments
Resources used in preparing this research were provided, in part, by the Province of Ontario and the Government of Canada through CIFAR, as well as companies sponsoring the Vector Institute. This research was funded by the European Union’s Horizon Europe research and innovation programme under the AIXPERT project (Grant Agreement No. 101214389), which aims to develop an agentic, multi-layered, GenAI-powered framework for creating explainable, accountable, and transparent AI systems.