From Features to Actions: Explainability in Traditional and Agentic AI Systems

1 Vector Institute for Artificial Intelligence, Toronto, Canada
2 Independent Researcher
3 Mayo Clinic, Rochester, MN, USA

Comparison of Minimal Explanation Packet (MEP) structure across static and agentic paradigms

Abstract

Over the last decade, explainable AI has primarily focused on interpreting individual model predictions, producing post-hoc explanations that relate inputs to outputs under a fixed decision structure. Recent advances in large language models (LLMs) have enabled agentic AI systems whose behaviour unfolds over multi-step trajectories, where success and failure are determined by sequences of decisions rather than a single output. Although existing explanation approaches are effective for static predictions, it remains unclear how they translate to agentic settings where behaviour emerges over time. In this work, we bridge the gap between static and agentic explainability by contrasting attribution methods used in static classification tasks with trace-based diagnostics applied to agentic benchmarks (TAU-bench Airline and AssistantBench). Our results show that attribution methods produce stable feature rankings in static settings (Spearman ρ = 0.86) but fail to diagnose execution-level failures in agentic trajectories. By contrast, trace-based evaluation consistently localizes behavioural breakdowns and reveals that state-tracking inconsistency is 2.7× more prevalent in failed runs and reduces success probability by 49%. These findings motivate a shift toward trajectory-level explainability for evaluating and diagnosing autonomous agent behaviour.

Agent execution loop enabling traceable minimal explanations.
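Conceptually, each iteration of the loop records what the agent thought, which tool it called with which arguments, what it observed, and its updated state, yielding a trace that rubric-based diagnostics can later audit. The sketch below is a hypothetical rendering of such a loop; the `AgentStep` and `Trace` structures and the `policy`/`tools` interfaces are our assumptions, not the paper's implementation.

```python
# Hypothetical agent loop that logs one trace record per step so that
# trace-based diagnostics (e.g. Docent rubrics) can be applied afterwards.
from dataclasses import dataclass, field

@dataclass
class AgentStep:
    thought: str       # model reasoning for this step
    tool: str          # tool invoked (e.g. "search_flights")
    tool_args: dict    # parameters passed to the tool
    observation: str   # tool output returned to the model
    state: dict        # agent's tracked task state after the step

@dataclass
class Trace:
    goal: str
    steps: list[AgentStep] = field(default_factory=list)
    success: bool | None = None

def run_agent(goal: str, policy, tools: dict, max_steps: int = 20) -> Trace:
    """Run the agent, logging every decision for later rubric evaluation."""
    trace = Trace(goal=goal)
    state: dict = {}
    for _ in range(max_steps):
        thought, tool, args = policy(goal, state)  # LLM picks the next action
        if tool == "finish":
            trace.success = bool(args.get("success"))
            break
        observation = tools[tool](**args)          # execute the tool call
        state = policy.update_state(state, observation)
        trace.steps.append(AgentStep(thought, tool, args, observation, dict(state)))
    return trace
```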


Evaluation metrics for static and agentic settings

| Setting | Metric | Description | MEP Criteria | Custom |
|---|---|---|---|---|
| Static | Explanation Stability | Avg. Spearman rank correlation (ρ) across perturbed inputs or repeated runs | Reliability | ✗ |
| Agentic | Intent Alignment | Actions align with stated goals and task requirements | Grounding | ✓ |
| Agentic | Plan Adherence | Maintains coherent multi-step plans throughout execution | Grounding, Reliability | ✓ |
| Agentic | Tool Correctness | Invokes appropriate tools with valid parameters | Auditability | ✓ |
| Agentic | Tool-Choice Accuracy | Selects optimal tools for given sub-tasks | Grounding, Auditability | ✓ |
| Agentic | State Consistency | Maintains coherent internal state across steps | Reliability | ✓ |
| Agentic | Error Recovery | Detects and recovers from execution failures | Reliability | ✓ |

Explanation stability is adopted from prior XAI robustness work. Agentic metrics are custom rubric signals defined in the paper and operationalized using Docent.
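As a concrete illustration, the stability metric can be computed as the mean pairwise Spearman ρ between feature-importance rankings from perturbed inputs or repeated explanation runs. The helper below is a minimal sketch with synthetic importance vectors; the function name and data are ours, not the paper's.

```python
# Minimal sketch of the explanation-stability metric: average pairwise
# Spearman rank correlation between importance vectors over the same features.
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

def explanation_stability(explanations: list[np.ndarray]) -> float:
    """Mean pairwise Spearman rho between importance rankings."""
    rhos = [spearmanr(a, b)[0] for a, b in combinations(explanations, 2)]
    return float(np.mean(rhos))

# Example: five repeated explanation runs over the same instance,
# simulated as noisy copies of one base importance vector.
rng = np.random.default_rng(0)
base = rng.normal(size=30)
runs = [base + rng.normal(scale=0.1, size=30) for _ in range(5)]
print(f"stability (avg Spearman rho) = {explanation_stability(runs):.2f}")
```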

Comparison of local (LIME) and global (SHAP) interpretability

(a) LIME explanation: feature importance for a single instance. Blue indicates non-IT features and orange indicates IT features, illustrating how the prediction is formed.
(b) SHAP global summary: beeswarm plot showing global feature importance. Terms such as "software" push predictions toward IT, while "accounting" pushes toward non-IT; ambiguous terms contribute neutrally.
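For concreteness, the snippet below reproduces this kind of local-vs-global comparison on a toy IT / non-IT ticket classifier. The pipeline and example tickets are illustrative stand-ins, not the paper's dataset; only the LIME and SHAP calls reflect the methods compared above.

```python
# Toy IT / non-IT text classifier explained locally (LIME) and globally (SHAP).
import shap
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["reset my software license", "approve the accounting invoice",
         "install the VPN client", "submit the expense report"]
labels = [1, 0, 1, 0]  # 1 = IT, 0 = non-IT

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)
vec = pipe.named_steps["tfidfvectorizer"]
clf = pipe.named_steps["logisticregression"]

# (a) Local: LIME perturbs one ticket and ranks the words driving its prediction.
lime_exp = LimeTextExplainer(class_names=["non-IT", "IT"]).explain_instance(
    texts[0], pipe.predict_proba, num_features=4)
print(lime_exp.as_list())

# (b) Global: SHAP values over the whole set, summarized as a beeswarm
# showing which terms push predictions toward IT vs. non-IT.
X = vec.transform(texts).toarray()
explainer = shap.LinearExplainer(clf, X)
shap.summary_plot(explainer.shap_values(X), X,
                  feature_names=vec.get_feature_names_out())
```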

Traditional attribution-based vs. trace-based agentic explainability

| Aspect | Traditional XAI (SHAP/LIME) | Agentic XAI (Docent) |
|---|---|---|
| Input representation | Aggregated feature vector | Full execution trace |
| Primary output | Feature importance | Rubric satisfaction / violation |
| Unit of explanation | Outcome prediction | Entire trajectory |
| Temporal reasoning | ✗ | ✓ |
| Tool / state awareness | ✗ | ✓ |
| Per-run failure localization | Limited (indirect) | Explicit (direct) |
| Explanation goal | Correlative (what matters) | Diagnostic (what went wrong) |

Rubric-level SHAP summary

To bridge static attribution methods with agentic behaviour, we compress execution traces into rubric-level features and train a surrogate outcome predictor. The plot below visualizes how each behavioural dimension contributes to predicted task success across runs; a code sketch of the pipeline follows the figure.


SHAP summary (beeswarm) plot for rubric-level features. Each point represents a run; the x-axis shows the SHAP value (contribution to predicted success), and color encodes feature value (low to high). Features are ordered by mean absolute SHAP value.
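A minimal sketch of this pipeline is shown below: rubric scores per run feed a surrogate success classifier, which SHAP then explains. The synthetic data, the weighting that favours state consistency, and the gradient-boosted surrogate are illustrative assumptions; only the overall trace-to-rubric-to-SHAP flow follows the paper.

```python
# Rubric-level surrogate: per-run rubric scores -> success predictor -> SHAP.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rubrics = ["intent_alignment", "plan_adherence", "tool_correctness",
           "tool_choice_accuracy", "state_consistency", "error_recovery"]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, len(rubrics)))  # one row of rubric scores per run
# Toy outcome: success driven mostly by state consistency, echoing the
# paper's finding that state-tracking failures dominate failed runs.
logits = 3.0 * X[:, 4] + 1.0 * X[:, 2] - 2.0
y = (logits + rng.normal(scale=0.5, size=200)) > 0

surrogate = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(surrogate)
shap.summary_plot(explainer.shap_values(X), X, feature_names=rubrics)  # beeswarm
```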

BibTeX

@article{featurestoactions2026,
  title={From Features to Actions: Explainability in Traditional and Agentic AI Systems},
  author={Sindhuja Chaduvula and Jessee Ho and Kina Kim and Aravind Narayanan and Mahshid Alinoori and Muskan Garg and Dhanesh Ramachandram and Shaina Raza},
  journal={arXiv preprint arXiv:2602.06841},
  year={2026}
}

Acknowledgments

Resources used in preparing this research were provided, in part, by the Province of Ontario and the Government of Canada through CIFAR, as well as companies sponsoring the Vector Institute. This research was funded by the European Union's Horizon Europe research and innovation programme under the AIXPERT project (Grant Agreement No. 101214389), which aims to develop an agentic, multi-layered, GenAI-powered framework for creating explainable, accountable, and transparent AI systems.