Abstract
Agentic AI systems extend large language models with planning, tool use, memory, and multi-step control loops, shifting deployment from a single predictive model to a behavior-producing system. We argue that interpretability research has not made the corresponding shift: prevailing methods remain model-centric, explaining isolated outputs rather than diagnosing long-horizon plans, tool-mediated actions, and multi-agent coordination. This gap limits auditability and accountability because failures emerge from interactions among planning, memory updates, delegation, and environmental feedback. \textbf{We advance the position that interpretability for agentic AI must be system-centric, focusing on trajectories, responsibility assignment, and lifecycle dynamics rather than internal model mechanisms alone.} To operationalize this view, we propose the Agentic Trajectory and Layered Interpretability Stack (ATLIS), spanning real-time behavioral monitoring, mechanistic analysis, abstraction bridging, multi-agent coordination analysis, and safety and alignment oversight. We map these layers to a five-stage agent lifecycle and motivate risk-aware activation of high-cost analyses during incidents alongside continuous, low-overhead monitoring in production.
Introduction
(Add introduction here.)
Position
The interpretability field is solving the wrong problem for the agentic era. Current methods explain how individual models compute outputs, but they cannot explain why an agent selected a particular plan, how coordination failed, or where accountability lies. We argue three points: (1) interpretability methods must co-evolve with agentic capabilities rather than follow them, embedding transparency into planning, tool use, and memory from the outset; (2) agentic opacity arises at distinct layers (behavioral, mechanistic, coordination, and safety), each requiring tailored methods; and (3) interpretability must integrate across the full agent development lifecycle rather than serve as a one-time audit.
Core Claims (placeholders)
- Coevolution over reaction. …
- Layered decomposition. …
- Lifecycle integration. …
Alternate Positions and Counterarguments
(Add counterpositions here.)
Conceptual Framework
(Briefly introduce ATLIS and the lifecycle + layered stack figure)
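To make the layered stack concrete alongside the figure, the following minimal Python sketch encodes the five ATLIS layers and a risk-aware activation rule. The layer names follow the abstract; the lifecycle stage names, cost units, and threshold values are illustrative assumptions rather than part of the framework.

from dataclasses import dataclass
from enum import Enum, auto

class Layer(Enum):
    """The five ATLIS interpretability layers (names from the abstract)."""
    BEHAVIORAL_MONITORING = auto()
    MECHANISTIC_ANALYSIS = auto()
    ABSTRACTION_BRIDGING = auto()
    COORDINATION_ANALYSIS = auto()
    SAFETY_ALIGNMENT_OVERSIGHT = auto()

class Stage(Enum):
    """Placeholder names for the five lifecycle stages; the paper does not fix these."""
    DESIGN = auto()
    DEVELOPMENT = auto()
    EVALUATION = auto()
    DEPLOYMENT = auto()
    MONITORING = auto()

@dataclass
class AnalysisRequest:
    """A request to run one interpretability analysis at one lifecycle stage."""
    layer: Layer
    stage: Stage
    cost: float        # estimated runtime overhead (arbitrary units, assumed)
    risk_score: float  # current incident/risk signal in [0, 1] (assumed)

def should_activate(req: AnalysisRequest,
                    cost_budget: float = 1.0,
                    risk_threshold: float = 0.7) -> bool:
    """Risk-aware activation: low-overhead analyses run continuously in
    production, while high-cost analyses (e.g., mechanistic probes) are
    triggered only when the risk signal crosses a threshold, as during an incident."""
    if req.cost <= cost_budget:
        return True  # cheap monitoring stays always-on
    return req.risk_score >= risk_threshold  # expensive analysis is risk-gated

Under these assumptions, a behavioral monitor with cost 0.2 always runs, while a mechanistic probe with cost 5.0 activates only once the risk score reaches 0.7, matching the incident-triggered escalation described in the abstract.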
Illustrative Case Study
(Add illustrative hospital use case here.)
Call to Action
- System-Level Attribution and Tracing (a minimal trace-record sketch follows this list).
- Scalable Runtime Interpretability.
- Benchmarks and Evaluation Infrastructure.
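As a starting point for system-level attribution and tracing, the sketch below shows one possible trajectory trace record covering plan steps, tool calls, memory writes, and delegations, so that a failure can later be attributed to a specific step and agent. The schema, field names, and event kinds are illustrative assumptions, not a proposed standard.

import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

@dataclass
class TraceEvent:
    """One step in an agent trajectory; field names and event kinds are illustrative."""
    trace_id: str                      # identifies the end-to-end trajectory
    agent: str                         # which agent or sub-agent acted
    kind: str                          # e.g. "plan", "tool_call", "memory_write", "delegation"
    payload: dict[str, Any] = field(default_factory=dict)  # step-specific details
    parent_span: Optional[str] = None  # links tool/sub-agent spans to the step that triggered them
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

def emit(event: TraceEvent, sink) -> None:
    """Append the event as one JSON line; offline tools can then reconstruct
    the trajectory and assign responsibility step by step."""
    sink.write(json.dumps(asdict(event)) + "\n")

For example, a delegation from an orchestrator to a scheduling sub-agent would be emitted with kind "delegation" and a parent_span pointing at the plan step that triggered it, which is what allows responsibility to be assigned to a specific component rather than to the system as a whole.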
Discussion and Limitations
(Add discussion of ATLIS implementation limitations here.)
BibTeX
@inproceedings{atlis2026position,
  title     = {Position: As we move from models to systems, we need the Agentic Trajectory and Layered Interpretability Stack ({ATLIS})},
  author    = {Zhu, Judy and Gandhi, Dhari and Mianroodi, Ahmad Rezaie and Ramachandran, Dhanesh and Raza, Shaina and Kocak, Sedef Akinli},
  booktitle = {International Conference on Machine Learning (ICML) Position Paper},
  year      = {2026},
  note      = {Under review}
}
Update venue/year/status when submission details are finalized.