Agentic AI systems—LLM-based agents with planning, memory, and tool use—introduce transparency challenges that are poorly served by explainability methods designed for single-step predictions. This article surveys and synthesizes interpretability and explainability techniques relevant to agentic behavior across the agent lifecycle.
We organize this survey around a five-axis taxonomy that categorizes prior work by (i) the cognitive objects inspected, (ii) the assurance objectives targeted, (iii) the mechanisms employed, (iv) the lifecycle stages covered, and (v) the stakeholders served.
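To make the taxonomy concrete, a surveyed work can be represented as a record with one value per axis. The following minimal Python sketch illustrates this; the class name, field names, and example axis values are illustrative assumptions, not terms fixed by the taxonomy itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaxonomyEntry:
    """One surveyed work, tagged along the five axes."""
    cognitive_object: str     # (i) what is inspected, e.g. a plan, memory state, or tool call
    assurance_objective: str  # (ii) what the explanation is meant to assure
    mechanism: str            # (iii) how the explanation is produced
    lifecycle_stage: str      # (iv) when in the agent lifecycle it applies
    stakeholder: str          # (v) who consumes the explanation

# Hypothetical example entry; these values are illustrative only.
example = TaxonomyEntry(
    cognitive_object="reasoning trace",
    assurance_objective="faithfulness",
    mechanism="attribution",
    lifecycle_stage="runtime",
    stakeholder="developer",
)
```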