// term 91 · Governance & Trust
Explainable AI (XAI)
Transparent AI Reasoning
Methods that make AI decisions understandable to humans — which factors drove this output, how the model would respond if inputs changed, and why this case landed where it did. XAI is the bridge between opaque computation and the explanations regulators, customers, and operators require.
// Standards
SHAP / LIME
Feature-attribution methods quantifying each input's contribution to a decision — the workhorses of tabular-model explainability.
// Mandate
regulated
Credit, hiring, insurance, and healthcare decisions often carry explanation requirements: in many jurisdictions, adverse-action reasons are owed by law, not offered by preference.
// Caveat
faithfulness
An explanation can be plausible without reflecting the model's actual computation — validity is tested, never assumed.
// full definition
What Explainable AI (XAI) actually is
A model that denies a loan, flags a transaction, or routes a hire must increasingly answer the follow-up question: why? Explainable AI is the toolkit for answering it — methods that attribute decisions to input factors, surface the patterns a model learned, and show how outcomes would change if circumstances did. The need spans audiences: regulators demand adverse-action reasons, operators need to debug strange outputs, domain experts must validate that models reason from sensible signals, and affected people deserve answers.
The toolkit splits by depth. Post-hoc attribution (SHAP, LIME, and kin) explains individual decisions of an already-trained model by quantifying each feature's contribution relative to a baseline: for example, 40% of this denial's score shift traces to debt-to-income and 25% to credit history length. Inherently interpretable models (scorecards, rule lists, constrained trees) make the reasoning legible by construction, trading some accuracy headroom for transparency that needs no extraction. The choice between them is contextual: where stakes and statutes demand explanations that are certainly faithful, interpretable-by-design carries weight that extraction can't.
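A minimal sketch of what "quantifying each feature's contribution" can mean, assuming a single reference applicant as the baseline. It computes exact Shapley values by enumerating feature subsets, which is only feasible for a handful of features; it is not the SHAP library's API, and `risk_score`, the feature names, and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]

def shapley_values(predict: Callable[[Features], float],
                   instance: Features,
                   baseline: Features) -> Dict[str, float]:
    """Exact Shapley attribution of predict(instance) - predict(baseline)."""
    names = list(instance)
    n = len(names)

    def value(present: frozenset) -> float:
        # Features in `present` take the instance's values; the rest stay at baseline.
        mixed = {k: (instance[k] if k in present else baseline[k]) for k in names}
        return predict(mixed)

    attributions = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = frozenset(subset)
                # Marginal effect of adding `feature` to this coalition.
                total += weight * (value(without | {feature}) - value(without))
        attributions[feature] = total
    return attributions

# Hypothetical risk model with one interaction term (higher means riskier).
def risk_score(x: Features) -> float:
    return (2.0 * x["debt_to_income"]
            - 0.05 * x["credit_history_years"]
            + 0.1 * x["debt_to_income"] * x["recent_inquiries"])

applicant = {"debt_to_income": 0.6, "credit_history_years": 2.0, "recent_inquiries": 4.0}
baseline = {"debt_to_income": 0.3, "credit_history_years": 10.0, "recent_inquiries": 1.0}

attributions = shapley_values(risk_score, applicant, baseline)
for name, contribution in sorted(attributions.items(), key=lambda p: -abs(p[1])):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the gap between the applicant's score and the baseline's score, which is the property that lets an attribution be read as an accounting of the decision. Percentage shares like those in the text come from normalizing these signed values.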
LLMs complicate the picture. Self-explanations (“I concluded this because…”) are generated text, plausible by construction and not guaranteed to reflect the actual computation; attention maps can correlate with relevance but are not reliable as faithful accounts. The emerging discipline of mechanistic interpretability, which reverse-engineers circuits and features inside networks, is making real progress (notably at frontier labs) but remains research, not compliance tooling. For deployed LLM systems, the practical explainability layer is architectural: grounding with citations, traceable retrieval, and logged reasoning chains, which give evidence of what the system used even where the weights stay opaque.
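One way to picture architectural traceability is a record that keeps the retrieved passages next to the answer and checks the citations mechanically. The sketch below assumes answers carry bracketed markers such as `[policy-7]`; the `AnswerTrace` and `Passage` types, the marker convention, and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

CITATION = re.compile(r"\[([^\[\]]+)\]")

@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str

@dataclass
class AnswerTrace:
    question: str
    retrieved: List[Passage]   # what the system actually fetched
    answer: str                # generated text with [doc_id] markers

    def cited_ids(self) -> List[str]:
        return CITATION.findall(self.answer)

    def ungrounded_citations(self) -> List[str]:
        # Citations pointing at documents that were never retrieved.
        known = {p.doc_id for p in self.retrieved}
        return [c for c in self.cited_ids() if c not in known]

    def uncited_sentences(self) -> List[str]:
        # Sentences that make a claim without pointing at any source.
        sentences = re.split(r"(?<=[.!?])\s+", self.answer.strip())
        return [s for s in sentences if s and not CITATION.search(s)]

trace = AnswerTrace(
    question="What is the refund window?",
    retrieved=[Passage("policy-7", "Refunds are accepted within 30 days of purchase.")],
    answer=("Refunds are accepted within 30 days [policy-7]. "
            "Shipping is always free [faq-2]. "
            "Contact support for details."),
)

print("ungrounded:", trace.ungrounded_citations())
print("uncited:", trace.uncited_sentences())
```

A check like this shows what the system used and flags claims with no source. It does not show that a cited passage actually supports the sentence, which would need a separate verification step.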
The governing caveat across all methods is faithfulness: an explanation can satisfy its audience while misrepresenting the model — rationalization wearing rigor. Mature XAI practice validates explanation methods (do attributions track real model behavior under perturbation?), matches method to audience and stake, and resists the comfort of plausible stories. The goal is calibrated trust: explanations good enough to catch bad models and ground sound decisions — not narrative satisfaction that lets either slip through.
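The perturbation question above can be sketched as a deletion check: reset the features an explanation ranks highest to a baseline and compare the change in output with resetting the features it ranks lowest. This is a simplified illustration under assumed names (`deletion_drop`, `passes_deletion_check`, a toy linear `score`), not a standard library routine or a complete faithfulness test.

```python
from typing import Callable, Dict, Iterable

Features = Dict[str, float]

def deletion_drop(predict: Callable[[Features], float],
                  instance: Features,
                  baseline: Features,
                  features: Iterable[str]) -> float:
    """Absolute change in output when the given features are reset to baseline."""
    perturbed = dict(instance)
    for name in features:
        perturbed[name] = baseline[name]
    return abs(predict(instance) - predict(perturbed))

def passes_deletion_check(predict: Callable[[Features], float],
                          instance: Features,
                          baseline: Features,
                          attributions: Dict[str, float],
                          k: int = 1) -> bool:
    """True if removing the top-k attributed features moves the output
    at least as much as removing the bottom-k attributed features."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    return (deletion_drop(predict, instance, baseline, top)
            >= deletion_drop(predict, instance, baseline, bottom))

# Hypothetical model in which feature "a" clearly dominates.
def score(x: Features) -> float:
    return 3.0 * x["a"] + 0.1 * x["b"] + 1.0 * x["c"]

instance = {"a": 1.0, "b": 1.0, "c": 1.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}

faithful = {"a": 3.0, "b": 0.1, "c": 1.0}   # matches the model's behavior
plausible = {"a": 0.1, "b": 3.0, "c": 1.0}  # a tidy story the model does not follow

print(passes_deletion_check(score, instance, baseline, faithful))
print(passes_deletion_check(score, instance, baseline, plausible))
```

In practice such a check would run over many decisions and several values of `k`, since one instance can pass by chance.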
// how it works
Extracting reasons from black boxes
Explainability runs as its own pipeline — attribution methods applied to model decisions, validated for faithfulness, and translated to each audience that needs the why.
Requirement Mapping
Who needs explanations, of what decisions, at what depth — regulator, operator, expert, and affected person each define different needs.
Method Selection
Post-hoc attribution, interpretable-by-design, or architectural traceability — matched to stakes, model type, and statute.
Explanation Generation
Attributions, counterfactuals, or evidence trails are computed per decision: the why is either extracted from the model or exhibited by the system.
Faithfulness Validation
Explanations are tested against actual model behavior, with perturbation checks catching plausible stories that misrepresent it.
Audience Translation
Technical attributions become adverse-action notices, debugging views, and expert validations — one why, many renderings.
Explanation Audit
Generated explanations are logged alongside decisions, forming the record that regulators and reviewers will request together.
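The last two steps, audience translation and explanation audit, can be sketched together: signed attributions become plain-language reasons, and the decision is stored with its explanation plus a digest for later integrity checks. The reason wording, field names, and record layout are hypothetical and do not reproduce any statutory notice format.

```python
import hashlib
import json
from typing import Dict, List

# Hypothetical mapping from feature names to customer-facing reason text.
REASON_TEXT = {
    "debt_to_income": "Debt-to-income ratio too high",
    "credit_history_years": "Limited length of credit history",
    "recent_inquiries": "Too many recent credit inquiries",
}

def top_reasons(attributions: Dict[str, float],
                reason_text: Dict[str, str],
                limit: int = 2) -> List[str]:
    """Reasons for the features that pushed hardest toward the adverse outcome."""
    adverse = [(name, value) for name, value in attributions.items() if value > 0]
    adverse.sort(key=lambda pair: pair[1], reverse=True)
    return [reason_text.get(name, name) for name, _ in adverse[:limit]]

def audit_record(decision_id: str,
                 outcome: str,
                 attributions: Dict[str, float],
                 reasons: List[str]) -> Dict[str, object]:
    """Bundle the decision with its explanation and a SHA-256 digest of the bundle."""
    payload = {
        "decision_id": decision_id,
        "outcome": outcome,
        "attributions": attributions,
        "reasons": reasons,
    }
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}

attributions = {"debt_to_income": 0.40, "credit_history_years": 0.25, "income": -0.10}
reasons = top_reasons(attributions, REASON_TEXT)
record = audit_record("decision-001", "denied", attributions, reasons)

print(reasons)
print(record["sha256"])
```

Keeping the raw attributions in the same record as the rendered reasons lets a reviewer confirm that the notice a customer received matches what the attribution method actually produced.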
// anatomy
The components teams must understand
01
Feature Attribution
Contribution accounting
SHAP-style decompositions of each input's weight in a decision — the standard answer to “what drove this?”
02
Counterfactuals
The actionable why
What minimal change flips the outcome — the explanation form affected people can actually use.
03
Interpretable Models
Legible by construction
Scorecards and rule systems whose reasoning is the artifact — faithfulness guaranteed where stakes demand it.
04
LLM Traceability
Architectural evidence
Citations, retrieval logs, and reasoning traces — what the system used, exhibited where the weights can't be.
05
Mechanistic Interpretability
The research frontier
Reverse-engineering features and circuits inside networks — genuine progress, not yet compliance tooling.
06
Faithfulness Tests
Explanations on trial
Perturbation and consistency checks validating that explanations track computation — the guard against rationalization.
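Of the components above, counterfactuals are perhaps the easiest to illustrate in a few lines. The sketch below searches, one feature at a time, for the nearest candidate value that flips a denial into an approval. The decision rule, feature names, and candidate grids are hypothetical, and real counterfactual methods also handle multi-feature changes and feasibility constraints.

```python
from typing import Callable, Dict, List, Tuple

Applicant = Dict[str, int]

def single_feature_counterfactuals(
        approve: Callable[[Applicant], bool],
        instance: Applicant,
        candidates: Dict[str, List[int]]) -> List[Tuple[str, int, int]]:
    """For each mutable feature, find the closest candidate value that alone
    flips the decision. Returns (feature, current_value, new_value) tuples."""
    results = []
    for feature, values in candidates.items():
        # Try the smallest changes first so the first hit is the minimal one.
        for value in sorted(values, key=lambda v: abs(v - instance[feature])):
            changed = dict(instance)
            changed[feature] = value
            if approve(changed):
                results.append((feature, instance[feature], value))
                break
    return results

# Hypothetical integer scorecard: approve when the risk points are at most 100.
def approve(x: Applicant) -> bool:
    risk_points = 2 * x["debt_to_income_pct"] - 5 * x["credit_history_years"]
    return risk_points <= 100

applicant = {"debt_to_income_pct": 60, "credit_history_years": 2}
candidates = {
    "debt_to_income_pct": [55, 50, 45],
    "credit_history_years": [3, 4, 5],
}

for feature, current, needed in single_feature_counterfactuals(approve, applicant, candidates):
    print(f"{feature}: {current} -> {needed} would change the outcome")
```

Because the search calls the decision function itself, every counterfactual it reports is faithful to that function by construction, though only within the candidate values it was given.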
// strategic implications
What this changes for the business
01 · Compliance
Explanation is a legal deliverable
Credit, hiring, insurance, and healthcare decisions carry explanation duties in many jurisdictions: adverse-action reasons, GDPR provisions on automated decision-making, and sectoral rules. Architect explainability into decision systems at design time; retrofitting reasons onto an opaque deployment is the expensive version.
02 · Method
Match depth to stakes
Post-hoc attribution serves debugging and moderate stakes; interpretable-by-design earns its accuracy trade where explanations must be certainly faithful; LLM systems lean on architectural traceability. One explainability standard across all systems over- and under-serves simultaneously.
03 · Skepticism
Plausible is not faithful
Models — especially LLMs — generate satisfying explanations that may not reflect their computation. Validate explanation methods against behavior, treat self-reports as evidence rather than testimony, and reserve trust for explanations that survive perturbation testing.
// common misconceptions
What Explainable AI (XAI) is not
Myth
“Deep models are black boxes — explanation is impossible.”
Reality
Attribution methods, counterfactuals, and traceability architectures extract useful, validated explanations today; mechanistic interpretability is opening the boxes further. Opacity is a gradient under active assault, not a verdict.
Myth
“The model explained itself, so we understand it.”
Reality
LLM self-explanations are generated text — plausible by construction, faithful only sometimes, and demonstrably unreliable as accounts of computation. Treat them as hypotheses to verify, not transcripts to file.
Myth
“Explainability costs too much accuracy to be practical.”
Reality
Post-hoc methods leave the model unchanged, so they cost no accuracy at all, and interpretable models often close most of the gap on tabular tasks where they're mandated. The trade is usually real only at the margin, and where statutes apply, it isn't optional anyway.