// term 91 · Governance & Trust

Explainable AI (XAI)

Transparent AI Reasoning

Methods that make AI decisions understandable to humans — which factors drove this output, how the model would respond if inputs changed, and why this case landed where it did. XAI is the bridge between opaque computation and the explanations regulators, customers, and operators require.

Interpretability · SHAP · Transparency · Compliance

// Standards

SHAP / LIME

Feature-attribution methods quantifying each input's contribution to a decision — the workhorses of tabular-model explainability.

// Mandate

regulated

Credit, hiring, insurance, and healthcare decisions carry explanation requirements — adverse-action reasons by law, not preference.

// Caveat

faithfulness

An explanation can be plausible without reflecting the model's actual computation — validity is tested, never assumed.

// full definition

What Explainable AI (XAI) actually is

A model that denies a loan, flags a transaction, or routes a hire must increasingly answer the follow-up question: why? Explainable AI is the toolkit for answering it — methods that attribute decisions to input factors, surface the patterns a model learned, and show how outcomes would change if circumstances did. The need spans audiences: regulators demand adverse-action reasons, operators need to debug strange outputs, domain experts must validate that models reason from sensible signals, and affected people deserve answers.

The toolkit splits by depth. Post-hoc attribution (SHAP, LIME, and kin) explains individual decisions of an already-trained model by quantifying each feature's contribution relative to a baseline: for example, a denial might trace roughly 40% to debt-to-income and 25% to credit history length. Inherently interpretable models (scorecards, rule lists, constrained trees) make the reasoning legible by construction, trading some accuracy headroom for transparency that needs no extraction. The choice between them is contextual: where stakes and statutes demand explanations that are certainly faithful, interpretable-by-design carries weight that extraction can't.
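To make the attribution idea concrete, here is a minimal sketch of exact Shapley values computed by brute force over feature orderings. It uses a simplified variant in which "absent" features take a single baseline value. The `risk_score` model, the feature names, and the baseline applicant are all hypothetical, and production SHAP tooling relies on faster approximations rather than this exhaustive loop.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Average each feature's marginal contribution over every feature ordering."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the reference applicant
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch this feature to its real value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def risk_score(x):
    """Hypothetical toy model: higher score means higher risk of denial."""
    score = 2.0 * x["debt_to_income"] - 0.05 * x["credit_history_years"]
    # Interaction term: a high debt ratio combined with a recent delinquency.
    if x["debt_to_income"] > 0.4 and x["recent_delinquencies"] > 0:
        score += 0.5
    return score


applicant = {"debt_to_income": 0.55, "credit_history_years": 2, "recent_delinquencies": 1}
baseline = {"debt_to_income": 0.30, "credit_history_years": 10, "recent_delinquencies": 0}

attributions = shapley_values(risk_score, applicant, baseline)
total = sum(attributions.values())
shares = {name: value / total for name, value in attributions.items()}
```

In this toy case the attributions sum to the gap between the applicant's score and the baseline score, and the interaction term is split evenly between the two features that trigger it. The `shares` dictionary expresses each contribution as a fraction of that gap, which is the form a percentage-style breakdown takes.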

LLMs complicate the picture. Self-explanations (“I concluded this because…”) are generated text: plausible by construction, and not guaranteed to reflect the actual computation. Attention maps can correlate with relevance but are not reliable as faithful accounts of it. The emerging discipline of mechanistic interpretability, which reverse-engineers circuits and features inside networks, is making real progress (notably at frontier labs) but remains research, not compliance tooling. For deployed LLM systems, the practical explainability layer is architectural: grounding with citations, traceable retrieval, and logged reasoning chains. These provide evidence of what the system used, even where the weights stay opaque.
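One way to picture the architectural layer is as a trace record stored with every answer. The sketch below is an assumption about what such a record could hold, not a standard schema: the class names, field names, and document IDs are all hypothetical. It keeps the retrieved passages, the citations the answer claims, and any logged reasoning steps, and it flags citations that do not resolve to something the system actually retrieved.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedPassage:
    doc_id: str
    text: str


@dataclass
class AnswerTrace:
    question: str
    answer: str
    retrieved: list[RetrievedPassage]
    cited_doc_ids: list[str]
    reasoning_steps: list[str] = field(default_factory=list)

    def unsupported_citations(self) -> list[str]:
        """Citations that point at documents the retriever never returned."""
        retrieved_ids = {passage.doc_id for passage in self.retrieved}
        return [doc_id for doc_id in self.cited_doc_ids if doc_id not in retrieved_ids]


trace = AnswerTrace(
    question="Is water damage from a burst pipe covered?",
    answer="Yes, sudden pipe bursts are covered [policy-12]; see also [policy-99].",
    retrieved=[RetrievedPassage("policy-12", "Sudden and accidental discharge is covered.")],
    cited_doc_ids=["policy-12", "policy-99"],
    reasoning_steps=["Matched 'burst pipe' to the sudden-discharge clause."],
)

missing = trace.unsupported_citations()
```

A record like this shows what the system used, which is a different claim from why the weights produced the answer. The check on `missing` catches only one failure mode: a citation with no retrieved source behind it.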

The governing caveat across all methods is faithfulness: an explanation can satisfy its audience while misrepresenting the model — rationalization wearing rigor. Mature XAI practice validates explanation methods (do attributions track real model behavior under perturbation?), matches method to audience and stake, and resists the comfort of plausible stories. The goal is calibrated trust: explanations good enough to catch bad models and ground sound decisions — not narrative satisfaction that lets either slip through.
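A perturbation check of the kind described above can be sketched in a few lines. This is one simple variant among many, under stated assumptions: the `model` function, feature names, and both candidate explanations are hypothetical. Each feature is reset to a baseline in turn, the real change in output is measured, and the explanation's ranking of features is compared against the ranking those measurements imply.

```python
def deletion_effects(predict, instance, baseline):
    """Measured output change when each feature alone is reset to its baseline."""
    full = predict(instance)
    effects = {}
    for name in instance:
        perturbed = dict(instance)
        perturbed[name] = baseline[name]
        effects[name] = full - predict(perturbed)
    return effects


def rank_agreement(attributions, effects):
    """Fraction of feature pairs that the explanation orders the same way as the measurements."""
    names = list(attributions)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    agreeing = sum(
        1
        for a, b in pairs
        if (abs(attributions[a]) - abs(attributions[b]))
        * (abs(effects[a]) - abs(effects[b])) > 0
    )
    return agreeing / len(pairs)


def model(x):
    """Hypothetical model that leans mostly on a location-based risk feature."""
    return 3.0 * x["zip_risk"] + 1.0 * x["debt_ratio"] + 0.1 * x["income_gap"]


instance = {"zip_risk": 1.0, "debt_ratio": 1.0, "income_gap": 1.0}
baseline = {"zip_risk": 0.0, "debt_ratio": 0.0, "income_gap": 0.0}
effects = deletion_effects(model, instance, baseline)

faithful_story = {"zip_risk": 3.0, "debt_ratio": 1.0, "income_gap": 0.1}
plausible_story = {"zip_risk": 0.1, "debt_ratio": 1.0, "income_gap": 3.0}

faithful_score = rank_agreement(faithful_story, effects)
plausible_score = rank_agreement(plausible_story, effects)
```

The second explanation tells a story an audience might readily accept (income drives the outcome), yet it orders every feature pair the opposite way from the measured effects. Single-feature deletion ignores interactions, so a fuller validation would also perturb groups of features.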

// how it works

Extracting reasons from black boxes

Explainability runs as its own pipeline — attribution methods applied to model decisions, validated for faithfulness, and translated to each audience that needs the why.

Requirement Mapping

Who needs explanations, of what decisions, at what depth — regulator, operator, expert, and affected person each define different needs.

Method Selection

Post-hoc attribution, interpretable-by-design, or architectural traceability — matched to stakes, model type, and statute.

Explanation Generation

Attributions, counterfactuals, or evidence trails are computed per decision: the why, extracted or exhibited.

Faithfulness Validation

Explanations are tested against actual model behavior, with perturbation checks catching plausible stories that misrepresent it.

Audience Translation

Technical attributions become adverse-action notices, debugging views, and expert validations — one why, many renderings.

Explanation Audit

Generated explanations are logged alongside decisions, forming the record that regulators and reviewers will request together.
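The last two steps of the pipeline above, audience translation and explanation audit, might look roughly like this in code. Everything here is illustrative: the reason wording, the field names in the log record, and the attribution values are hypothetical rather than drawn from any regulation or library.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from model features to plain-language reasons.
REASON_TEXT = {
    "debt_to_income": "Debt-to-income ratio too high",
    "credit_history_years": "Credit history too short",
    "recent_delinquencies": "Recent delinquency on file",
}


def adverse_action_reasons(attributions, top_n=2):
    """Keep only features that pushed toward denial, strongest first."""
    adverse = [(name, value) for name, value in attributions.items() if value > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [REASON_TEXT.get(name, name) for name, _ in adverse[:top_n]]


def audit_record(decision_id, outcome, method, attributions, reasons):
    """Serialize the decision and its explanation as one log entry."""
    record = {
        "decision_id": decision_id,
        "outcome": outcome,
        "explanation_method": method,
        "attributions": attributions,
        "reasons": reasons,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


attributions = {
    "debt_to_income": 0.75,
    "credit_history_years": 0.40,
    "recent_delinquencies": 0.25,
    "income": -0.30,  # pushed toward approval, so it is never cited as a reason
}

reasons = adverse_action_reasons(attributions)
entry = audit_record("loan-0001", "denied", "shapley", attributions, reasons)
```

Storing the raw attributions next to the rendered reasons means a later review can check that the notice sent to the applicant matches what the attribution method actually produced.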

// anatomy

The components teams must understand

Feature Attribution

Contribution accounting

SHAP-style decompositions of each input's weight in a decision — the standard answer to “what drove this?”

Counterfactuals

The actionable why

What minimal change flips the outcome — the explanation form affected people can actually use.

Interpretable Models

Legible by construction

Scorecards and rule systems whose reasoning is the artifact — faithfulness guaranteed where stakes demand it.

LLM Traceability

Architectural evidence

Citations, retrieval logs, and reasoning traces — what the system used, exhibited where the weights can't be.

Mechanistic Interpretability

The research frontier

Reverse-engineering features and circuits inside networks — genuine progress, not yet compliance tooling.

Faithfulness Tests

Explanations on trial

Perturbation and consistency checks validating that explanations track computation — the guard against rationalization.
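Two of the components above, interpretable models and counterfactuals, fit together naturally, and a small sketch can show both. The scorecard below is hypothetical (its point values, threshold, and feature names are invented for illustration), and the counterfactual search is deliberately naive: it changes one feature at a time and tries candidate values listed nearest-first.

```python
APPROVAL_THRESHOLD = 70  # hypothetical cut-off


def scorecard_points(applicant):
    """Points-based scorecard: the rules themselves are the explanation."""
    points = 0
    points += 40 if applicant["debt_to_income"] <= 0.35 else 10
    points += 30 if applicant["credit_history_years"] >= 5 else 10
    points += 30 if applicant["recent_delinquencies"] == 0 else 0
    return points


def approve(applicant):
    return scorecard_points(applicant) >= APPROVAL_THRESHOLD


def single_feature_counterfactuals(applicant, candidate_values):
    """For each feature, find the first candidate value that flips the outcome."""
    original = approve(applicant)
    flips = []
    for name, values in candidate_values.items():
        for value in values:  # candidates are ordered nearest-first
            changed = dict(applicant)
            changed[name] = value
            if approve(changed) != original:
                flips.append((name, applicant[name], value))
                break
    return flips


applicant = {"debt_to_income": 0.45, "credit_history_years": 3, "recent_delinquencies": 0}
candidates = {
    "debt_to_income": [0.40, 0.35, 0.30],
    "credit_history_years": [4, 5, 6],
}

flips = single_feature_counterfactuals(applicant, candidates)
```

Here the applicant scores 50 points and is denied; either lowering debt-to-income to 0.35 or reaching five years of credit history would flip the outcome. Because the scorecard is the model, these counterfactuals are faithful by construction. Applied to an opaque model, the same search would still need validation.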

// strategic implications

What this changes for the business

01 · Compliance

Explanation is a legal deliverable

Credit, hiring, insurance, and healthcare decisions carry statutory explanation duties: adverse-action reasons, GDPR-style rights around automated decisions, and sectoral rules. Architect explainability into decision systems at design time; retrofitting reasons onto an opaque deployment is the expensive version.

02 · Method

Match depth to stakes

Post-hoc attribution serves debugging and moderate stakes; interpretable-by-design earns its accuracy trade where explanations must be certainly faithful; LLM systems lean on architectural traceability. A single explainability standard applied to every system over-serves some and under-serves others.

03 · Skepticism

Plausible is not faithful

Models, especially LLMs, generate satisfying explanations that may not reflect their computation. Validate explanation methods against behavior, treat self-reports as claims to be checked rather than authoritative testimony, and reserve trust for explanations that survive perturbation testing.

// common misconceptions

What Explainable AI (XAI) is not

Myth

“Deep models are black boxes — explanation is impossible.”

Reality

Attribution methods, counterfactuals, and traceability architectures extract useful, validated explanations today; mechanistic interpretability is opening the boxes further. Opacity is a gradient under active assault, not a verdict.

Myth

“The model explained itself, so we understand it.”

Reality

LLM self-explanations are generated text — plausible by construction, faithful only sometimes, and demonstrably unreliable as accounts of computation. Treat them as hypotheses to verify, not transcripts to file.

Myth

“Explainability costs too much accuracy to be practical.”

Reality

Post-hoc methods cost no model accuracy at all, because they leave the deployed model unchanged, and interpretable models close most of the gap on the tabular tasks where they're typically mandated. The trade is real only at the margin, and where statutes apply, it isn't optional anyway.

// from literacy to leverage

Know the term. Now build the strategy.

Vocabulary is the entry fee. Turning these primitives into pipeline, moats, and margin is the work. That's the conversation.
