AI Foundations


Transparency and Explainability

When a judge sentences you to prison, you have the right to know the reasons. When a bank denies your mortgage application, the Equal Credit Opportunity Act requires that you receive an explanation. These are not arbitrary bureaucratic requirements — they are mechanisms of accountability. If you know the reason for a decision, you can contest it, identify errors, and appeal. Now consider that consequential decisions — loan approvals, parole recommendations, medical diagnoses, job screenings — are increasingly made or influenced by machine-learning models whose internal logic is not easily accessible to anyone, including their creators. This lesson is about why that opacity exists, what researchers are doing to reduce it, and what is at stake.

Why Are Many AI Systems Opaque?

The short answer is that the opacity of modern neural networks is a side effect of their power. A large neural network may have billions of parameters: numerical weights distributed across many layers of computation. When the network processes an input, it computes a sequence of transformations through all of those layers. The final output (a classification, a score, a generated text) is the result of billions of arithmetic operations, and no single rule summarizes what happened. The computation is distributed across the entire network in a way that resists reduction to a simple explanation.

This is fundamentally different from a small decision tree or a linear regression model, where you can read off exactly which features contributed to a decision and by how much. Deep neural networks trade interpretability for representational power, and for many tasks the representational power wins handily.

Two terms are often used interchangeably but are worth distinguishing:

Transparency refers to the ability to inspect the model itself: to see its architecture, weights, and computational process.

Explainability (often called interpretability) refers to the ability to generate a human-comprehensible account of why a model produced a particular output for a particular input.

A model can be fully transparent (its weights are public) but still not explainable (no one can understand those billions of weights). Conversely, a model can be made explainable through post-hoc analysis even if its internals cannot be inspected or read directly.
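
To make the contrast concrete, here is a minimal sketch of why a linear model is easy to read. The weights, bias, and applicant values are invented for illustration and do not come from any real scoring system. Each feature's contribution is simply its weight times its value, and the contributions plus the bias add up to the score.

```python
# Hypothetical linear credit-scoring model. All numbers are made up.
WEIGHTS = {"income": 0.5, "credit_score": 0.25, "debt_ratio": -0.75}
BIAS = 0.1


def linear_score(features):
    """Score = bias + sum of (weight * feature value)."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def contributions(features):
    """Per-feature contribution to the score, readable directly from the weights."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


# Hypothetical applicant with already-normalized feature values.
applicant = {"income": 1.0, "credit_score": 2.0, "debt_ratio": 2.0}

parts = contributions(applicant)
score = linear_score(applicant)

# List contributions from most harmful to most helpful.
for name, value in sorted(parts.items(), key=lambda item: item[1]):
    print(f"{name:>12}: {value:+.2f}")
print(f"{'bias':>12}: {BIAS:+.2f}")
print(f"{'score':>12}: {score:+.2f}")
```

In a deep network there is no equivalent table to print: the same feature passes through many layers and interacts with every other feature along the way, which is why the explanation techniques discussed in this lesson exist.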

Transparency vs. Explainability

Transparency means you can see inside the model. Explainability means you can understand why it made a specific decision. These are distinct goals requiring different methods. Many modern AI deployments provide neither.

Researchers have developed a range of techniques to make AI decisions more explainable, even when the underlying model is opaque.

SHAP (SHapley Additive exPlanations): Based on game theory's Shapley values, SHAP attributes each feature's contribution to a particular prediction. If a model denies a loan application, SHAP can estimate how much each feature (income, credit score, ZIP code) contributed to that denial. The attribution is mathematically principled, but it is a post-hoc explanation: it describes the model's behavior without necessarily revealing the model's true internal reasoning.

LIME (Local Interpretable Model-agnostic Explanations): LIME fits a simple, interpretable model (like a linear regression) locally around a specific input to approximate what the complex model is doing in that neighborhood. It gives a local explanation, valid near that data point, without claiming to explain the whole model.

Attention visualization: In transformer models, attention weights indicate which parts of the input the model 'attended to' when producing its output. These are often visualized as heatmaps. However, research has shown that attention weights do not necessarily correspond to causal importance: a model can attend to a word without that attention being the reason for its decision.

Concept-based explanations: Rather than attributing to raw features, these methods identify high-level human concepts (e.g., 'zebra stripe pattern') that the model uses and explain predictions in those terms.
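
The Shapley-value idea behind SHAP can be sketched by brute force on a tiny model. This is an illustration under stated assumptions, not how the SHAP library works internally: the toy `model`, the feature names, the applicant, and the all-zero baseline are hypothetical, and practical tools approximate these values and average over background data, because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

FEATURES = ["income", "credit_score", "zip_risk"]


def model(x):
    """Toy score with an interaction term, standing in for an opaque model."""
    return (2.0 * x["income"] + 1.0 * x["credit_score"]
            - 3.0 * x["zip_risk"] + 1.0 * x["income"] * x["credit_score"])


def value(subset, instance, baseline):
    """Model output when only the features in `subset` take the instance's values."""
    mixed = {name: instance[name] if name in subset else baseline[name]
             for name in FEATURES}
    return model(mixed)


def shapley_values(instance, baseline):
    """Exact Shapley values: weighted average marginal gain over all subsets."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [name for name in FEATURES if name != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {feature}, instance, baseline)
                without_feature = value(set(subset), instance, baseline)
                total += weight * (with_feature - without_feature)
        phi[feature] = total
    return phi


# Hypothetical applicant and reference point.
applicant = {"income": 1.0, "credit_score": 1.0, "zip_risk": 1.0}
baseline = {"income": 0.0, "credit_score": 0.0, "zip_risk": 0.0}

phi = shapley_values(applicant, baseline)
for name, contribution in phi.items():
    print(f"{name:>12}: {contribution:+.2f}")

# The attributions add up to the gap between this prediction and the baseline.
gap = model(applicant) - model(baseline)
print(f"sum of attributions = {sum(phi.values()):.2f}, prediction gap = {gap:.2f}")
```

Note how the interaction term is split evenly between income and credit score. That is a property of the attribution method, and says nothing about how the model "thinks" about the two features.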


The Case for a Right to an Explanation

The EU's General Data Protection Regulation (GDPR), adopted in 2016 and in force since 2018, includes provisions that have been interpreted as creating a limited right to an explanation for automated decisions. Article 22 restricts decisions based solely on automated processing that produce legal effects or similarly 'significantly affect' individuals, and lets those individuals request human review. The scope and enforceability of this right have been actively debated by legal scholars, but the underlying intuition is clear: consequential automated decisions should be contestable, and contesting a decision requires understanding its basis.

Opponents of strong explainability requirements raise several objections:

Accuracy vs. interpretability trade-off: More interpretable models (linear models, decision trees) are often less accurate than opaque models (deep neural networks). Mandating interpretability might mean mandating worse predictions, and in medical diagnosis a worse prediction may cause more harm than the opacity.

Fidelity problems: Post-hoc explanations approximate the model's behavior but do not perfectly reproduce its reasoning. An explanation that is comprehensible but inaccurate may be worse than no explanation, because it provides false confidence.

Proprietary concerns and gaming: Detailed explanations can expose a model's proprietary logic. And if the explanation reveals the features that lead to approval, applicants may optimize for those features rather than for the underlying qualities the features are meant to measure.

Post-hoc Explanations Are Approximations

SHAP and LIME explain model behavior; they do not reveal the model's true internal reasoning. When the explanation contradicts the model's actual process, the explanation is wrong — and the user who trusts it may be misled. Always treat explanation methods as useful approximations, not ground truth.
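
The approximation point can be seen in a LIME-style sketch. This is a simplified one-dimensional illustration, not the LIME library: the `black_box` function, the evenly spaced perturbations, and the kernel width are all assumptions chosen to keep the example deterministic. A straight line is fitted around one input using proximity weights, and its error is then compared near that input and far from it.

```python
import math


def black_box(x):
    """Stand-in for an opaque model; in practice we could only query it."""
    return x ** 2


def local_linear_surrogate(f, x0, half_width=1.0, steps=41, kernel_width=0.25):
    """Fit a proximity-weighted least-squares line to f around x0."""
    xs = [x0 - half_width + 2 * half_width * i / (steps - 1) for i in range(steps)]
    ys = [f(x) for x in xs]
    # Gaussian kernel: perturbations close to x0 count more in the fit.
    ws = [math.exp(-((x - x0) ** 2) / (2 * kernel_width ** 2)) for x in xs]
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


x0 = 2.0
slope, intercept = local_linear_surrogate(black_box, x0)


def surrogate(x):
    """The simple, readable explanation model."""
    return slope * x + intercept


near_error = abs(surrogate(x0) - black_box(x0))
far_error = abs(surrogate(5.0) - black_box(5.0))
print(f"local slope: {slope:.3f}")
print(f"error at x = {x0}: {near_error:.3f}")
print(f"error at x = 5.0: {far_error:.3f}")
```

The fitted line tracks the black box closely at the input being explained and diverges badly a few units away. An explanation of this kind is only as trustworthy as the neighborhood it was fitted on.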

Complete these statements about explainability techniques.

______ uses Shapley values from game theory to attribute feature contributions, while ______ fits a simple local model to approximate a complex model near a specific input.

A bank uses a neural network to approve or deny mortgages. The bank publishes the model's full weights online. Does this make the decision explainable to a denied applicant?

A medical AI system is 95% accurate using a deep neural network, but a decision tree achieves only 87% accuracy. A hospital wants to deploy the more interpretable system. Which consideration most strongly supports the decision tree despite lower accuracy?

Contesting an Algorithmic Decision

Imagine you are a credit counselor, and a client has been denied a loan by a bank using an AI system. The bank provides you with the following SHAP explanation for the denial:

- ZIP code: -0.22 (negative contribution)
- Payment history: +0.18
- Income: +0.12
- Requested loan amount: -0.31
- Credit utilization: -0.19

Answer the following in writing:

1. Which features hurt the application most?
2. The client reports their ZIP code has changed and the loan amount is negotiable. Draft a specific appeal letter that uses this SHAP explanation to argue for reconsideration.
3. The ZIP code feature concerns you. Write a separate paragraph explaining why it might indicate potential measurement bias (connect to what you learned in Lesson 2).
4. What additional information would you want from the bank to fully evaluate this decision?