Mitigation and Its Limits
Given documented harms from biased AI systems and the mathematical constraints on fairness, what can practitioners do? This lesson surveys the main technical approaches to reducing algorithmic bias — organized by where in the pipeline they intervene — and examines honestly what each approach achieves, what it cannot achieve, and what the impossibility results imply about any purely technical mitigation strategy. The honest answer is that technical mitigation is necessary, insufficient, and sometimes misleading if presented as a complete solution.
Pre-Processing: Intervening on the Data
Pre-processing techniques attempt to reduce bias before training by modifying the training data or the weight each example carries. There are four main strategies.
- Reweighting: assign higher importance weights to underrepresented groups or individual examples, so that the training loss penalizes errors on those groups more heavily. If 90% of a dataset's examples come from Group A and 10% from Group B, reweighting can make both groups contribute equally to the training loss. Reweighting addresses representation bias without altering the data itself.
- Resampling: oversample underrepresented groups (duplicating or synthetically generating examples) or undersample overrepresented groups (randomly removing examples). SMOTE (Synthetic Minority Over-sampling Technique) is a commonly used method that generates synthetic examples by interpolating between real minority-class examples. Unlike reweighting, resampling changes the actual training distribution seen by the model.
- Data relabeling: identify and correct labels that are systematically biased. If annotation bias has produced labels that are less accurate for certain groups (as in the AAVE toxicity labeling example from Lesson 2), relabeling means re-annotating the affected examples with higher-quality labels, often using more diverse or more carefully trained annotators. This is expensive, and it requires first identifying which labels are biased, which may itself require an audit.
- Fair representation learning: transform the data into a new representation space that 'removes' information about protected attributes while preserving information about the task. The goal is a representation from which the protected attribute cannot be predicted but the outcome still can. This is technically challenging because protected attributes are often correlated with task-relevant features, so removing the former risks removing the latter.
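The reweighting and oversampling ideas can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `group_balancing_weights` and `oversample_to_balance` are hypothetical, the 90/10 split mirrors the example in the text, and the oversampling shown is plain random duplication rather than SMOTE.

```python
import numpy as np

def group_balancing_weights(groups):
    """Return one weight per example so every group carries equal total weight."""
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    # Weight for group g is n / (k * n_g): inverse frequency, scaled to average 1.
    per_group = {g: len(groups) / (len(labels) * c) for g, c in zip(labels, counts)}
    return np.array([per_group[g] for g in groups])

def oversample_to_balance(X, groups, seed=0):
    """Randomly duplicate rows of smaller groups until each matches the largest."""
    rng = np.random.default_rng(seed)
    X, groups = np.asarray(X), np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    target = counts.max()
    keep = []
    for g, c in zip(labels, counts):
        members = np.flatnonzero(groups == g)
        # Draw the missing rows (possibly zero) with replacement from this group.
        extra = members[rng.integers(0, c, size=target - c)]
        keep.append(np.concatenate([members, extra]))
    keep = np.concatenate(keep)
    return X[keep], groups[keep]

# Hypothetical dataset: 90 examples from Group A, 10 from Group B.
groups = np.array(["A"] * 90 + ["B"] * 10)
X = np.arange(100).reshape(-1, 1)

weights = group_balancing_weights(groups)
X_bal, groups_bal = oversample_to_balance(X, groups)
```

Many training routines accept per-example weights (often through a `sample_weight`-style argument), which is where `weights` would be passed. Reweighting leaves the rows untouched, while `oversample_to_balance` returns a larger dataset in which Group B rows are repeated.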
Because bias introduced in training data propagates through everything downstream, addressing it in the data is often the highest-leverage intervention. Pre-processing approaches do not require modifying the model architecture or training algorithm, which makes them more modular and easier to apply to existing pipelines. However, they cannot fix biases introduced at other stages — problem framing, feature engineering, or deployment context.
In-Processing: Fairness Constraints During Training
In-processing techniques modify the training objective itself to enforce fairness constraints during model learning. Rather than optimizing purely for accuracy, the model optimizes a combined objective that penalizes fairness violations.
- Fairness-regularized training: add a fairness penalty term to the loss function. For example, penalize the difference in false positive rates across groups: L_total = L_accuracy + lambda * |FPR_A - FPR_B|. The hyperparameter lambda controls the strength of the fairness constraint. A higher lambda pushes the model to equalize error rates, typically at some cost in overall accuracy. This trade-off is in the same spirit as the impossibility results: in general you cannot maximize accuracy and equalize all error rates at the same time.
- Adversarial debiasing: train two models simultaneously: a predictor that learns to predict the outcome, and an adversary that learns to predict the protected attribute from the predictor's outputs. The predictor is trained to minimize its prediction loss while also fooling the adversary, which makes it harder to infer the protected attribute from its predictions. This is an elegant approach that can produce representations that are less predictive of protected attributes without requiring explicit fairness constraints.
- Constrained optimization: formulate fairness as a hard constraint rather than a regularization penalty. Train the model to maximize accuracy subject to the constraint that the demographic parity gap (or some other fairness metric) does not exceed a specified threshold. This requires constrained optimization methods (such as Lagrangian relaxation) rather than plain unconstrained gradient descent.
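The combined objective L_total = L_accuracy + lambda * |FPR_A - FPR_B| can be evaluated directly for a simple logistic model. The sketch below only computes the objective; it does not train anything. It rests on several assumptions that are not part of the lesson: log loss stands in for L_accuracy, the hard false positive rate is replaced by a smooth surrogate (the mean predicted probability on true negatives) so that the penalty would be usable with gradient-based training, and the data, weights, and function names are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_fpr(scores, y):
    """Mean predicted probability on true negatives: a smooth stand-in for FPR."""
    return scores[y == 0].mean()

def total_loss(w, X, y, group, lam):
    """Return (L_total, L_accuracy, gap) for a logistic model with weights w."""
    scores = sigmoid(X @ w)
    eps = 1e-12  # avoids log(0)
    l_accuracy = -np.mean(y * np.log(scores + eps) + (1 - y) * np.log(1 - scores + eps))
    in_a, in_b = group == "A", group == "B"
    gap = abs(soft_fpr(scores[in_a], y[in_a]) - soft_fpr(scores[in_b], y[in_b]))
    return l_accuracy + lam * gap, l_accuracy, gap

# Tiny hypothetical dataset: a bias column plus one feature, two groups.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 0.5], [1.0, 2.5]])
y = np.array([0, 1, 0, 1, 0, 1])
group = np.array(["A", "A", "A", "B", "B", "B"])
w = np.array([-1.0, 1.0])  # arbitrary weights, not a fitted model

for lam in (0.0, 1.0, 5.0):
    total, l_acc, gap = total_loss(w, X, y, group, lam)
    print(f"lambda={lam}: L_accuracy={l_acc:.3f} gap={gap:.3f} L_total={total:.3f}")
```

With lambda set to 0 the objective reduces to the ordinary accuracy loss. As lambda grows, the same gap contributes more to L_total, so an optimizer minimizing it would be pushed toward weights that shrink the gap even if the accuracy term rises.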
Post-Processing: Adjusting Model Outputs
Post-processing techniques intervene after the model has been trained, adjusting its outputs to satisfy a fairness criterion without modifying the model itself. This is appealing when a model is a commercial black box that cannot be retrained.
- Threshold calibration by group: instead of applying a single decision threshold (e.g., classify as positive if the score exceeds 0.5) to all individuals, apply different thresholds to different groups so that a chosen fairness criterion is satisfied. For example, if the goal is to equalize false positive rates, find the threshold for each group that achieves the desired FPR and apply those group-specific thresholds. This can substantially reduce fairness gaps, but it introduces explicit group-specific decision rules, which may be legally prohibited in some jurisdictions and which raise their own fairness concerns (is it fair to apply different standards to different groups?).
- Rejection option classification: for predictions near the decision boundary, where the model is uncertain, defer to a human decision-maker rather than producing an automated classification. This addresses uncertainty near the threshold without adjusting outcomes for confident predictions, and it can reduce harm from low-confidence decisions without sacrificing accuracy on clear cases.
- Calibration correction: if a model is less well calibrated for one group (its probability outputs track the true rates less closely), apply a correction function to its outputs for that group to improve calibration. This is used in settings where predictive parity is the target criterion.
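Group-specific thresholds and the rejection option can both be sketched as small functions over model scores. The example below is illustrative only: the scores, labels, target FPR of 0.2, and deferral margin of 0.05 are made-up values, the function names are hypothetical, and a real deployment would fit thresholds on held-out data rather than on the data being scored.

```python
import numpy as np

def threshold_for_fpr(scores, y, target_fpr):
    """Lowest observed-score threshold whose FPR does not exceed target_fpr.

    The rule is 'predict positive when score > threshold'.
    """
    neg = np.sort(scores[y == 0])
    k = int(np.floor(target_fpr * len(neg)))  # negatives allowed above the threshold
    if k >= len(neg):
        return float("-inf")
    return neg[len(neg) - k - 1]

def fit_group_thresholds(scores, y, group, target_fpr):
    """One threshold per group, each chosen to meet the same target FPR."""
    return {g: threshold_for_fpr(scores[group == g], y[group == g], target_fpr)
            for g in np.unique(group)}

def decide(scores, group, thresholds, margin=0.0):
    """Return 1 or 0, or -1 (defer to a human) when within `margin` of the threshold."""
    t = np.array([thresholds[g] for g in group])
    out = (scores > t).astype(int)
    out[np.abs(scores - t) < margin] = -1
    return out

def fpr(decisions, y, mask):
    """Share of true negatives in `mask` that received a positive decision."""
    neg = mask & (y == 0)
    return (decisions[neg] == 1).mean()

# Hypothetical scores: 5 negatives then 3 positives per group.
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.9,
                   0.3, 0.5, 0.6, 0.7, 0.8, 0.6, 0.9, 0.95])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1,
              0, 0, 0, 0, 0, 1, 1, 1])
group = np.array(["A"] * 8 + ["B"] * 8)

single = decide(scores, group, {"A": 0.5, "B": 0.5})
thresholds = fit_group_thresholds(scores, y, group, target_fpr=0.2)
adjusted = decide(scores, group, thresholds)
deferred = decide(scores, group, thresholds, margin=0.05)

for name, d in [("single 0.5", single), ("per-group", adjusted)]:
    print(name, fpr(d, y, group == "A"), fpr(d, y, group == "B"))
```

On this toy data the single threshold gives Group B a higher false positive rate than Group A, while the fitted per-group thresholds bring both to the target. The `deferred` array marks scores that sit within the margin of their group's threshold with -1, which is where a human reviewer would take over.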
What Mitigation Cannot Fix
Technical mitigation techniques are real and valuable, but they operate within the constraints established by the impossibility results and have limits that practitioners must understand honestly.
- First: technical mitigation cannot simultaneously satisfy all fairness criteria when base rates differ. Any in-processing fairness constraint or post-processing threshold adjustment implicitly chooses which fairness criterion to prioritize, moving toward one criterion at the expense of others. This is a value choice, not a technical problem. Presenting a debiased model as simply 'fair', without specifying which criterion has been satisfied and which have been sacrificed, is misleading.
- Second: pre-processing techniques can reduce representation bias and reweight training distributions, but they cannot correct for historical bias in labels. If the labels themselves reflect past discrimination (recidivism labels that reflect policing patterns, hiring labels that reflect past discriminatory choices), reweighting or resampling the historical data does not change what the model is learning to predict. The model may predict the discriminatory historical outcome more evenly across groups, which is not the same as making fair predictions.
- Third: no technical intervention addresses the problem of a system deployed for a purpose it was not designed for, used on a population it was not trained on, or embedded in a decision-making process that misuses its outputs. Deployment context is not a technical parameter.
- Fourth: mitigation improves fairness metrics as measured on a test set, but systems encounter distribution shift in deployment. A system audited and deemed fair today may become unfair as the world changes, as the user population shifts, or as the model's predictions interact with the social environment through feedback effects. Static one-time audits do not provide ongoing assurance.
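The first limit can be checked with confusion-matrix arithmetic. The identity used below is not stated in this lesson; it follows from the definitions of base rate, positive predictive value (PPV), false negative rate (FNR), and false positive rate (FPR). The function name and the numbers (base rates of 0.5 and 0.2, PPV of 0.8, FNR of 0.3) are illustrative assumptions.

```python
def implied_fpr(base_rate, ppv, fnr):
    """FPR forced by the definitions once base rate, PPV, and FNR are fixed.

    Derivation: TP = (1 - FNR) * P and FP = TP * (1 - PPV) / PPV,
    so FPR = FP / N = (P / N) * ((1 - PPV) / PPV) * (1 - FNR),
    where P / N = base_rate / (1 - base_rate).
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Two hypothetical groups given the same PPV and the same FNR.
fpr_a = implied_fpr(base_rate=0.5, ppv=0.8, fnr=0.3)
fpr_b = implied_fpr(base_rate=0.2, ppv=0.8, fnr=0.3)
print(f"Group A FPR: {fpr_a:.3f}, Group B FPR: {fpr_b:.3f}")
```

With equal PPV and equal FNR, the group with the higher base rate ends up with the higher FPR (about 0.175 versus about 0.044 here). Equalizing the FPR as well would require giving up equal PPV or equal FNR, which is the kind of choice the first point describes.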
A technically debiased model deployed in a context where the decisions it informs are not subject to human review, not contestable by affected individuals, and not subject to ongoing monitoring is not a fair system — it is a biased system with a fairness certificate on paper. Technical mitigation and institutional governance are complements, not substitutes. When technical solutions are presented as complete, they may reduce pressure for the harder institutional reforms that are also required.
A team applies adversarial debiasing to a hiring model and achieves equalized odds across gender groups on their validation set. They deploy the model. Six months later, an auditor finds that the model now produces higher false positive rates for older applicants than for younger applicants. What does this reveal about the adversarial debiasing approach?
A data scientist applies threshold calibration by group to equalize the false positive rate across racial groups in a credit scoring model. A legal team objects that this approach may violate fair lending law, which prohibits applying different standards to different racial groups. Which tension does this scenario illustrate?
Design a Mitigation Strategy
- Your team has been hired to advise a hospital system that uses an automated triage scoring tool to rank emergency department patients by urgency. An internal audit has found that patients with certain insurance types (a proxy correlated with race and income) receive systematically lower urgency scores than patients with comparable symptom severity. The tool is a commercial black-box system the hospital licenses — the hospital cannot retrain it.
- Step 1: Given the constraint that the model cannot be retrained, what post-processing mitigation options are available? List at least two with a description of how they would work in this clinical context.
- Step 2: For each option, identify one fairness criterion it would improve and one it might worsen or leave unaddressed.
- Step 3: The impossibility results tell you that you cannot satisfy all fairness criteria simultaneously. For this clinical setting (emergency triage), which fairness criterion should be prioritized, and why? Who should make this decision — the hospital, a regulator, affected patients, or some combination?
- Step 4: Identify two things that no technical mitigation can fix in this scenario, and for each, propose a non-technical institutional remedy (a policy, process, oversight mechanism, or accountability structure).
- Present your recommendations in writing, with explicit acknowledgment of the trade-offs each recommendation involves.