Designing a Learning Setup
Three modules have introduced you to three fundamentally different ways a machine can learn: supervised learning (from labeled examples), unsupervised learning (from unlabeled structure), and reinforcement learning (from delayed reward). In practice, the most consequential skill is not implementing any single algorithm — it is correctly diagnosing which paradigm fits a problem, then constructing the learning setup precisely. This lesson is a practice session in that diagnostic process.
A Framework for Paradigm Selection
Work through these questions in order for any machine learning problem:
Question 1: Do I have labeled outputs for my training data?
- YES: consider supervised learning as the starting point.
- NO: move to Question 2.
Question 2: Is the goal to learn a decision-making process across time (sequential actions with delayed feedback)?
- YES: consider reinforcement learning.
- NO: consider unsupervised learning.
Question 3 (for supervised): Is the output categorical (class label) or continuous (number)? This determines classification vs. regression.
Question 4 (for unsupervised): What kind of structure am I looking for?
- Groups of similar items → clustering.
- Compressed representation → dimensionality reduction.
- Rare unusual items → anomaly detection.
Question 5 (for RL): Can I run the environment safely and cheaply enough to collect millions of episodes? If not, the approach may be impractical.
Question 6 (for any paradigm): What data do I actually have? How much? How clean? The right algorithm for a problem with 10 million clean labeled examples is often wrong for the same problem with 500 noisy samples.
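As a rough illustration, Questions 1 through 5 can be written down as a small decision function. This is only a sketch: the function name `suggest_paradigm`, its parameters, and the returned strings are hypothetical, and Question 6 (how much data, how clean) is deliberately left out because it calls for judgment rather than a branch.

```python
from typing import Optional


def suggest_paradigm(
    has_labels: bool,
    sequential_decisions: bool,
    output_kind: Optional[str] = None,      # "categorical" or "continuous"
    structure_goal: Optional[str] = None,   # "groups", "compression", "rare items"
    cheap_safe_environment: bool = False,
) -> str:
    """Return a starting-point paradigm following Questions 1-5 of the framework."""
    # Question 1: are labeled outputs available?
    if has_labels:
        # Question 3: categorical vs. continuous output.
        if output_kind == "categorical":
            return "supervised: classification"
        if output_kind == "continuous":
            return "supervised: regression"
        return "supervised"

    # Question 2: sequential actions with delayed feedback?
    if sequential_decisions:
        # Question 5: can episodes be collected safely and cheaply?
        if cheap_safe_environment:
            return "reinforcement learning"
        return "reinforcement learning (may be impractical)"

    # Question 4: what kind of structure is wanted?
    goals = {
        "groups": "unsupervised: clustering",
        "compression": "unsupervised: dimensionality reduction",
        "rare items": "unsupervised: anomaly detection",
    }
    return goals.get(structure_goal, "unsupervised")


# Labeled data with a categorical output.
print(suggest_paradigm(True, False, output_kind="categorical"))
# No labels, no sequential decisions, looking for rare items.
print(suggest_paradigm(False, False, structure_goal="rare items"))
```

The function returns a starting point, not a verdict. A problem with both labels and sequential decisions would still need a closer look than this branch order gives it.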
Real systems frequently combine paradigms. A self-driving car uses supervised learning to detect objects from cameras, unsupervised clustering to identify rare edge-case scenarios for human review, and reinforcement learning (or imitation learning) for planning and control. The diagnostic framework above applies at each sub-problem level, not just to the overall system.
Worked case studies:
Case A: Email spam detection.
- Data: 500,000 emails, each hand-labeled 'spam' or 'not spam' by human reviewers.
- Output: binary category.
- Diagnosis: supervised classification. Labels exist, the output is categorical, and each prediction is a single step.
- Setup: extract features (word frequencies, sender reputation, link count), train a classifier (logistic regression or gradient-boosted trees), and evaluate on a held-out labeled test set with precision and recall.
Case B: Customer segmentation for a streaming service.
- Data: 2 million users, each described by watch history, search queries, and time-of-day usage. No behavioral categories have been defined.
- Output: groups of users with similar tastes.
- Diagnosis: unsupervised clustering. No labels exist; the goal is to discover structure, not predict a defined target.
- Setup: construct user feature vectors (fraction of time spent on each genre, average session length, etc.), standardize the features, apply k-means with k selected via the elbow method, inspect and name the clusters, and validate by checking whether cluster membership predicts different response rates to marketing.
Case C: Adaptive tutoring system.
- Data: student interaction logs (which problems were attempted, correct/incorrect, time spent). No labels for 'optimal teaching sequence.'
- Output: a sequence of decisions (which problem to assign next) that maximizes long-term learning outcomes.
- Diagnosis: reinforcement learning. The task is sequential decision-making across many time steps; the best next problem depends on the student's current knowledge state; and the reward (improvement on assessments) is delayed.
- Setup: define the state (student's performance profile), the action (next problem to assign), and the reward (score improvement on the end-of-unit assessment); run in simulation or with a small pilot group; train an RL policy.
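Returning to Case B, the "standardize features" step matters because k-means relies on distances, and a feature measured in large units can swamp one measured as a fraction. The sketch below uses made-up users with two hypothetical features (fraction of watch time on drama, average session length in minutes) to show the effect. It covers only the scaling step, not k-means itself.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical users: [fraction of watch time on drama, avg session length (min)]
users = [
    [0.90, 40.0],  # drama fan, medium sessions
    [0.10, 42.0],  # rarely watches drama, medium sessions
    [0.85, 90.0],  # drama fan, long sessions
    [0.15, 88.0],  # rarely watches drama, long sessions
]

# Raw distances: session length dominates, so user 0 looks far closer
# to user 1 (opposite taste) than to user 2 (same taste).
raw_01 = math.dist(users[0], users[1])
raw_02 = math.dist(users[0], users[2])

# After standardizing, both features contribute on a comparable scale.
scaled = standardize(users)
scaled_01 = math.dist(scaled[0], scaled[1])
scaled_02 = math.dist(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_01:.2f}  d(0,2)={raw_02:.2f}")
print(f"scaled: d(0,1)={scaled_01:.2f}  d(0,2)={scaled_02:.2f}")
```

Scaling does not decide which feature should matter more. It only ensures that neither feature wins by accident of its units.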
Case D: Detecting network intrusions in a corporate system.
- Data: 50 million network packet records per day, almost all normal. No labels for which packets belong to intrusions.
- Output: flag anomalous packets for human review.
- Diagnosis: unsupervised anomaly detection. The rarity of anomalies makes labeling at scale impractical (most would-be labels are 'normal'); the goal is to find what deviates from normal.
- Setup: train an Isolation Forest or autoencoder on a period of clean traffic, set a threshold for anomaly scores, alert on flagged packets, and tune the threshold via precision-recall analysis on a small validation set labeled by hand for that purpose.
Case E: Drug dosage optimization in an ICU.
- Data: historical patient vital signs and dosing records, with outcomes labeled (survived, deteriorated).
- Output: a dosing policy that maximizes patient survival and recovery metrics.
- Diagnosis: both supervised learning and RL are candidates. The historical labeled data supports a supervised step: train a model to predict the outcome given the current state and action. Offline RL (batch RL) can then derive a policy from those predictions without requiring real-world exploration.
- Setup: offline (batch) RL. Build a world model from historical data and train a policy against that model. Validate rigorously in simulation before any clinical deployment. Add a safety constraint: the policy is not allowed to recommend doses more than X% outside standard clinical ranges without human override.
Key lesson from Case E: the existence of labels does not preclude RL; it enables safer RL. Data from the past can train both a predictive model and a policy.
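The thresholding step in Case D can be sketched without any particular detector. The example below substitutes a plain z-score on a single feature for the Isolation Forest or autoencoder named in the case. It is a stand-in chosen for brevity, not the method the case describes. The packet sizes and the threshold of 3.0 are made-up values; in practice the threshold would be tuned on a small labeled validation set, as the case says.

```python
from statistics import mean, pstdev


def fit_baseline(clean_values):
    """Summarize a period of clean traffic by its mean and standard deviation."""
    return mean(clean_values), pstdev(clean_values)


def anomaly_score(value, baseline):
    """Absolute z-score: distance from the clean mean in standard deviations."""
    mu, sigma = baseline
    return abs(value - mu) / sigma if sigma > 0 else 0.0


def flag(values, baseline, threshold):
    """Return the values whose anomaly score exceeds the threshold."""
    return [v for v in values if anomaly_score(v, baseline) > threshold]


# Hypothetical packet sizes (bytes) observed during a clean period.
clean = [500, 520, 480, 510, 490, 505, 495, 515, 485, 500]
baseline = fit_baseline(clean)

# New traffic containing two records far from the clean profile.
new_traffic = [498, 512, 9000, 503, 20]
flagged = flag(new_traffic, baseline, threshold=3.0)
print(flagged)
```

Raising the threshold sends fewer records to analysts at the cost of missing more intrusions; lowering it does the opposite. That trade-off is what the precision-recall analysis is meant to settle.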
Constructing the Learning Setup — Beyond Paradigm Choice
Choosing the paradigm is step one. A complete learning setup also requires:
Data specification:
- How is data collected? (passive logs, sensors, human labeling, simulation)
- How much is needed? (depends on dimensionality, model complexity, and noise level)
- What preprocessing is required? (normalization, handling missing values, deduplication)
Evaluation design:
- Supervised: held-out test set, appropriate metrics (accuracy, F1, AUROC depending on class balance).
- Clustering: silhouette score, downstream task performance, domain expert review.
- RL: evaluation on a separate set of episodes under the learned policy, not the training environment.
Deployment constraints:
- Latency: does the model need to predict in real-time? (milliseconds vs. hours)
- Interpretability: does the decision need to be explainable? (loan approval, medical diagnosis)
- Update cadence: how often does the model need to be retrained as the world changes?
Failure mode analysis:
- What happens when the model is wrong? Are false positives or false negatives more costly?
- Is the deployment environment different from the training distribution? (covariate shift)
- Who is harmed if the model fails, and how?
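To see why the supervised metric should depend on class balance, consider a small, made-up test set with 95 negatives and 5 positives. The sketch below computes accuracy and F1 by hand for two hypothetical sets of predictions. The helper names are illustrative and not taken from any library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced test set: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Predictions that never flag the positive class.
always_negative = [0] * 100

# Predictions that find 4 of the 5 positives at the cost of 3 false alarms.
model_pred = [0] * 92 + [1] * 3 + [1] * 4 + [0] * 1

for name, pred in [("always negative", always_negative), ("model", model_pred)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}  F1={f1_score(y_true, pred):.2f}")
```

The two sets of predictions differ by a single point of accuracy, yet only one of them ever finds a positive case. On data this skewed, F1 exposes the difference that accuracy hides.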
Before deploying any ML model, define a simple baseline — a rule-based system, a majority-class classifier, or a linear model. If your sophisticated algorithm does not substantially beat the baseline, the problem may not require ML at all, or the data is insufficient. Baselines prevent over-engineering and reveal whether complexity is paying off.
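A majority-class baseline takes only a few lines, which is part of its appeal. The sketch below uses made-up spam labels and hypothetical predictions from a more complex model; the function names are illustrative.

```python
from collections import Counter


def majority_class(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 80% of training emails are not spam.
train_labels = ["not spam"] * 80 + ["spam"] * 20
test_labels = ["not spam"] * 8 + ["spam"] * 2

# Baseline: always predict the majority class seen in training.
baseline_label = majority_class(train_labels)
baseline_acc = accuracy(test_labels, [baseline_label] * len(test_labels))

# Hypothetical predictions from a more sophisticated model.
model_pred = ["not spam"] * 7 + ["spam"] + ["spam", "not spam"]
model_acc = accuracy(test_labels, model_pred)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
print(f"gain over baseline: {model_acc - baseline_acc:+.2f}")
```

In this contrived case the model's accuracy matches the baseline's, so by that metric the added complexity has bought nothing. This is the situation the baseline is meant to reveal.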
A hospital wants to predict which patients will be readmitted within 30 days of discharge, using 3 years of historical patient records that include readmission labels. Which paradigm is most appropriate?
A music streaming service wants to group its 100 million users into segments for targeted playlists, with no predefined genre categories. Which setup is most appropriate?
Design Three Learning Setups
For each of the following problems, produce a complete learning setup specification:
- Problem 1: A city wants to forecast the number of ambulance calls per hour across 50 city zones, using 5 years of historical call records with timestamps, locations, and weather data.
- Problem 2: A cybersecurity firm has network logs from 1,000 client companies. No attacks have been labeled, but they want a system that flags traffic for human analysts to investigate.
- Problem 3: A game developer wants an AI opponent for a strategy game that learns to improve as it plays against human players, starting with no pre-programmed strategies.
For each problem, write:
- a) The paradigm choice and why.
- b) The input features you would use.
- c) The output or objective.
- d) How you would evaluate success.
- e) One major risk or failure mode to watch for.