Machine Learning & Deep Learning

⏱ About 20 min · 20 XP

Module Check: The Mathematics of Learning

You have covered the foundational mathematical framework of machine learning over nine lessons, from the most abstract (ML as a function search) to the most concrete (hand-computing gradient descent steps). This module check is not a formality. It tests whether you can think with these ideas, not merely recall them. The flashcards review core vocabulary; the quizzes demand reasoning; the capstone asks you to synthesize everything into a structured argument. Take your time.

Flashcards

A model achieves 99% training accuracy and 61% test accuracy on a balanced dataset. Which single intervention is most likely to improve test accuracy?

Two functions are both in the hypothesis space of a degree-3 polynomial model: f1(x) = 2x^3 - x + 5 and f2(x) = 0.001x^3 + 0. After training on data generated by f1, which function should the model converge to, and what determines this?

A team trains three models on the same dataset: linear (2 params), polynomial degree 8 (9 params), polynomial degree 50 (51 params). Rank them from lowest to highest expected training error and from lowest to highest expected test error, assuming 20 training examples.

Cross-entropy loss for a correct label is -log(p), where p is the model's predicted probability for the correct class. A model outputs p = 0.95 for one example and p = 0.55 for another, both correctly classified. Which has higher loss and by roughly how much?
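
The arithmetic behind this card can be checked directly. The sketch below assumes the natural logarithm, since the card does not state a base; the function name `cross_entropy_correct_class` is introduced here for illustration only.

```python
import math


def cross_entropy_correct_class(p: float) -> float:
    """Per-example loss -log(p), where p is the predicted probability
    of the correct class. Assumption: natural log (base e)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p)


# The two predictions from the card; both are correct classifications.
loss_confident = cross_entropy_correct_class(0.95)
loss_hesitant = cross_entropy_correct_class(0.55)

# How much larger the loss is for the less confident prediction.
loss_ratio = loss_hesitant / loss_confident

print(f"loss at p=0.95: {loss_confident:.4f}")
print(f"loss at p=0.55: {loss_hesitant:.4f}")
print(f"ratio: {loss_ratio:.1f}")
```

Under this assumption the losses come out near 0.05 and 0.60, so the hesitant prediction is penalized more than ten times as heavily even though both are correct. A different log base would rescale both losses by the same constant and leave the ratio unchanged.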

A medical classifier trained on data from North American hospitals is deployed in Southeast Asia. The team finds much worse performance than expected despite having 2 million training examples. What are the most likely explanation and remedy?

In gradient descent, a practitioner halves the learning rate. What is the guaranteed effect, and what is the trade-off?
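
To make the trade-off in this card concrete, here is a minimal sketch on a toy objective. Everything in it is an assumption chosen for illustration: the loss f(w) = w^2, the starting point, the tolerance, and the two learning rates 0.1 and 0.05. It shows only one well-behaved case, not a general result.

```python
def first_update_size(learning_rate: float, w: float) -> float:
    """Size of a single gradient descent update on f(w) = w**2."""
    gradient = 2.0 * w  # derivative of w**2
    return abs(learning_rate * gradient)


def steps_to_converge(learning_rate: float, w_start: float = 10.0,
                      tolerance: float = 1e-3, max_steps: int = 10_000) -> int:
    """Count updates needed until |w| < tolerance on f(w) = w**2."""
    w = w_start
    for step in range(max_steps):
        if abs(w) < tolerance:
            return step
        gradient = 2.0 * w
        w -= learning_rate * gradient  # standard gradient descent update
    return max_steps


full_rate, half_rate = 0.1, 0.05  # hypothetical learning rates

# For the same gradient, halving the rate halves the update.
step_full = first_update_size(full_rate, 10.0)
step_half = first_update_size(half_rate, 10.0)

# The cost on this toy problem: more updates to reach the same tolerance.
steps_full = steps_to_converge(full_rate)
steps_half = steps_to_converge(half_rate)

print(f"first update: {step_full:.2f} vs {step_half:.2f}")
print(f"updates to converge: {steps_full} vs {steps_half}")
```

On this objective the smaller rate is simply slower. On a loss where the original rate overshoots or diverges, the same change can instead be what makes training stable, which is why only the smaller step for a given gradient is certain.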

The Through-Line of This Module

Every concept in this module is a facet of one idea: learning is constrained search. The hypothesis space defines what you can find. Training data constrains which hypotheses survive. The loss function defines what 'survive' means. Optimization navigates the landscape that the loss function creates. Generalization is the test of whether the hypothesis found in that search will hold outside the training sample. Bias and variance name the two ways the search can fail. Data quantity determines how tightly the search is constrained. You now have the vocabulary to reason rigorously about any machine learning system you encounter.

Capstone Analysis: Audit an ML System

You will perform a structured audit of a hypothetical ML system using every concept from this module.

System description: A university deploys a model that predicts, during a student's first semester, whether they will graduate within four years. The model uses: high school GPA, SAT scores, first-semester grades, declared major, financial aid status, and dormitory assignment. It was trained on 15 years of alumni records (graduated 2005-2020). It will be used to identify students for academic intervention programs.

Your audit should address each of the following in a structured written response:

  1. Function framing: Write the precise function signature. What is the output space: a probability, a binary label, or something else? What are the implications of this choice?
  2. Hypothesis space: Name two model architectures appropriate for this problem. For each, describe what the hypothesis space looks like and what architectural assumptions are encoded.
  3. Training data analysis: Identify three specific representativeness or label-quality concerns with using 2005-2020 alumni records to predict outcomes for current students.
  4. Loss function: What loss function would you use? How would you handle the fact that graduation rates may vary significantly by major (some majors have 95% four-year graduation; others 55%)?
  5. Generalization strategy: Describe your exact train/validation/test split, including why you would or would not use a random split.
  6. Bias-variance diagnosis: After initial training you find training AUC = 0.88, validation AUC = 0.85. What is your diagnosis? What is your next step?
  7. Ethical limitations: Name two ways this model could cause harm even if it is statistically accurate, and propose a safeguard for each.

Present your audit as if you are advising the university administration. Be precise, be honest about uncertainty, and be willing to recommend against deployment if the analysis warrants it.
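
For the generalization-strategy item, one option to weigh against a random split is a split by time, so that the model is always evaluated on cohorts later than those it trained on. The sketch below is an illustration under stated assumptions, not a prescribed answer: the field name `cohort_year`, the cutoff years 2014 and 2017, and the toy records are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]


def temporal_split(records: List[Record], train_end: int, val_end: int,
                   year_key: str = "cohort_year"
                   ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split records by year: train on the earliest cohorts, validate on
    the next ones, test on the most recent. No later cohort appears in
    an earlier split."""
    if train_end >= val_end:
        raise ValueError("train_end must be earlier than val_end")
    train = [r for r in records if r[year_key] <= train_end]
    val = [r for r in records if train_end < r[year_key] <= val_end]
    test = [r for r in records if r[year_key] > val_end]
    return train, val, test


# Toy stand-in for the alumni records: one placeholder row per year.
toy_records = [{"student_id": i, "cohort_year": year}
               for i, year in enumerate(range(2005, 2021))]

# Hypothetical cutoffs; in a real audit they need their own justification.
train, val, test = temporal_split(toy_records, train_end=2014, val_end=2017)

print(len(train), len(val), len(test))
```

A split like this mimics deployment, where the model scores students who arrive after all of its training data was collected. A random split would mix years and could hide drift in admissions policy, curriculum, or student population.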