Frontier & Future AI

⏱ About 20 min · 20 XP

Module Check: Modern AI Foundations

You have covered ten lessons tracing the full arc of the current AI paradigm: from the four pillars that define it and the transformer architecture that implements it, through pretraining, scaling laws, emergent capabilities, and alignment training, to inference mechanics and the lifecycle of a real model. This module check is not a recall exercise. It demands that you think across the whole arc, connect ideas from different lessons, and reason about real scenarios as a practitioner would. The flashcards open the review; the quizzes test cross-lesson reasoning; the synthesis activity asks you to build an argument from the full set of ideas.

Flashcards

A research team finds that switching from BERT-style masked language modeling (predict randomly masked tokens) to GPT-style causal language modeling (predict the next token, never attending to future tokens) dramatically improves their model's ability to generate coherent long-form text. Which property of causal language modeling explains this improvement?
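The difference between the two objectives can be made concrete by looking at which positions each token is allowed to attend to. The sketch below is illustrative only; the function names are hypothetical and no real model is involved. It builds the lower-triangular mask used in causal language modeling and contrasts it with the all-visible pattern of a bidirectional encoder.

```python
def causal_mask(seq_len):
    """Row i marks the positions token i may attend to (True = allowed).

    In causal language modeling, token i sees only positions 0..i.
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def bidirectional_mask(seq_len):
    """BERT-style attention: every token may attend to every position."""
    return [[True] * seq_len for _ in range(seq_len)]


# Print the causal pattern for a 4-token sequence ("x" = visible, "." = hidden).
for row in causal_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
```

Each row of the causal mask matches the situation the model faces at generation time, when the tokens to the right do not exist yet.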

A company trains a 7-billion-parameter model following Chinchilla-optimal principles and achieves strong benchmark performance. A competitor trains a 70-billion-parameter model on the same compute budget but uses only 10% as many training tokens as Chinchilla recommends. The first company claims its model is superior for deployment. Which argument best supports the first company's claim?
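A back-of-the-envelope sketch of the compute accounting behind this card. It rests on three rules of thumb that are assumptions here, not statements from the module: training compute of roughly 6 FLOPs per parameter per token, a compute-optimal ratio of roughly 20 tokens per parameter, and inference cost of roughly 2 FLOPs per parameter per generated token. The exact figures depend on those assumptions and need not match the card's numbers.

```python
# All three constants are assumed rules of thumb, not values from the module.
TRAIN_FLOPS_PER_PARAM_TOKEN = 6   # training compute ~ 6 * N * D
TOKENS_PER_PARAM = 20             # compute-optimal D ~ 20 * N
INFER_FLOPS_PER_PARAM_TOKEN = 2   # forward pass ~ 2 * N per token


def optimal_tokens(params):
    """Token count suggested by the assumed tokens-per-parameter ratio."""
    return TOKENS_PER_PARAM * params


def training_flops(params, tokens):
    """Approximate training compute for a model of this size and data."""
    return TRAIN_FLOPS_PER_PARAM_TOKEN * params * tokens


def tokens_for_budget(params, flops):
    """Tokens a model of this size can be trained on within a FLOP budget."""
    return flops / (TRAIN_FLOPS_PER_PARAM_TOKEN * params)


small, large = 7e9, 70e9
budget = training_flops(small, optimal_tokens(small))
large_tokens = tokens_for_budget(large, budget)

print(f"shared training budget: {budget:.2e} FLOPs")
print(f"tokens the 70B model gets: {large_tokens:.2e}")
print(f"share of its own recommended tokens: {large_tokens / optimal_tokens(large):.1%}")
print(f"inference cost ratio (70B / 7B): {large / small:.0f}x per token")
```

Two quantities matter for the deployment argument: how undertrained the larger model is at a fixed budget, and how much more each generated token costs to serve.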

A language model performs excellently on standard safety evaluations but is found, six months after deployment, to assist with a specific category of harmful request that was not part of the red-teaming protocol. Which two concepts from the module jointly explain this failure?

A model is trained to generate chain-of-thought reasoning traces before producing final answers. Its performance on complex multi-step math problems improves by 40% compared to the same model generating direct answers. Scaling the model's parameters by 10x with direct answering yields only a 12% improvement on the same benchmark. What does this comparison reveal about the relationship between training-time compute and inference-time compute?

Two researchers debate whether emergence is a real phenomenon or a measurement artifact. Researcher A points to the sharp transition on a binary benchmark. Researcher B shows that under a continuous partial-credit metric the same capability improves smoothly. Which conclusion is most precisely supported by the evidence?
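One way to see how both researchers can be looking at the same model is to simulate the two metrics. The sketch below makes a strong simplifying assumption that is not from the module: an answer is a fixed number of tokens, each token is correct independently with the same probability, and the binary benchmark awards credit only when every token is right. The function name is hypothetical.

```python
def exact_match_rate(per_token_accuracy, answer_length):
    """Probability that every token of the answer is correct,
    assuming independent per-token errors (a simplification)."""
    return per_token_accuracy ** answer_length


# Per-token accuracy (the continuous metric) rises in even steps,
# while all-or-nothing accuracy on a 10-token answer stays near zero
# and then climbs steeply.
for p in (0.5, 0.6, 0.7, 0.8, 0.9, 0.99):
    print(f"per-token {p:.2f} -> exact match {exact_match_rate(p, 10):.3f}")
```

The toy model shows only that a smooth underlying improvement can produce a sharp curve under an all-or-nothing metric. It does not by itself establish whether any particular reported capability is an artifact.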

A lab is deciding between training one 100-billion parameter model or ten 10-billion parameter models and deploying them as an ensemble (averaging their outputs). Assuming the same total compute budget, what are the strongest arguments for each approach?
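For reference, "averaging their outputs" can be read as averaging the next-token probability distributions that each model produces. The sketch below shows that operation on made-up numbers; the function name and the example distributions are hypothetical, and other readings of "ensemble" (such as majority voting over final answers) are possible.

```python
def average_distributions(distributions):
    """Average several probability distributions over the same vocabulary."""
    n_models = len(distributions)
    vocab_size = len(distributions[0])
    return [
        sum(dist[i] for dist in distributions) / n_models
        for i in range(vocab_size)
    ]


# Two hypothetical models scoring a three-token vocabulary.
model_a = [0.50, 0.25, 0.25]
model_b = [1.00, 0.00, 0.00]
print(average_distributions([model_a, model_b]))
```

Averaging requires one forward pass per member for every generated token, which is part of the cost side of the comparison the card asks for.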

Module Synthesis: Write the Paradigm Explained

  1. This is your capstone synthesis for the module. Work individually.
  2. Your task: Write a structured explanation of the current AI paradigm, its foundations, and its limits for an intelligent audience that has never studied AI. Your explanation must be accurate, precise, and accessible without being condescending. Length: four to six paragraphs.
  3. Your explanation must integrate concepts from at least six of the ten lessons and must address all of the following points:
     1. What the current paradigm is: the four pillars and why they work together as a system rather than as independent techniques.
     2. Why next-token prediction on diverse text produces general world knowledge and reasoning ability, not just an autocomplete engine.
     3. What scaling laws predict and what they do not predict, specifically how they relate to emergent capabilities.
     4. What alignment training achieves and what it cannot guarantee, with at least one concrete example of a known limitation.
     5. What happens computationally when a user sends a prompt to a deployed model, tracing the inference process.
     6. One honest statement about what is not yet understood about why the paradigm works as well as it does.
  4. Quality bar: every claim must be defensible. No vague statements like 'AI has learned to think.' Every mechanism must be named. Every limitation must be specific.
  5. When complete, exchange with a partner. Read their explanation and identify: (a) one claim that is imprecise or potentially misleading, (b) one concept that is missing or underexplained, and (c) one thing they explained especially well. Write your critique as one paragraph and return it to them.
  6. Revise your explanation based on the critique. The revised version is your final module artifact.
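When tracing the inference process for the synthesis, it may help to have the bare loop in front of you. The sketch below is a minimal illustration, not a description of any deployed system: `toy_logits` is a hypothetical stand-in for a transformer forward pass, the token IDs are arbitrary integers rather than the output of a real tokenizer, and decoding is greedy. Real deployments typically add sampling strategies and caching of attention keys and values, both omitted here.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(token_ids, vocab_size):
    """Hypothetical stand-in for a model forward pass.

    It simply favors the token whose ID is one more than the last token.
    """
    target = (token_ids[-1] + 1) % vocab_size
    return [5.0 if i == target else 0.0 for i in range(vocab_size)]


def greedy_generate(prompt_ids, vocab_size, eos_id, max_new_tokens):
    """Autoregressive decoding: score, pick, append, repeat."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(ids, vocab_size))
        next_id = max(range(vocab_size), key=probs.__getitem__)
        ids.append(next_id)  # the new token becomes part of the context
        if next_id == eos_id:  # stop when the end-of-sequence token appears
            break
    return ids


print(greedy_generate([0, 1], vocab_size=5, eos_id=4, max_new_tokens=10))
# prints [0, 1, 2, 3, 4]
```

The point to carry into your explanation is the loop structure: each generated token is appended to the context and the model is run again, so output is produced one token at a time until a stop condition is met.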