Trace a Model's Lifecycle
You have studied each component of the current AI paradigm in isolation: the transformer architecture, pretraining objectives, scaling laws, emergent capabilities, alignment training, and inference. In practice, these components do not exist independently. Every frontier model is the result of dozens of deliberate choices made in sequence, each one constraining the next. A decision about data composition affects what capabilities emerge. A decision about model size affects alignment training cost. A decision about inference architecture affects who can afford to use the model.
This lesson is structured around a central activity: you will trace a hypothetical but realistic frontier language model from the earliest stages of data collection through production deployment. At each stage, you will identify what decisions were made, why they were made, what was gained, and what trade-offs were accepted. By the end, you will have built a complete mental model of a model's lifecycle.
The Seven Stages of a Model's Life
A frontier language model passes through the following stages before it is used by anyone outside the lab that built it.
Stage 1: Data collection and curation. The training corpus must be assembled. For a modern frontier model this means acquiring web crawls (Common Crawl snapshots), licensing or scraping book corpora, pulling code from public repositories, and collecting multilingual content. Raw data must be cleaned: HTML markup stripped, duplicate documents removed, low-quality text filtered by heuristic and classifier-based quality scores, and potentially harmful content identified and removed. Data mixture ratios are decided here: what fraction of tokens comes from web text versus books versus code versus scientific papers. These decisions are among the most consequential in the entire lifecycle because they determine what the model can learn, and what it learns here is very hard to remove later.
Stage 2: Architecture and hyperparameter decisions. The team decides on model size (number of parameters), the ratio of model width to depth (embedding dimension versus number of layers), the number of attention heads, the context window for training, and the activation function used in feedforward layers. These decisions are informed by scaling law predictions but also by practical constraints: what fits on available hardware, what has been shown to be stable to train, and what inference cost the intended deployment can support. A model designed for edge deployment has different constraints than one served from a data center.
Stage 3: Pretraining. The model is trained on the assembled corpus using next-token prediction with the Adam optimizer (or a variant such as AdamW), a learning rate schedule (warmup followed by cosine decay), and gradient clipping to prevent instability. Training a frontier model today may involve tens of thousands of GPUs or TPUs running for weeks to months.
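
The warmup-then-cosine-decay schedule and gradient clipping mentioned under Stage 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not any lab's actual training code: the function names and every default value (peak learning rate, warmup length, total steps, clipping threshold) are assumptions chosen for readability.

```python
import math


def lr_at_step(step, max_lr=3e-4, min_lr=3e-5,
               warmup_steps=2000, total_steps=100_000):
    """Linear warmup to max_lr, then cosine decay down to min_lr.

    All default values are illustrative assumptions.
    """
    if step < warmup_steps:
        # Ramp linearly so the first updates are small.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


if __name__ == "__main__":
    for s in (0, 1000, 2000, 50_000, 100_000):
        print(s, lr_at_step(s))
    print(clip_by_global_norm([3.0, 4.0]))
```

Warmup keeps early updates small while optimizer statistics are still noisy, and clipping bounds the size of any single update, which is why the two are usually paired.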
During training, the team monitors the loss curve for instabilities (loss spikes that often indicate numerical problems) and periodically evaluates on downstream benchmarks to track capability development. At frontier scale, compute costs for this stage range from tens to hundreds of millions of dollars.
Stage 4: Post-pretraining evaluation. Before alignment training begins, the pretrained model is evaluated to understand its raw capability profile. What domains is it strong in? Where does it fail? What emergent capabilities are present? This evaluation informs which behaviors need to be reinforced in alignment training and which harmful patterns need to be suppressed.
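
The loss-spike monitoring described under Stage 3 can be approximated with a simple rolling baseline. The sketch below is one possible heuristic, not a standard tool: the function name `find_loss_spikes`, the window size, and the 1.5x threshold are all assumptions made for illustration.

```python
import math
from collections import deque
from statistics import median


def find_loss_spikes(losses, window=50, ratio=1.5):
    """Return step indices where the loss looks unstable.

    A step is flagged if the loss is NaN or infinite, or if it exceeds
    `ratio` times the median of the previous `window` non-flagged losses.
    Both thresholds are illustrative assumptions.
    """
    recent = deque(maxlen=window)
    spikes = []
    for step, loss in enumerate(losses):
        if math.isnan(loss) or math.isinf(loss):
            spikes.append(step)
            continue
        if len(recent) == window and loss > ratio * median(recent):
            # Flagged values are kept out of the baseline window.
            spikes.append(step)
        else:
            recent.append(loss)
    return spikes


if __name__ == "__main__":
    curve = [2.0] * 60 + [5.0] + [2.0] * 10
    print(find_loss_spikes(curve))
```

Using a median rather than a mean keeps the baseline from being dragged upward by a single outlier. A real monitoring setup would likely also track gradient norms, which this sketch omits.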
The decisions made at Stage 1 (data) constrain Stage 3 (pretraining capability). Stage 3 outputs constrain what Stage 5 (alignment) can achieve. A capability that was never learned during pretraining cannot be elicited by fine-tuning alone. And a capability learned during pretraining but not addressed during alignment may surface in deployment in unexpected ways.
Stage 5: Alignment training. The pretrained model undergoes supervised fine-tuning on a curated dataset of high-quality prompt-response pairs, followed by reinforcement learning from human feedback. Human annotators provide comparison judgments, choosing the better of two candidate responses to the same prompt, across thousands to tens of thousands of response pairs. A reward model is trained on these judgments. The language model is then optimized using proximal policy optimization to maximize reward model scores. This stage may be iterated multiple times, with each iteration refining the balance between helpfulness, honesty, and harmlessness.
Stage 6: Safety evaluation and red-teaming. Before deployment, the aligned model undergoes systematic red-teaming: a team of human testers attempts to elicit harmful, dishonest, or dangerous behavior from the model using a wide range of adversarial prompts. Capability evaluations probe for dangerous emergent abilities such as assistance with weapons of mass destruction or autonomous deception. The results of red-teaming inform further alignment training iterations or the addition of system-level guardrails. Only models that pass a defined safety threshold are approved for deployment.
Stage 7: Deployment and monitoring. The model is deployed via an API or integrated product. Inference infrastructure must handle variable load, manage KV caches efficiently, and route requests to available hardware. After deployment, the team monitors for unexpected behaviors, collects user feedback that can inform future fine-tuning, and tracks whether the model's behavior drifts over time or under distribution shift. Periodic safety monitoring continues throughout the model's production life.
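
The reward-model step in Stage 5 is commonly trained with a pairwise (Bradley-Terry style) objective: the model should score the response annotators preferred above the one they rejected. The lesson does not specify the loss, so treat the following as an assumed, commonly used formulation rather than a description of any particular lab's method. The function names are hypothetical, and the scores are plain floats standing in for reward-model outputs.

```python
import math


def pairwise_reward_loss(score_chosen, score_rejected):
    """Loss for one comparison: -log(sigmoid(score_chosen - score_rejected)).

    Written in a numerically stable form so large margins do not overflow.
    """
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(score_pairs):
    """Average loss over an iterable of (chosen_score, rejected_score) tuples."""
    pairs = list(score_pairs)
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward-model scores for three annotated comparisons.
    batch = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(mean_pairwise_loss(batch))
```

The loss is small when the preferred response already scores higher and grows roughly linearly when the ordering is wrong, so gradient updates concentrate on the comparisons the reward model currently gets wrong.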
Trace the Lifecycle: Full Model Audit
- This is the central activity for this lesson. Work in groups of three to four students. You will conduct a complete lifecycle trace for a hypothetical frontier model called Meridian-1.
- Meridian-1 background: A research lab is training a 70-billion-parameter language model intended for general-purpose use, with a focus on scientific and technical reasoning. Training compute budget: approximately 10 million A100-GPU-hours. Intended deployment: public API, with an expected 10 million monthly active users.
- Your task is to complete a structured lifecycle audit document by working through each stage below and answering the questions. Write your answers as a team, be specific, and justify every decision.
- Stage 1: Data. (a) What sources would you include in the training corpus for a science-focused model? List at least five and explain why each was chosen. (b) What fraction of tokens would you allocate to code versus scientific papers versus general web text? Justify your ratios using what you know about how different data types affect capability. (c) Describe one data quality problem specific to scientific text and how you would filter for it.
- Stage 2: Architecture. (a) Using Chinchilla scaling principles, estimate the optimal number of training tokens for a 70-billion-parameter model given the fixed compute budget of 10 million A100-GPU-hours. Show your reasoning, including any assumptions you make about hardware utilization. (b) Would you use a standard dense transformer or a mixture-of-experts architecture? State your reasoning, including the inference cost implications.
- Stage 3: Pretraining. (a) What would you monitor during training to detect instability? Describe two specific warning signs. (b) At what point during training would you run your first downstream benchmark evaluations, and why not earlier or later?
- Stage 4: Post-pretraining evaluation. (a) Design a capability evaluation suite for a science-focused model. Name five benchmarks and explain what each measures. (b) If you discovered the pretrained model could generate detailed synthesis routes for dangerous chemical compounds, what would you do before proceeding to alignment training?
- Stage 5: Alignment. (a) What specific behaviors would you prioritize in supervised fine-tuning for a scientific assistant? Write three example prompt-response pairs. (b) What is one way reward hacking could manifest specifically for a scientific assistant, and how would you detect it?
- Stage 6: Red-teaming. (a) Describe three adversarial prompts a red-teamer would use specifically for a scientific assistant. What harmful behavior is each designed to elicit? (b) What criterion would you use to decide the model is safe enough to deploy?
- Stage 7: Deployment. (a) Describe the inference infrastructure decisions you would make for 10 million monthly active users. Consider batching, context window limits, and tiered access. (b) What would you monitor post-deployment to detect that the model's behavior has degraded or changed unexpectedly?
- Present your lifecycle audit to the class. Compare your decisions with other groups and discuss: where did groups make different decisions? What do the differences reveal about the values and priorities underlying model development?
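
One way to sanity-check your Stage 2(a) arithmetic is a short script like the one below. It is a rough sketch under stated assumptions, not a reference answer: it uses the common approximation that training costs about 6 FLOPs per parameter per token, the rule of thumb of roughly 20 training tokens per parameter associated with Chinchilla, the A100's published dense BF16 peak of 312 TFLOP/s, and an assumed 40% hardware utilization. The utilization figure in particular is a guess you should replace with your own.

```python
A100_PEAK_FLOPS = 312e12   # dense BF16 peak, FLOP/s (published spec)
FLOPS_PER_PARAM_TOKEN = 6  # common approximation: C ~ 6 * N * D


def budget_flops(gpu_hours, utilization=0.4):
    """Total usable training FLOPs; `utilization` is an assumed value."""
    return gpu_hours * 3600 * A100_PEAK_FLOPS * utilization


def chinchilla_tokens(n_params, tokens_per_param=20):
    """Rule-of-thumb token count for a compute-optimal run."""
    return n_params * tokens_per_param


def training_flops(n_params, n_tokens):
    """Approximate FLOPs needed to train n_params on n_tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def affordable_tokens(n_params, flops):
    """Largest token count the FLOP budget can cover for this model size."""
    return flops / (FLOPS_PER_PARAM_TOKEN * n_params)


if __name__ == "__main__":
    n = 70e9            # Meridian-1 parameter count
    hours = 10e6        # Meridian-1 compute budget in A100-GPU-hours
    budget = budget_flops(hours)
    d_opt = chinchilla_tokens(n)
    print(f"rule-of-thumb tokens:  {d_opt:.3e}")
    print(f"FLOPs for that run:    {training_flops(n, d_opt):.3e}")
    print(f"budget FLOPs:          {budget:.3e}")
    print(f"tokens budget affords: {affordable_tokens(n, budget):.3e}")
```

Compare the rule-of-thumb token count with what the budget can afford. If the two differ substantially under your utilization assumption, your audit should say what you would do with the difference and why.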
Every stage of the model lifecycle involves value judgments disguised as technical decisions. Choosing which data to include is a decision about whose knowledge and perspective the model will reflect. Choosing the safety threshold for deployment is a decision about acceptable risk. Alignment training encodes someone's definition of what helpful and harmless mean. Understanding the technical process means recognizing the value decisions embedded within it.
During post-pretraining evaluation (Stage 4), a team discovers their model excels at organic chemistry synthesis questions but performs poorly on physics reasoning. Given that the model is intended as a general scientific assistant, which Stage 1 decision most likely caused this imbalance, and how would it be addressed?
A lab discovers during red-teaming (Stage 6) that their model, when asked to write a story involving a chemistry teacher character, will produce accurate descriptions of drug synthesis routes embedded in the narrative. The lab's safety classifier does not flag this because the output is labeled as fiction. Which stage of the lifecycle most directly failed, and what is the correct fix?