AI Foundations

⏱ About 20 min · 20 XP

Module Check

You have completed Inside a Neural Network. You have traced the journey from the perceptron — a single linear threshold unit — through the full machinery of a deep learning system: neurons with weights and biases, nonlinear activation functions, a forward pass that transforms data layer by layer, a loss function that measures error, gradient descent that navigates the loss landscape, backpropagation that makes computing gradients over millions of parameters practical, and finally the architectural families that match network structure to data structure. This review tests your understanding of all eight lessons. Work through it honestly — wherever you find a gap, return to the relevant lesson before moving on.

Review Questions

A perceptron with two inputs can reliably classify which of the following datasets?

A neuron has inputs x₁ = -2.0, x₂ = 1.5 with weights w₁ = 0.8, w₂ = -0.6, and bias b = 1.0. Its activation is ReLU. What is the neuron's output?

Why is binary cross-entropy preferred over raw classification accuracy as a training loss?

In gradient descent, the weight update rule is w_new = w_old - α × (dL/dw). Suppose dL/dw = -4.0 and α = 0.05. How does w change?

What is the vanishing gradient problem, and which of the following correctly names both its cause and a solution?

A team is building a system to detect fraud in a stream of bank transactions. Each transaction has a timestamp, a dollar amount, and a merchant category, and patterns unfold over time (e.g., three small transactions followed by one large one may indicate fraud). Which architectural family is most appropriate, and why?
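For the two numeric questions above, a small scratchpad can help check hand arithmetic. The sketch below is illustrative only: the function names `relu_neuron` and `gradient_step` are hypothetical, and the sample values are deliberately different from the ones in the questions.

```python
def relu_neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through ReLU."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, z)


def gradient_step(w_old, grad, alpha):
    """One gradient descent update: w_new = w_old - alpha * dL/dw."""
    return w_old - alpha * grad


# Sample values chosen for illustration (not the ones in the questions).
out = relu_neuron([1.0, 2.0], [0.5, -0.25], 0.5)  # z = 0.5 - 0.5 + 0.5
w_new = gradient_step(1.0, 2.0, 0.25)             # 1.0 - 0.25 * 2.0

print(out, w_new)
```

Swapping in the values from a question and comparing against a hand calculation is a quick way to catch sign errors.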

The Through-Line

The concepts in this module form a chain. The perceptron's limits motivate stacking layers. Stacking only adds expressive power if the activations are nonlinear. Backpropagation requires those activations to be differentiable (at least almost everywhere, as with ReLU). Backpropagation uses the chain rule to compute gradients. Gradients feed gradient descent, which minimizes the loss. The loss encodes the task. The architecture encodes the structure of the data. Change one piece and the others are affected. A neural network is not a collection of independent tricks; it is a coherent system, and understanding it deeply means understanding how these pieces constrain and enable each other.
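The chain from forward pass to weight update can be seen in a few lines of code. What follows is a minimal sketch, not a reference implementation: it assumes a single sigmoid neuron (so no hidden layers), binary cross-entropy loss, a toy logical-OR dataset, and a learning rate `alpha = 0.5`. All of these are choices made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(y, p, eps=1e-12):
    """Binary cross-entropy for one example, clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


# Toy dataset (assumed for illustration): logical OR, linearly separable.
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

w = [0.0, 0.0]   # weights
b = 0.0          # bias
alpha = 0.5      # learning rate (assumed)


def forward(x):
    """Forward pass: weighted sum plus bias, then nonlinear activation."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def mean_loss():
    return sum(bce(y, forward(x)) for x, y in data) / len(data)


initial_loss = mean_loss()

for epoch in range(2000):
    for x, y in data:
        p = forward(x)
        # Backward pass via the chain rule: for sigmoid + BCE,
        # dL/dz simplifies to (p - y), so dL/dw_i = (p - y) * x_i.
        dz = p - y
        # Gradient descent: step each parameter against its gradient.
        w[0] -= alpha * dz * x[0]
        w[1] -= alpha * dz * x[1]
        b -= alpha * dz

final_loss = mean_loss()
predictions = [round(forward(x)) for x, _ in data]
print(initial_loss, final_loss, predictions)
```

Each comment marks one link in the chain: the forward pass produces a prediction, the loss scores it, the chain rule turns the loss into gradients, and gradient descent uses those gradients to update the parameters.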

Capstone: Design a Network from Scratch

You are building an image classifier to identify ten species of birds from 64×64 color photos. Design the network, justifying every decision.

Part 1: Input and architecture

  1. What is the dimensionality of one input? (Compute it.)
  2. Why is a CNN more appropriate than a fully connected network for this task?
  3. Sketch three to four layers: describe the type of each layer (convolutional, pooling, fully connected), what it does, and why you included it.

Part 2: Output and loss

  1. How many output neurons should the final layer have, and why?
  2. Which activation function should the output layer use, and why?
  3. Which loss function should you train with? Explain your choice.

Part 3: Training

  1. Write the update rule for one weight, naming every variable.
  2. Identify one risk you face during training (vanishing gradients, overfitting, class imbalance, etc.) and describe one concrete measure to address it.

Part 4: Honest limitations

  1. Name one thing your network will likely do poorly, and explain why the architecture or training procedure cannot fully prevent it.

Write your full design as a structured technical memo of one to two pages. Be precise about numbers where they can be computed and honest about what remains uncertain.
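When filling in the numbers for the memo, a few helper functions can be used to check layer shapes and parameter counts. This is a sketch under stated assumptions: the helper names are hypothetical, kernels are assumed square, and the worked example uses a 28×28 grayscale image with five classes rather than the capstone's input, so the capstone arithmetic is still yours to do.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_channels, out_channels, kernel):
    """Learnable parameters in a conv layer: weights plus one bias per filter."""
    return out_channels * (in_channels * kernel * kernel + 1)


def dense_params(in_features, out_features):
    """Learnable parameters in a fully connected layer: weights plus biases."""
    return out_features * (in_features + 1)


# Illustrative example (assumed values, not the capstone's):
# 28x28 grayscale input -> 3x3 conv, 8 filters, padding 1 -> 2x2 max pool.
side = conv_output_size(28, kernel=3, stride=1, padding=1)   # conv keeps 28
side = conv_output_size(side, kernel=2, stride=2)            # pool halves to 14
flat = side * side * 8                                       # features fed to the dense layer

n_conv = conv_params(in_channels=1, out_channels=8, kernel=3)
n_dense = dense_params(flat, 5)
print(side, flat, n_conv, n_dense)
```

Comparing `conv_params` for a convolutional layer against `dense_params` applied directly to the flattened input is one way to put numbers behind the CNN-versus-fully-connected argument in Part 1.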