Weights: The Network's Memory
If you saved a neural network to a file on your computer, you would not be saving any code that 'thinks.' You would be saving a long list of numbers. Those numbers are the weights of every connection in the network and the biases of every neuron. They are everything the network learned: every pattern it extracted from training data, every correlation it discovered, every feature it found useful. Change those numbers and you change what the network knows. Set them all to zero and the network forgets everything. Understanding weights is understanding what it means for a neural network to learn.
What a Weight Encodes
As you saw in Lesson 2, a weight is a multiplier. When the weight on a particular connection is large and positive, it means: 'when this input is high, push the output higher.' When a weight is large and negative, it means: 'when this input is high, push the output lower.' A weight near zero means: 'this input barely matters for this connection.'
Now imagine a network trained to detect spam emails. After training, one neuron in an early hidden layer might have large positive weights on the connections from inputs that correspond to the words 'FREE,' 'CLICK HERE,' and 'ACT NOW.' That neuron has learned to fire strongly when those words appear. Nobody told it those words mean spam; it discovered the pattern as training adjusted the weights across millions of examples until the values that worked best emerged. This is the central insight: the weights are not programmed by a human. They are discovered by a mathematical process that searches for the combination of weights that makes the fewest mistakes on training data.
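To make the multiplier idea concrete, here is a minimal sketch of the spam neuron's weighted sum. The weight values, the extra words 'meeting' and 'the', and the helper name `neuron_sum` are all invented for illustration; a real trained network would have its own numbers.

```python
# Hypothetical weights for one hidden neuron (values invented for illustration).
weights = {
    "free": 2.1,        # large positive: pushes the output higher
    "click here": 1.8,
    "act now": 1.6,
    "meeting": -0.9,    # negative: pushes the output lower
    "the": 0.02,        # near zero: barely matters
}

def neuron_sum(words_present, weights):
    """Weighted sum where each present word counts as input 1, absent as 0."""
    return sum(weights[word] for word in words_present if word in weights)

spammy = neuron_sum(("free", "click here", "act now", "the"), weights)
ordinary = neuron_sum(("meeting", "the"), weights)

print(f"spammy email:   {spammy:.2f}")    # 5.52
print(f"ordinary email: {ordinary:.2f}")  # -0.88
```

The neuron's sum is much higher for the spammy email only because of the numbers stored in `weights`; nothing in the code itself mentions spam.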
A neural network starts with random weights — small numbers chosen at random. Through training, those weights are adjusted repeatedly until the network makes good predictions. The final set of weights is the network's 'knowledge.' There is no other place where learning is stored — no separate memory bank, no symbolic rules. Just numbers.
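The idea that a network is 'just numbers' can be sketched with a single neuron. This is an illustrative toy, assuming three inputs, a uniform random range of ±0.5, and no activation function; the names `predict`, `day_a`, and `day_b` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the random starting weights are reproducible

# The neuron's entire "knowledge": three weights and one bias.
weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
bias = random.uniform(-0.5, 0.5)

def predict(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (no activation, for simplicity)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

day_a = [0.9, 0.2, 0.4]  # made-up input values
day_b = [0.1, 0.8, 0.7]

# With random weights, different inputs give different outputs.
random_a = predict(day_a, weights, bias)
random_b = predict(day_b, weights, bias)

# With every parameter set to zero, all inputs give the same output:
# the neuron can no longer tell one input from another.
zero_weights = [0.0] * len(weights)
zero_a = predict(day_a, zero_weights, 0.0)
zero_b = predict(day_b, zero_weights, 0.0)

print(random_a, random_b)
print(zero_a, zero_b)
```

Saving this neuron to a file would mean writing out four numbers; training it would mean changing them.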
Let's trace a simple weight story. Suppose you are training a network to predict tomorrow's weather from three inputs: current temperature, humidity, and wind speed. The network starts with random weights, say:
- temperature → hidden neuron: weight = 0.12
- humidity → hidden neuron: weight = −0.05
- wind speed → hidden neuron: weight = 0.33
After seeing thousands of days of weather data, the weights might settle to something like:
- temperature → hidden neuron: weight = 0.71
- humidity → hidden neuron: weight = 0.84
- wind speed → hidden neuron: weight = 0.19
The high weight on humidity tells us: humidity turned out to be very important for predicting rain. The lower weight on wind speed says: wind matters, but less. The network discovered these importances by trying different weight values and keeping the ones that produced better predictions. Nobody labeled humidity as 'important for rain'; the training data taught that by example.
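The before-and-after weights above can be plugged into a weighted sum to see how each input's contribution changes. The input readings below are made up, and they are assumed to be scaled to the range 0 to 1 so that the weights are comparable; `contributions` is a hypothetical helper.

```python
# Made-up readings for one day, assumed to be scaled to the range 0..1.
inputs = {"temperature": 0.6, "humidity": 0.9, "wind_speed": 0.3}

# Weights into the hidden neuron, taken from the example above.
before = {"temperature": 0.12, "humidity": -0.05, "wind_speed": 0.33}
after = {"temperature": 0.71, "humidity": 0.84, "wind_speed": 0.19}

def contributions(inputs, weights):
    """How much each input adds to the neuron's sum: input times weight."""
    return {name: inputs[name] * weights[name] for name in inputs}

before_parts = contributions(inputs, before)
after_parts = contributions(inputs, after)

before_total = sum(before_parts.values())
after_total = sum(after_parts.values())

# The input with the largest contribution after training.
dominant = max(after_parts, key=after_parts.get)

print(f"before training: {before_total:.3f}")  # 0.126
print(f"after training:  {after_total:.3f}")   # 1.239
print("largest contribution after training:", dominant)  # humidity
```

Before training, humidity slightly lowers the neuron's sum; after training, it is the largest single contributor.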
How Many Weights Are There?
Even small neural networks have a surprising number of weights. A network with 784 inputs, two hidden layers of 256 neurons each, and 10 outputs has:
- 784 × 256 = 200,704 weights in the first layer
- 256 × 256 = 65,536 weights in the second layer
- 256 × 10 = 2,560 weights in the output layer
That is 268,800 weights in total, or about 270,000, for a modest digit-recognizer (plus 522 biases, one for each hidden and output neuron). GPT-4, a large language model, is vastly larger: its size has not been officially published, but unofficial estimates put it at around 1.8 trillion weights. Each of those numbers contributes, in a tiny way, to the model's responses. This scale matters because it explains why training takes so long, why it requires so much data, and why large AI models cost millions of dollars to train. The entire process is a search over a space of trillions of numbers, trying to find a combination that makes good predictions.
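The count for the digit-recognizer can be checked with a few lines of code. This sketch assumes a fully connected network, where every neuron in one layer connects to every neuron in the next, and one bias per non-input neuron; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    Each pair of adjacent layers contributes (size of one) x (size of next)
    weights; every neuron after the input layer has one bias.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases

# 784 inputs, two hidden layers of 256 neurons, 10 outputs.
weights, biases = count_parameters([784, 256, 256, 10])

print(f"weights: {weights:,}")           # 268,800
print(f"biases:  {biases:,}")            # 522
print(f"total:   {weights + biases:,}")  # 269,322
```

Changing the list of layer sizes shows how quickly the count grows: doubling the width of both hidden layers roughly quadruples the middle term.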
If the training data reflects human prejudices (for example, historical hiring and pay data in which one group was systematically passed over or underpaid), the weights will encode those prejudices too. A network is only as fair as the data it learned from. This is not a bug in the math; it is a consequence of how learning works. It is one of the most important ethical challenges in modern AI.
Flashcards
What happens to a trained neural network if you set all its weights to zero?
Why might a neural network trained on historical data reflect human prejudices?
Weight Intuition Experiment
- You are a human 'training' a single neuron to predict exam scores from two inputs: hours studied (0-10) and hours slept the night before (0-10). The target: predict score out of 100.
- Step 1: Pick starting weights for each input (any numbers you like). Write them down.
- Step 2: Test on these three students:
- Student A: studied 8 hours, slept 7 hours. Actual score: 88.
- Student B: studied 2 hours, slept 9 hours. Actual score: 61.
- Student C: studied 6 hours, slept 4 hours. Actual score: 70.
- Step 3: Compute your neuron's prediction for each student (hours_studied × weight1 + hours_slept × weight2).
- Step 4: Compare your predictions to actual scores. For whichever input you think matters more, increase its weight slightly.
- Step 5: Recompute. Did your predictions improve?
- Repeat Steps 4-5 twice more. This manual process is a toy version of what a neural network does automatically, billions of times over.
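The experiment above can also be automated as a rough sketch. Instead of guessing which input matters more, this version tries nudging each weight up or down and keeps the single change that lowers the error most. This is a simple trial-and-keep search, not the gradient-based method real training uses; the starting weights of 5.0 and the step size of 0.5 are arbitrary assumptions.

```python
# (hours studied, hours slept, actual score) for Students A, B, and C.
students = [(8, 7, 88), (2, 9, 61), (6, 4, 70)]

def predict(studied, slept, w1, w2):
    """Step 3: the neuron's prediction for one student."""
    return studied * w1 + slept * w2

def mean_abs_error(w1, w2):
    """Average distance between predictions and actual scores."""
    errors = [abs(predict(s, z, w1, w2) - score) for s, z, score in students]
    return sum(errors) / len(errors)

w1, w2 = 5.0, 5.0  # Step 1: arbitrary starting weights
step = 0.5         # how far to nudge a weight on each try
start_error = mean_abs_error(w1, w2)

for round_number in range(1, 4):  # Steps 4-5, three rounds in total
    candidates = [(w1 + step, w2), (w1 - step, w2),
                  (w1, w2 + step), (w1, w2 - step)]
    best = min(candidates, key=lambda c: mean_abs_error(*c))
    # Keep the nudge only if it actually improves the predictions.
    if mean_abs_error(*best) < mean_abs_error(w1, w2):
        w1, w2 = best
    print(f"round {round_number}: w1={w1}, w2={w2}, "
          f"error={mean_abs_error(w1, w2):.2f}")

final_error = mean_abs_error(w1, w2)
```

Three rounds are not enough to fit the three students well, but the error should shrink from its starting value, which is the point of the exercise.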