Weights and What They Store
If you took a trained neural network and erased every weight, setting them all to zero, the network would become instantly useless — it would output the same meaningless number no matter what you gave it. Every fact a network has learned, every pattern it can recognize, every decision it can make is stored in one place and one place only: the weights.
Weights as Memory
Think about how you memorize a friend's face. You can't point to a single neuron in your brain that 'stores' their nose. The memory is distributed: spread across millions of connections, each contributing a tiny piece. Artificial neural networks work in a similar way. No single weight stores 'cat' or 'dog' or 'the number 7.' Instead, the pattern emerges from thousands or millions of weights working together.
Before training begins, the weights are set to small random numbers. The network knows nothing yet; it is guessing. During training, each weight is nudged in tiny increments until the network's outputs match the correct answers as closely as possible. When training ends, the weights are frozen. From that moment on, those frozen numbers are the network's entire knowledge of its task.
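The nudging process can be sketched with a single weight. This is only a minimal illustration under stated assumptions: the task (learn y = 3 * x), the four examples, the learning rate, and the simple error-times-input update rule are all made up for the demonstration. Real networks apply the same idea to many weights at once.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy task (hypothetical): learn the rule y = 3 * x from four examples.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

weight = random.uniform(-0.1, 0.1)  # small random starting value
initial_weight = weight
learning_rate = 0.01                # assumed size of each nudge

for epoch in range(200):
    for x, target in examples:
        prediction = weight * x
        error = prediction - target
        # Nudge the weight a little in the direction that shrinks the error.
        weight -= learning_rate * error * x

trained_weight = weight  # "frozen": no more updates after this point

print(f"before training: {initial_weight:.3f}")
print(f"after training:  {trained_weight:.3f}")
```

The starting value is an uninformed guess near zero. After training the weight sits very close to 3, even though the number 3 was never given to the program directly.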
A trained neural network is, at its core, just a long list of numbers — the weights. Change those numbers and you change what the network knows. The architecture (how neurons connect) stays the same; knowledge is entirely in the values.
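A small sketch can make the "same architecture, different numbers" idea tangible. The `neuron` function and the three weight lists below are hypothetical, chosen by hand rather than learned. The code that computes the output never changes; only the list of numbers does.

```python
def neuron(inputs, weights):
    """Weighted sum: the same fixed 'architecture' for every weight list."""
    return sum(w * x for w, x in zip(weights, inputs))

# Three hypothetical weight lists for the same two-input neuron.
adds_inputs = [1.0, 1.0]       # output is x1 + x2
compares_inputs = [1.0, -1.0]  # output is x1 - x2
erased = [0.0, 0.0]            # every weight set to zero

for inputs in ([3.0, 2.0], [10.0, 4.0]):
    print(inputs,
          neuron(inputs, adds_inputs),
          neuron(inputs, compares_inputs),
          neuron(inputs, erased))
```

With the erased weights the neuron returns 0.0 for every input, which is the "instantly useless" case described at the start of this lesson.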
To see this concretely, consider a neuron that detects whether a pixel is part of a horizontal edge in an image. It receives brightness values from the row of pixels above and the row of pixels below. After training, its weights might look like this:
- Weights for top-row pixels: +1, +1, +1 (strong positive: bright pixels above push the sum up)
- Weights for bottom-row pixels: -1, -1, -1 (strong negative: bright pixels below push the sum down)
If the top row is bright and the bottom row is dark, the sum is large and positive: horizontal edge detected. If both rows are equally bright, the positives and negatives cancel and the sum is near zero: no edge. Those specific weight values encode the concept of a horizontal edge. The network learned them from thousands of example images; nobody told it what an edge was.
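The edge-detector arithmetic can be checked in a few lines. This sketch uses the +1 and -1 weights from the example. The brightness scale (0.0 for dark, 1.0 for bright) and the function name `edge_score` are assumptions made for illustration.

```python
# Weights from the example: +1 for the top row, -1 for the bottom row.
top_weights = [1, 1, 1]
bottom_weights = [-1, -1, -1]

def edge_score(top_row, bottom_row):
    """Weighted sum over both rows of pixel brightness values."""
    top = sum(w * p for w, p in zip(top_weights, top_row))
    bottom = sum(w * p for w, p in zip(bottom_weights, bottom_row))
    return top + bottom

# Assumed brightness scale: 0.0 is dark, 1.0 is bright.
bright_over_dark = edge_score([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
equally_bright = edge_score([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
dark_over_bright = edge_score([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])

print(bright_over_dark)  # 3.0: large and positive, edge detected
print(equally_bright)    # 0.0: positives and negatives cancel, no edge
print(dark_over_bright)  # -3.0: the opposite pattern
```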
Big Weights, Small Weights, and What They Mean
A large positive weight says: when this input is big, push the output up strongly. A large negative weight says: when this input is big, pull the output down strongly. A weight near zero says: this input barely affects me. During training, the network figures out which weights should be large and which should be near zero for each task. For a spam filter, the weight on the word 'FREE!!!' in all-caps might grow very positive, because a high spam score is what we want for messages that contain it. The weight on the word 'meeting' stays near zero because 'meeting' appears about equally often in spam and real email. This is the profound part: nobody told the network that 'FREE!!!' is suspicious. It figured that out by adjusting weights until its spam-or-not predictions matched correctly labeled examples.
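As a rough sketch, a word-based spam score can be written as a sum of per-word weights. The numbers below are hand-picked for illustration, not learned from data. The word 'thanks' and its negative weight are an added hypothetical, included to show a weight that pulls the score down.

```python
# Hypothetical weights, set by hand for illustration (not learned).
weights = {
    "FREE!!!": 4.0,   # large positive: pushes the spam score up strongly
    "meeting": 0.0,   # near zero: barely affects the score
    "thanks": -2.0,   # large negative: pulls the spam score down
}

def spam_score(message):
    """Add up the weights of the words present; unknown words count as 0."""
    return sum(weights.get(word, 0.0) for word in message.split())

print(spam_score("FREE!!! prize inside"))    # 4.0
print(spam_score("meeting at noon"))         # 0.0
print(spam_score("thanks for the meeting"))  # -2.0
```

A trained filter would arrive at numbers like these by itself, by adjusting each weight until its scores agree with the labeled examples.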
If a training dataset is skewed — for example, containing mostly images of people with one skin tone — the weights will learn that skew. The network's knowledge reflects its training data, flaws included. This is why dataset quality and diversity are crucial.
A neural network for recognizing handwritten digits has been trained. Where is its knowledge stored?
Before training starts, what are weights typically set to?
Weights as Votes
- Step 1: You are a neuron deciding if a movie is good. Your inputs are critic score (0-10), friend's recommendation (0-10), and runtime in hours.
- Step 2: Assign weights that reflect YOUR values — maybe friend opinion matters most to you. Write them down.
- Step 3: Pick two movies you have seen. Score each on the three inputs.
- Step 4: Compute the weighted sum for each movie. Does the higher sum match the one you actually liked better?
- Step 5: Adjust one weight and recompute. Notice how changing a single weight shifts the result. This is what training does — one tiny weight adjustment at a time.
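The five steps above can also be tried in code. Every number in this sketch is a made-up example: the weights, the two movies, and their scores are placeholders to replace with your own values.

```python
# Step 2: hypothetical weights; here the friend's opinion matters most
# and long runtimes count against a movie.
weights = {"critic": 0.3, "friend": 0.6, "runtime": -0.5}

# Step 3: two made-up movies scored on the three inputs.
movie_a = {"critic": 9, "friend": 4, "runtime": 3.0}
movie_b = {"critic": 6, "friend": 9, "runtime": 2.0}

def weighted_sum(movie, weights):
    """Step 4: multiply each input by its weight and add the results."""
    return sum(weights[name] * movie[name] for name in weights)

score_a = weighted_sum(movie_a, weights)
score_b = weighted_sum(movie_b, weights)
print(f"movie A: {score_a:.1f}, movie B: {score_b:.1f}")

# Step 5: change a single weight (ignore the friend entirely) and recompute.
adjusted = dict(weights, friend=0.0)
new_score_a = weighted_sum(movie_a, adjusted)
new_score_b = weighted_sum(movie_b, adjusted)
print(f"movie A: {new_score_a:.1f}, movie B: {new_score_b:.1f}")
```

With the original weights movie B scores higher. After the single change to the friend weight, movie A comes out ahead, which shows how much one weight can shift the outcome.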