Decision Boundaries
Imagine plotting every training example as a point in a two-dimensional space, colored red if it belongs to class A and blue if it belongs to class B. A classifier's job, viewed geometrically, is to draw a line — or a curve, or a complicated boundary — through that space so that most red points are on one side and most blue points are on the other. The line it draws is called the decision boundary, and understanding its shape tells you almost everything about what an algorithm can and cannot do.
What Is a Decision Boundary?
Formally, a decision boundary is the set of all points x in the input space where a classifier is exactly indifferent between two classes: the place where the predicted class switches. For binary classification, the boundary divides the input space into two regions, one for each class. For K-class classification, the boundary is made up of pieces that each separate one pair of classes, so there are at most K(K-1)/2 pairwise boundaries.
A concrete example with real numbers: suppose a classifier uses the simple rule: predict spam if x₁ (exclamation mark count) > 5, else predict not spam. The decision boundary is the vertical line x₁ = 5. Points to the right are classified as spam; points to the left are classified as not spam.
Now suppose the rule is: predict spam if 2x₁ + 3x₂ > 10, where x₁ is the exclamation mark count and x₂ is 1 if 'free' appears and 0 otherwise. The boundary is the line 2x₁ + 3x₂ = 10. Points where 2x₁ + 3x₂ > 10 are classified as spam; all other points are classified as not spam. This is a linear decision boundary: a straight line in 2D, a flat plane in 3D, and a hyperplane in higher dimensions.
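As a minimal sketch, the second rule above can be written as a small Python function. The function name `predict_spam` and the sample points are illustrative assumptions; only the weights 2 and 3 and the threshold 10 come from the text.

```python
def predict_spam(x1, x2, weights=(2.0, 3.0), threshold=10.0):
    """Return True when (x1, x2) falls on the spam side of 2*x1 + 3*x2 = 10."""
    score = weights[0] * x1 + weights[1] * x2
    return score > threshold


# Hypothetical emails: (exclamation mark count, 1 if 'free' appears else 0)
examples = [(6, 0), (2, 1), (4, 1), (5, 0)]
labels = [predict_spam(x1, x2) for x1, x2 in examples]

for (x1, x2), is_spam in zip(examples, labels):
    print(f"x1={x1}, x2={x2} -> {'spam' if is_spam else 'not spam'}")
```

The point (5, 0) scores exactly 10, so it sits on the boundary itself; with the strict inequality it is classified as not spam.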
Once you know where the decision boundary is and which class lies on each side, you know how the classifier will behave on every possible input. In that sense the boundary is the classifier's complete prediction rule, expressed geometrically. Different algorithms correspond to different families of boundary shapes.
The shape of the decision boundary is determined by the algorithm's hypothesis space.
Linear classifiers (logistic regression, linear SVMs) can only produce hyperplanes, which are straight lines in 2D. They work extremely well when classes are linearly separable (a straight line can cleanly divide them), but they underfit when the true boundary is curved.
Polynomial classifiers add polynomial features (x₁², x₁x₂, x₂², ...) to allow curved boundaries. A degree-2 polynomial classifier can produce ellipses, parabolas, and hyperbolas as boundaries.
Kernel SVMs and neural networks can produce highly irregular boundaries that wrap tightly around clusters of points.
Standard decision trees produce axis-aligned boundaries: in 2D they consist entirely of horizontal and vertical segments. This is because each split asks 'is feature j greater than threshold t?', which corresponds to a horizontal or vertical cut.
A critical insight: a more flexible boundary is not always better. A boundary that perfectly separates all training points often wiggles through noise in the data, and it will tend to misclassify new points that differ slightly from what was seen. This is the bias-variance trade-off.
A rigid boundary (high bias) may fail to capture the true boundary shape, so it underfits the data. An overly flexible boundary (high variance) captures training noise, so it overfits. The goal is to find a boundary flexible enough to represent the true pattern but not so flexible that it memorizes noise. Selecting the right algorithm and regularization strength is how you walk this line.
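To make the boundary families described above concrete, here is a rough sketch in plain Python. The weights, thresholds, class names, and function names are hypothetical values chosen for illustration. The first rule is linear in degree-2 features yet draws a circle in the original x₁–x₂ plane; the second mimics a tiny decision tree whose boundary is built only from axis-aligned cuts.

```python
def degree2_features(x1, x2):
    """Expand (x1, x2) into the degree-2 polynomial features."""
    return [x1, x2, x1 * x1, x1 * x2, x2 * x2]


# Hypothetical weights: score = x1^2 + x2^2 - 4, so the boundary
# score = 0 is a circle of radius 2 centered at the origin.
WEIGHTS = [0.0, 0.0, 1.0, 0.0, 1.0]
BIAS = -4.0


def inside_circle(x1, x2):
    """Linear rule in feature space, curved boundary in input space."""
    features = degree2_features(x1, x2)
    score = sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS
    return score < 0


def tiny_tree(x1, x2):
    """Two nested threshold tests; each one is a vertical or horizontal cut."""
    if x1 > 1:        # vertical cut at x1 = 1
        if x2 > 2:    # horizontal cut at x2 = 2, only to the right of x1 = 1
            return "B"
        return "A"
    return "A"


print(inside_circle(1, 1), inside_circle(3, 0))
print(tiny_tree(4, 3), tiny_tree(4, 1), tiny_tree(0, 5))
```

The tree assigns class B only to the rectangular region where x₁ > 1 and x₂ > 2, which is why tree boundaries look like staircases rather than smooth curves.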
Margin and Confidence
The decision boundary tells you where class predictions switch, but it does not tell you how confident the classifier is. Confidence relates to distance from the boundary. Consider a point very close to the boundary: the classifier is nearly indifferent between the two classes, and a small change in the point's features could flip the prediction. Now consider a point far inside one class region: the classifier is highly confident.
This is formalized in Support Vector Machines (SVMs) through the concept of margin: the width of the empty strip around the decision boundary, measured between the nearest training points on each side. SVMs explicitly try to maximize this margin, producing a boundary that is as far from the nearest points as possible. The intuition: a wider margin means the boundary is robust. Even if test points are slightly different from training points, they are unlikely to cross to the wrong side.
Margin = 2 / ||w||, where w is the weight vector defining the boundary, scaled so that the nearest training points satisfy |w·x + b| = 1. Maximizing the margin is equivalent to minimizing ||w||, which also acts as a form of regularization.
A worked example: suppose the margin-maximizing boundary between two classes lies at x₁ = 5, with the nearest spam email at x₁ = 6 and the nearest legitimate email at x₁ = 4. The margin is 2 units. An alternative boundary at x₁ = 4.5 also separates the training data, but it sits only 0.5 units from the nearest legitimate email, so the symmetric strip around it is just 1 unit wide and the boundary is more fragile.
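The margin arithmetic above can be checked with a short sketch. The helper names `margin_width` and `strip_width` are assumptions introduced here, and the formula 2 / ||w|| presumes the usual SVM scaling in which the nearest points satisfy |w·x + b| = 1.

```python
import math


def margin_width(w):
    """Margin 2 / ||w|| for a weight vector under the canonical SVM scaling."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 2.0 / norm


def strip_width(boundary, points):
    """Width of the widest symmetric empty strip centered on a 1D boundary."""
    nearest = min(abs(p - boundary) for p in points)
    return 2.0 * nearest


# 1D worked example: nearest legitimate email at 4, nearest spam email at 6.
nearest_points = [4.0, 6.0]

# Boundary x1 = 5 corresponds to w = 1, b = -5: w*6 + b = 1 and w*4 + b = -1.
print(margin_width([1.0]))               # width from the formula
print(strip_width(5.0, nearest_points))  # width measured directly
print(strip_width(4.5, nearest_points))  # off-center boundary, narrower strip
```

Both calculations give a width of 2 for the boundary at 5, while the off-center boundary at 4.5 only leaves room for a strip of width 1.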
A logistic regression model trained to classify two types of flower produces the boundary 3x₁ - 2x₂ = 7. A new flower has features x₁=4, x₂=2. What does the model predict?
A decision tree classifier splits on feature x₁ > 3 at the root node. What shape does this produce in the x₁–x₂ plane?
Hand-Draw a Decision Boundary
- You will need graph paper or a blank grid drawn on paper.
- Step 1: Plot these 10 points. Label each with its class (+ or -).
- (+): (1,4), (2,5), (1,6), (3,5), (2,7)
- (-): (5,1), (6,2), (7,1), (5,3), (6,4)
- Step 2: Draw a linear decision boundary (a straight line) that correctly separates all + from all -.
- Step 3: Draw the margin strips: the two lines parallel to your boundary, one on each side, that pass through the training points nearest to it.
- Step 4: Now add this new point: (4, 4). Which class does your boundary assign it to? Is your confidence high or low based on its distance from the boundary?
- Step 5: Discuss — if you made the boundary more flexible (a curve), could you draw it so that (4,4) is even further from the boundary for one class? Would that be a good idea?
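After finishing the drawing, a short script can check your line against the ten points. This is only a sketch: the line is written as a·x + b·y = c, and the candidate (2, -1, 3) is a hypothetical example, not the intended answer. Substitute the coefficients of your own line, and flip their signs if your + points end up on the positive side.

```python
import math

PLUS = [(1, 4), (2, 5), (1, 6), (3, 5), (2, 7)]
MINUS = [(5, 1), (6, 2), (7, 1), (5, 3), (6, 4)]


def score(line, point):
    """Signed value a*x + b*y - c; its sign says which side the point is on."""
    a, b, c = line
    x, y = point
    return a * x + b * y - c


def separates(line):
    """True if every + point scores negative and every - point positive."""
    return (all(score(line, p) < 0 for p in PLUS)
            and all(score(line, p) > 0 for p in MINUS))


def distance(line, point):
    """Perpendicular distance from the point to the line."""
    a, b, _ = line
    return abs(score(line, point)) / math.hypot(a, b)


candidate = (2, -1, 3)  # hypothetical line 2x - y = 3
new_point = (4, 4)

ok = separates(candidate)
side = "-" if score(candidate, new_point) > 0 else "+"
dist_new = distance(candidate, new_point)
nearest_training = min(distance(candidate, p) for p in PLUS + MINUS)

print(ok, side, round(dist_new, 3), round(nearest_training, 3))
```

For this particular candidate, the new point lies closer to the line than any training point does, which is the geometric signature of a low-confidence prediction.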