
Anomaly Detection

In April 2022, a single rogue trading algorithm generated a flash crash in a commodity market in under 14 seconds. The trades were statistically anomalous — they bore almost no resemblance to the billions of normal transactions in the system's history. Anomaly detection is the discipline of building systems that recognize the unusual among the overwhelming mass of the ordinary. It is one of the most consequential applications of unsupervised learning because labels for rare events — fraud, equipment failures, cyberattacks — are almost always missing.

What Makes a Point Anomalous

Formally, an anomaly (also called an outlier) is a data point that deviates significantly from the expected distribution of the data. There are three structural types:

  1. Point anomaly: a single observation is anomalous relative to the rest of the dataset. Example: a credit card transaction of $47,000 when all other transactions for that account are under $200.
  2. Contextual anomaly: a point that is anomalous within a specific context but normal elsewhere. Example: a temperature of 35°C is normal in July in Phoenix; the same temperature is highly anomalous in January in Oslo.
  3. Collective anomaly: a group of individually unremarkable points that together form an anomalous pattern. Example: each heartbeat in an ECG trace looks normal, but an unusual sequence of beats signals arrhythmia.

Recognizing which type of anomaly you are hunting guides algorithm selection.
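To make the contextual case concrete, here is a minimal sketch that scores the same 35°C reading against two different contexts. The temperature histories are invented for illustration, and the helper names (`contextual_z`, `is_contextual_anomaly`) and the cutoff of 3 standard deviations are assumptions, not part of the lesson.

```python
from statistics import mean, stdev

# Hypothetical temperature readings (°C), keyed by (city, month) context.
history = {
    ("Phoenix", "July"): [31.0, 34.5, 38.0, 36.2, 40.1, 33.3, 37.4, 35.5],
    ("Oslo", "January"): [-4.0, -1.5, 0.5, -6.2, -2.8, 1.0, -3.3, -0.7],
}

def contextual_z(value, context):
    """Z-score of value relative to past readings that share its context."""
    readings = history[context]
    return (value - mean(readings)) / stdev(readings)

def is_contextual_anomaly(value, context, k=3.0):
    """Flag value if it lies more than k standard deviations from its context mean."""
    return abs(contextual_z(value, context)) > k

# The same reading, judged in two contexts.
phoenix_flag = is_contextual_anomaly(35.0, ("Phoenix", "July"))
oslo_flag = is_contextual_anomaly(35.0, ("Oslo", "January"))
print("Phoenix in July:", phoenix_flag)
print("Oslo in January:", oslo_flag)
```

The value itself never changes; only the reference distribution does. A detector that pooled all readings into one distribution would not be able to make this distinction.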

Key Insight: Normal Is a Distribution, Not a Rule

Anomaly detection works by modeling what normal looks like — building an explicit or implicit distribution over normal data — then measuring how unlikely a new point is under that model. A point is flagged as anomalous when its probability under the normal model falls below a threshold. This means anomaly detection requires a good model of normality before it can find abnormality.
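As a rough sketch of this idea, the snippet below fits a one-dimensional Gaussian to data assumed to be normal, then flags a new value when its two-sided tail probability under that model drops below a cutoff. The transaction amounts, the choice of a Gaussian, and the cutoff `THRESHOLD` are all illustrative assumptions.

```python
import math
from statistics import mean, stdev

def fit_normal_model(samples):
    """Estimate the mean and standard deviation of the 'normal' data."""
    return mean(samples), stdev(samples)

def tail_probability(x, mu, sigma):
    """Two-sided probability of seeing a value at least as far from mu as x."""
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical transaction amounts assumed to represent normal behaviour.
normal_amounts = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1, 58.6, 49.9, 51.2]
mu, sigma = fit_normal_model(normal_amounts)

THRESHOLD = 1e-3  # assumed cutoff; in practice it is tuned per application

def is_anomalous(x):
    """Flag x when it is very unlikely under the fitted model of normal."""
    return tail_probability(x, mu, sigma) < THRESHOLD

typical_flag = is_anomalous(50.0)
extreme_flag = is_anomalous(400.0)
print("50.0 flagged:", typical_flag)
print("400.0 flagged:", extreme_flag)
```

The quality of the flags depends entirely on how well the fitted model describes normal data; if the real distribution is skewed or multimodal, a single Gaussian will misjudge which values are unlikely.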

Isolation Forest is a widely used algorithm.

Intuition: anomalies are rare and different from normal points. If you randomly partition the data space with random splits (as in a random decision tree), anomalies tend to be isolated after very few splits, while normal points require many splits to separate from each other.

Algorithm:

  1. Build t random trees (an 'isolation forest'). Each tree recursively partitions the data by randomly selecting a feature and then a random split value within that feature's range.
  2. For each data point, record the average path length across all trees, that is, how many splits were needed to isolate it.
  3. Points with short average path lengths are anomalies (they were easy to isolate). Points with long paths are normal.
  4. Compute an anomaly score: score = 2^(−average_path_length / c(n)), where n is the number of points used to build each tree and c(n) is the average path length of an unsuccessful search in a binary search tree of n nodes.

Concrete example: in a dataset of employee login times, most employees log in between 8 AM and 6 PM on weekdays. A login at 3 AM on a Sunday will usually be isolated within the first few splits of a tree, because a random split on the time feature has a good chance of separating it from the dense cluster of normal times. Its anomaly score approaches 1. Normal logins require many splits to isolate; their average path length is close to or above c(n), so their scores sit at or below roughly 0.5.
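The steps above can be sketched in plain Python. This is a simplified illustration, not a production implementation: it follows only the branch containing the query point instead of storing whole trees, it uses the full dataset for every tree (no subsampling or height limit), and the login data is synthetic. The function names and the common closed-form approximation used for c(n) are assumptions of this sketch.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Approximate average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def path_length(x, points, rng):
    """Count random splits until x is alone or the partition cannot be split."""
    depth = 0
    while len(points) > 1:
        ranges = {f: (min(p[f] for p in points), max(p[f] for p in points))
                  for f in range(len(x))}
        # Only features that still vary inside this partition can be split.
        splittable = [f for f, (lo, hi) in ranges.items() if lo < hi]
        if not splittable:
            break
        f = rng.choice(splittable)
        split = rng.uniform(*ranges[f])
        # Keep only the side of the split that x falls on.
        if x[f] < split:
            points = [p for p in points if p[f] < split]
        else:
            points = [p for p in points if p[f] >= split]
        depth += 1
    # Points that could not be separated add their expected extra depth.
    return depth + c(len(points))

def anomaly_score(x, data, n_trees=100, seed=0):
    """score = 2^(-average_path_length / c(n)), averaged over n_trees random trees."""
    rng = random.Random(seed)
    avg = sum(path_length(x, data, rng) for _ in range(n_trees)) / n_trees
    return 2.0 ** (-avg / c(len(data)))

# Synthetic logins as (hour_of_day, day_of_week) with Monday = 0.
data_rng = random.Random(42)
logins = [(data_rng.uniform(8.0, 18.0), data_rng.randrange(0, 5)) for _ in range(200)]
odd_login = (3.0, 6)  # 3 AM on a Sunday
logins.append(odd_login)

odd_score = anomaly_score(odd_login, logins)
typical_score = anomaly_score((13.0, 2), logins)
print(f"3 AM Sunday login: {odd_score:.2f}")
print(f"1 PM Wednesday login: {typical_score:.2f}")
```

The 3 AM Sunday login should score well above the weekday afternoon login, because a random split on either feature has a good chance of cutting it off immediately. Real implementations also subsample the data for each tree and cap tree height, which this sketch omits.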

Other approaches:

- Gaussian modeling: assume each feature follows a normal distribution. Fit mean μ and variance σ² from training data. Flag points that fall more than k standard deviations from the mean. Simple and interpretable, but assumes Gaussian distributions and struggles with multivariate correlations.
- One-Class SVM: train a support vector machine to define a tight boundary around the normal data in feature space. Anything outside that boundary is anomalous. Powerful but computationally expensive on large datasets.
- Autoencoder-based detection: train a neural autoencoder on normal data. The network learns to compress and reconstruct normal points well. At inference time, compute reconstruction error: points the network cannot reconstruct accurately (high error) are anomalous because the bottleneck learned only normal patterns.

Threshold selection: every method produces a score; you must choose a threshold above which a point is declared anomalous. Too low a threshold: many false alarms. Too high: real anomalies missed. This precision-recall trade-off is the core operational challenge.
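The threshold trade-off can be seen by sweeping a cutoff over a small labeled sample. The scores and labels below are made up for illustration, and `precision_recall` is a hypothetical helper; the conventions it uses for empty denominators are assumptions of this sketch.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when points with score > threshold are flagged."""
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    # Conventions for empty denominators (assumed): nothing flagged counts as
    # precision 1.0; no true anomalies present counts as recall 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical anomaly scores with ground-truth labels (True = real anomaly).
scores = [0.42, 0.45, 0.48, 0.51, 0.55, 0.58, 0.62, 0.71, 0.83, 0.95]
labels = [False, False, False, False, False, True, False, True, True, True]

for threshold in (0.5, 0.6, 0.8):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy sample, raising the threshold from 0.5 to 0.8 lifts precision from about 0.57 to 1.00 while recall falls from 1.00 to 0.50. Which point on that curve is acceptable depends on the cost of a missed anomaly versus the cost of a wasted investigation.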

Match each anomaly detection scenario to its anomaly type.

Terms

One transaction of $95,000 in an account that normally sees $50-$300 transactions
A temperature reading of 40°C that is abnormal only in the context of winter in Canada
A sequence of individually normal network packets that together spell out an exfiltration pattern
An autoencoder with high reconstruction error on a new data point

Definitions

Point anomaly
Collective anomaly
Contextual anomaly
Anomaly score from reconstruction error


Real Applications and Honest Limitations

Anomaly detection is deployed across high-stakes domains:

- Fraud detection: flag transactions deviating from a cardholder's historical pattern.
- Network intrusion detection: identify traffic patterns inconsistent with normal usage.
- Predictive maintenance: detect sensor readings indicating impending equipment failure before breakdowns occur.
- Medical monitoring: flag vital-sign patterns suggesting patient deterioration in ICUs.
- Manufacturing quality control: identify defective items on assembly lines.

Honest limitations:

  1. Definition drift (often called concept drift): what is 'normal' changes over time. A fraud model trained on pre-pandemic data may flag legitimate pandemic-era spending patterns as anomalous. Models must be retrained periodically.
  2. Adversarial adaptation: sophisticated attackers learn the anomaly detection system and craft attacks that avoid its blind spots, for example by staying just below the threshold.
  3. Labeling for evaluation: because anomalies are rare and often unlabeled, evaluating whether a detection system works is hard. Precision and recall on a labeled anomaly test set are the gold standard, but such sets are expensive to construct.
  4. Alert fatigue: when there are too many false positives, human reviewers stop investigating alerts, defeating the purpose.

Base Rate Neglect

If genuine fraud is 0.01% of transactions (1 in 10,000), even a detector that catches 99% of fraud and wrongly flags only 1% of normal transactions will generate roughly 100 false alarms for every true fraud detected, because there are so many more normal transactions. This is the base rate problem. Always reason about precision and recall jointly, not just accuracy, when anomalies are rare.
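The arithmetic behind that ratio is worth working through once. The sketch below reads "99% accurate" as a 99% true positive rate together with a 1% false positive rate, which is an interpretation rather than something the text states, and it assumes a batch of 1 million transactions purely for illustration.

```python
def alarm_breakdown(n_transactions, fraud_rate, true_positive_rate, false_positive_rate):
    """Expected true alarms, false alarms, and precision for a rare-event detector."""
    n_fraud = n_transactions * fraud_rate
    n_normal = n_transactions - n_fraud
    true_alarms = n_fraud * true_positive_rate       # fraud correctly flagged
    false_alarms = n_normal * false_positive_rate    # normal activity wrongly flagged
    precision = true_alarms / (true_alarms + false_alarms)
    return true_alarms, false_alarms, precision

# Assumed scenario: 1,000,000 transactions, fraud rate 0.01%, TPR 99%, FPR 1%.
true_alarms, false_alarms, precision = alarm_breakdown(1_000_000, 0.0001, 0.99, 0.01)

print(f"true alarms:  {true_alarms:,.0f}")
print(f"false alarms: {false_alarms:,.0f}")
print(f"false alarms per true alarm: {false_alarms / true_alarms:.0f}")
print(f"precision: {precision:.2%}")
```

Under these assumptions about 99 of the 100 fraudulent transactions are caught, but roughly 9,999 normal transactions are flagged alongside them, so only about 1% of alerts are real fraud.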

An Isolation Forest assigns a new data point an anomaly score of 0.97. What does this mean?

A security team sets their intrusion detection anomaly threshold very low, flagging the top 0.1% of all traffic as anomalous. They find the security analysts are overwhelmed and ignoring most alerts. What is the likely problem and fix?

Design an Anomaly Detector

  1. Step 1: Choose one of these domains: bank fraud detection, hospital patient monitoring, or factory equipment sensors.
  2. Step 2: List 5 features you would track (e.g., for banking: transaction amount, time since last transaction, merchant category, country, device fingerprint).
  3. Step 3: For each feature, describe what 'normal' looks like and what range of values you would consider anomalous.
  4. Step 4: Describe one example of a contextual anomaly in your domain — a data point that seems normal in isolation but is anomalous given context.
  5. Step 5: Identify one way a sophisticated adversary could evade your detector — how might they craft their behavior to avoid anomaly flags?
  6. Step 6: Discuss: if your system has a 5% false positive rate, how many false alerts per day would a bank processing 1 million transactions per day see? Is this operationally acceptable?