Sensor Fusion
No single sensor is sufficient for robust robot perception. A camera is rich in texture but blind in the dark and, on its own, unable to measure distance directly. A lidar measures distance with high accuracy but cannot read signs and often misses transparent objects such as glass. An IMU measures motion at high frequency, but estimates integrated from it drift over time. GPS provides absolute position outdoors but is unavailable indoors and unreliable in urban canyons. The strengths of each sensor cover the weaknesses of the others. Sensor fusion is the principled combination of multiple sensor modalities to produce estimates that are more accurate, more complete, and more robust than any single modality alone.
Why Fusion Is Not Simple Averaging
Naive approaches to combining sensors (average the readings, take the majority vote, use the most recent) fail because sensors have different uncertainties, different latencies, different failure modes, and different physical interpretations. A probabilistic framework is required.
The core principle of fusion is: weight each sensor's contribution by its reliability. A sensor with low noise contributes more to the fused estimate; a sensor with high noise contributes less. When a sensor fails (its reading is wildly inconsistent with all others), it should be downweighted or excluded. Kalman-based fusion performs the weighting automatically: the Kalman gain weights each measurement by its noise covariance relative to the uncertainty of the current estimate. Excluding a failed sensor usually takes an additional consistency check on top of the filter, such as gating on the innovation.
A second principle is temporal alignment. Sensors run at different rates: an IMU might output at 1000 Hz, a camera at 30 Hz, a lidar at 10 Hz. Fusing these requires interpolation, extrapolation, or explicit time-stamping and buffering so that measurements are combined at a consistent reference time. Incorrect temporal alignment introduces systematic errors: a sensor reading from 100 milliseconds ago describes a different state than the current one, especially on a fast-moving vehicle.
A third principle is spatial alignment. Each sensor measures in its own local coordinate frame. A camera measures pixel coordinates. A lidar measures range and bearing angles from the sensor origin. Before fusion, all measurements must be expressed in a common coordinate frame, using the known rigid-body transformation between each sensor and the robot's reference frame. This transformation (the extrinsic calibration) must be precisely measured; errors of even a few millimeters or fractions of a degree degrade fusion quality significantly.
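The three principles above can be illustrated with a minimal Python sketch. It assumes independent scalar measurements with known variances, readings that vary linearly between timestamps, and a planar (2D) extrinsic calibration. The function names `fuse`, `interpolate`, and `sensor_to_robot` are illustrative and do not come from any particular library.

```python
import math

def fuse(estimates):
    """Inverse-variance weighted fusion of independent scalar measurements.

    estimates is a list of (value, variance) pairs. Returns the fused
    value and the fused variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total

def interpolate(t, t0, x0, t1, x1):
    """Linearly interpolate a scalar reading to time t, with t0 <= t <= t1."""
    alpha = (t - t0) / (t1 - t0)
    return x0 + alpha * (x1 - x0)

def sensor_to_robot(point, yaw, translation):
    """Transform a 2D point from the sensor frame to the robot frame.

    yaw (radians) and translation (tx, ty) are the assumed planar
    extrinsic calibration of the sensor relative to the robot.
    """
    x, y = point
    tx, ty = translation
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + tx, s * x + c * y + ty)

# Example: a low-noise reading (variance 1.0) pulls the fused value
# closer to itself than a high-noise reading (variance 4.0) does.
value, variance = fuse([(10.0, 1.0), (12.0, 4.0)])
```

With these inputs the fused value lands nearer to 10.0 than to 12.0, and the fused variance is smaller than either input variance. A Kalman update generalizes this behavior to vector states.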
Sensors can be fused complementarily (each covers a gap the other has — GPS covers long-range absolute position, IMU covers short-term high-frequency motion) or redundantly (multiple sensors of the same type provide cross-checks). Complementary fusion expands capability; redundant fusion improves reliability and detects failures. Robust real-world systems need both: complementary sensors for complete coverage, redundant sensors for fault detection.
IMU-camera fusion (Visual-Inertial Odometry, VIO) is one of the most widely deployed fusion architectures in robotics. The IMU provides high-rate angular velocity and acceleration measurements that can propagate pose between camera frames, bridging the gap between a camera's roughly 30 Hz frame rate and an IMU's rate of up to 1000 Hz. The camera provides the drift correction that the IMU alone cannot: integrating IMU measurements accumulates position error that grows at least quadratically with time. The IMU in turn makes metric scale observable, which a single camera cannot recover by itself. Together they produce a low-drift, high-frequency, metric-scale pose estimate from lightweight, inexpensive hardware. VIO is the backbone of many consumer drones (DJI, Skydio), augmented reality headsets (HoloLens, Quest), and handheld devices. It operates without GPS, making it viable indoors and underground. Apple's ARKit and Google's ARCore are consumer implementations of VIO running on phone hardware.
Lidar-camera fusion is a dominant approach for autonomous vehicles. Lidar provides accurate depth at the cost of texture; the camera provides texture and color at the cost of depth. Projecting lidar points onto the camera image associates each 3D point with a pixel's color and semantic label. Conversely, projecting the camera's semantic segmentation onto the lidar point cloud gives each 3D point an object identity. The 2023-era Tesla 'pure vision' strategy (forgoing lidar) and Waymo's lidar-plus-camera strategy reflect different engineering bets on the sufficiency of camera data and the cost/reliability tradeoffs of adding lidar.
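The lidar-to-image projection described above can be sketched with an ideal pinhole camera model. This is a simplified illustration: it ignores lens distortion, assumes the extrinsics are given as a rotation matrix and a translation vector, and uses made-up intrinsic values. The function name `project_lidar_point` is hypothetical.

```python
def project_lidar_point(point_lidar, rotation, translation, fx, fy, cx, cy):
    """Project a 3D lidar point into pixel coordinates with a pinhole model.

    rotation is a 3x3 row-major matrix and translation a 3-vector; together
    they are the assumed lidar-to-camera extrinsics. Camera frame convention
    assumed here: x right, y down, z forward. Returns None if the point is
    behind the camera.
    """
    # Express the point in the camera frame: p_cam = R * p_lidar + t
    p_cam = [
        sum(rotation[i][j] * point_lidar[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    x, y, z = p_cam
    if z <= 0.0:
        return None
    # Perspective division followed by the intrinsic scaling and offset.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# Hypothetical calibration values, chosen only for illustration. Identity
# extrinsics mean the point is already expressed in the camera frame.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_lidar_point((1.0, 0.5, 10.0), IDENTITY, (0.0, 0.0, 0.0),
                            fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Once a lidar point has a pixel location, the color or semantic label at that pixel can be attached to the point. Running the same lookup for every point is one way to give the point cloud the object identities mentioned above.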
Sensor fusion quality depends critically on the accuracy of the extrinsic calibration, the measured transformation between each sensor pair. A mechanical shock, a loose mount, or thermal expansion can shift a sensor by a millimeter or a degree, which is enough to misalign the fused data. In safety-critical systems, extrinsic calibration is typically verified at startup, for example against calibration targets, and fusion health is monitored continuously for signs of growing inconsistency between sensors.
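To get a feel for why a one-degree shift matters, the following sketch estimates how far a measured point is displaced sideways by a small angular calibration error. The 50 m range is an assumed example value, and `lateral_error` is an illustrative helper, not a library function.

```python
import math

def lateral_error(range_m, angle_error_deg):
    """Lateral displacement of a point at range_m caused by an angular
    extrinsic error of angle_error_deg degrees."""
    return range_m * math.tan(math.radians(angle_error_deg))

# A one-degree mounting error evaluated at an assumed range of 50 m.
error_at_50m = lateral_error(50.0, 1.0)  # roughly 0.87 m
```

An error of that size is large enough to attach a lidar return to the wrong object in the camera image, which is why small mounting shifts are treated seriously.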
A self-driving car fuses GPS (updated at 1 Hz) with IMU (updated at 1000 Hz) for position estimation. The IMU-integrated position between GPS updates drifts at a rate of 0.1 m/s. After 3 seconds without a GPS update (which can happen in a tunnel), what is the expected position error from IMU drift alone?
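The arithmetic behind this question can be sketched as follows. The first function uses the constant drift rate stated in the question. The second is included only for comparison: it assumes a constant accelerometer bias, which is a different error model, and its bias value is hypothetical.

```python
def linear_drift_error(drift_rate_m_per_s, outage_s):
    """Position error after outage_s seconds, assuming the error grows
    linearly at the stated drift rate."""
    return drift_rate_m_per_s * outage_s

def bias_drift_error(accel_bias_m_per_s2, outage_s):
    """Position error from double-integrating a constant accelerometer
    bias, which grows quadratically with time."""
    return 0.5 * accel_bias_m_per_s2 * outage_s ** 2

error_linear = linear_drift_error(0.1, 3.0)   # about 0.3 m
error_bias = bias_drift_error(0.05, 3.0)      # hypothetical 0.05 m/s^2 bias
```

Under the linear model the error stays modest over a 3-second outage. The quadratic model shows why longer outages become a problem quickly: doubling the outage time quadruples the error.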
A fusion system receives a GPS fix that places the robot 15 meters from its IMU-propagated estimate. The maximum plausible GPS error at this location (open sky, good geometry) is 3 meters. What should the fusion system do?
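One common way to handle a measurement like this is an innovation gate, sketched below for a single dimension. The sigma values are assumptions made for illustration: the 3 m maximum GPS error is treated as a 3-sigma bound, and the IMU-propagated estimate is given 1 m of uncertainty. The function name `gate_measurement` is hypothetical.

```python
def gate_measurement(residual_m, meas_sigma_m, pred_sigma_m, threshold=9.0):
    """Return True if a scalar measurement passes the innovation gate.

    The squared residual is normalized by the innovation variance
    (measurement variance plus predicted-state variance) and compared
    with a threshold; 9.0 corresponds to a 3-sigma gate in one dimension.
    """
    innovation_var = meas_sigma_m ** 2 + pred_sigma_m ** 2
    normalized_sq = residual_m ** 2 / innovation_var
    return normalized_sq <= threshold

# Assumed values: 1 m GPS sigma (so 3 m is a 3-sigma bound) and 1 m of
# uncertainty on the IMU-propagated estimate.
accept = gate_measurement(15.0, meas_sigma_m=1.0, pred_sigma_m=1.0)
```

Under these assumptions the 15 m residual fails the gate, so the fix would not be fused. If fixes keep failing the gate, the propagated estimate itself may have diverged, so a practical system would likely track repeated rejections instead of discarding them silently.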
Design a Fusion Architecture for an Underwater Robot
- An underwater inspection robot must navigate inside a submarine pipeline. Available sensors: (1) DVL — Doppler velocity log, measures velocity relative to the pipe wall at up to 10 Hz with 0.1% accuracy; (2) IMU — measures angular rates and accelerations at 500 Hz, drifts 10 degrees/hour in heading; (3) pressure sensor — measures depth via water pressure at 100 Hz, very accurate; (4) forward-facing sonar — produces a 2D scan of the pipe interior at 2 Hz. GPS is unavailable underwater.
- Step 1: For each sensor, state what quantity it measures, what its primary strength is, and what its primary failure mode or limitation is.
- Step 2: Design a fusion architecture: which sensors are fused at which stage? Describe your block diagram in words (Primary estimator: ..., Secondary correction: ..., etc.).
- Step 3: How would you handle heading drift from the IMU over a 2-hour inspection? What sensor or technique could bound the heading error?
- Step 4: If the DVL loses bottom lock (loses contact with the pipe wall and returns no valid reading), how should the fusion system respond?
- Step 5: What is the expected position error at the end of a 2-hour inspection, and how would you verify it?
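As a possible starting point for Steps 2 and 4, the sketch below shows a bare dead-reckoning loop. It is not a model answer. It assumes planar motion, takes depth directly from the pressure sensor, integrates heading from the IMU yaw rate, and coasts on the last valid velocity when the DVL reports nothing. The `State` class, the `step` function, and the coasting policy are all illustrative choices.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    x: float = 0.0        # position in the navigation frame, m
    y: float = 0.0
    depth: float = 0.0    # m, taken directly from the pressure sensor
    heading: float = 0.0  # rad, integrated from the IMU yaw rate
    vx: float = 0.0       # last valid body-frame velocity, m/s
    vy: float = 0.0
    dvl_age: float = 0.0  # seconds since the last valid DVL reading

def step(state, dt, yaw_rate, depth, dvl=None):
    """Advance the dead-reckoning estimate by dt seconds.

    dvl is a (vx, vy) body-frame velocity, or None when bottom lock is lost.
    """
    state.heading += yaw_rate * dt
    state.depth = depth
    if dvl is not None:
        state.vx, state.vy = dvl
        state.dvl_age = 0.0
    else:
        # Coast on the last valid velocity and record how stale it is.
        state.dvl_age += dt
    # Rotate body-frame velocity into the navigation frame and integrate.
    c, s = math.cos(state.heading), math.sin(state.heading)
    state.x += (c * state.vx - s * state.vy) * dt
    state.y += (s * state.vx + c * state.vy) * dt
    return state
```

A fuller design would replace this loop with a filter that tracks uncertainty, inflates it as `dvl_age` grows, and uses the sonar scans to bound heading and position error. The loop only shows where each sensor enters the estimate.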