Depth and 3D Sensing
A standard photograph is flat. It shows you color and texture, but on its own it does not tell you how far away anything is. That missing dimension, depth, is critical for a robot that needs to navigate around obstacles, pick up objects, or model the space it occupies. Three-dimensional sensing gives a robot the ability to measure not just what is in the scene but how far away each part of it is.
Lidar: Measuring with Light Pulses
Lidar stands for Light Detection and Ranging. A lidar sensor emits rapid pulses of laser light and measures how long each pulse takes to bounce off a surface and return. Since light travels at a known, constant speed, the return time tells the sensor the distance to whatever the pulse hit: the pulse covers that distance twice, once out and once back, so the distance is the speed of light multiplied by half the round-trip time. A spinning lidar rotates its laser emitter while firing, scanning in many directions every second. By combining thousands of distance measurements taken at slightly different angles, the sensor builds a point cloud: a dense 3D map of surrounding surfaces expressed as a collection of points, each tagged with its measured position in three-dimensional space. Self-driving cars typically carry roof-mounted lidar sensors that produce hundreds of thousands of points per second.
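The time-of-flight calculation described above can be sketched in a few lines. This is a minimal illustration rather than any particular sensor's firmware; the function name `distance_from_round_trip` is made up for this example, and the speed of light in vacuum is used as a close approximation for air.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum (approximation for air)


def distance_from_round_trip(round_trip_s: float) -> float:
    """Return the distance in meters for a pulse with the given round-trip time in seconds."""
    # The pulse travels to the surface and back, so the one-way
    # distance is half of the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


if __name__ == "__main__":
    # A pulse that returns after about 66.7 nanoseconds hit something roughly 10 m away.
    print(f"{distance_from_round_trip(66.7e-9):.2f} m")
```

The tiny time values involved show why lidar needs very precise timing electronics: one nanosecond of timing error corresponds to about 15 cm of distance.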
A point cloud is a set of data points in 3D space, each representing a spot on a surface the sensor detected. Imagine a haze of tiny dots floating in the air, collectively outlining every wall, floor, and object in a room.
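To make the idea concrete, here is a small sketch of how individual lidar returns could be turned into the points of a point cloud. It assumes each return is a (range, azimuth, elevation) triple and a coordinate frame with x forward, y to the left, and z up. Those conventions and the function names are assumptions for illustration; real sensors document their own.

```python
import math


def polar_to_point(range_m, azimuth_deg, elevation_deg):
    """Convert one lidar return into an (x, y, z) point in meters.

    Assumed frame: x forward, y left, z up. Azimuth is measured in the
    horizontal plane from the x axis, elevation upward from that plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)  # length of the beam projected onto the ground plane
    return (
        horizontal * math.cos(az),
        horizontal * math.sin(az),
        range_m * math.sin(el),
    )


def build_point_cloud(returns):
    """Convert an iterable of (range_m, azimuth_deg, elevation_deg) triples into a list of points."""
    return [polar_to_point(r, az, el) for r, az, el in returns]


if __name__ == "__main__":
    # Three made-up returns from one sweep of a spinning sensor.
    sweep = [(2.0, 0.0, 0.0), (2.0, 90.0, 0.0), (5.0, 45.0, 10.0)]
    for point in build_point_cloud(sweep):
        print(tuple(round(c, 3) for c in point))
```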
Stereo Cameras: Depth from Two Eyes
Hold your finger in front of your face and close one eye, then switch eyes. Your finger appears to jump left or right against the background. That apparent shift is called parallax, and it is directly related to how far away the finger is. Objects closer to you show more parallax; objects farther away show less. A stereo camera uses exactly this principle. It contains two identical cameras mounted side by side at a known distance apart, just like your two eyes. A computer compares the two images and finds the parallax displacement, usually called the disparity, for each point in the scene. Using the known separation between the cameras (called the baseline) and basic geometry, it calculates the depth to every matched point. The result is a depth map: an image where each pixel stores a distance value rather than a color.
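The "basic geometry" can be written as a single formula for an idealized stereo pair (two identical, perfectly aligned pinhole cameras): depth = focal length × baseline / disparity, where disparity is the parallax shift measured in pixels. The sketch below assumes that idealized model; the function name and the example numbers are illustrative only.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a matched point, for an idealized rectified stereo pair.

    disparity_px:    horizontal shift of the point between the two images, in pixels
    focal_length_px: camera focal length expressed in pixels
    baseline_m:      distance between the two cameras, in meters
    """
    if disparity_px <= 0:
        # No measurable shift: the point is effectively infinitely far away
        # (or the match is invalid).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    focal_length_px = 700.0  # assumed example value
    baseline_m = 0.12        # assumed example value: cameras 12 cm apart
    for disparity in (42.0, 21.0, 8.4):
        depth = depth_from_disparity(disparity, focal_length_px, baseline_m)
        print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Because depth is inversely proportional to disparity, halving the disparity doubles the depth. It also means distant objects, whose disparity is only a few pixels, are measured less precisely than nearby ones.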
Stereo vision is passive — it works with ambient light and requires no special emitters. This makes it excellent for outdoor use where additional light sources would be impractical. Its limitation is that it struggles with surfaces that have no texture, like a plain white wall, because there are no distinctive features to match between the two camera views.
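The textureless-surface problem is easy to see in a toy matcher. The sketch below compares a small window around one pixel in a single row of the left image with windows at several candidate shifts in the right image, using the sum of absolute differences (SAD) as the matching cost. This is one simple matching approach chosen for illustration, not necessarily what any given stereo camera uses, and the function name and pixel values are made up.

```python
def matching_costs(left_row, right_row, x, half_window, max_disparity):
    """Return the SAD cost of each candidate disparity 0..max_disparity.

    Compares the window centered at column x in the left row with the
    window centered at column x - d in the right row. The caller must
    choose x so that every window stays inside both rows.
    """
    costs = []
    for d in range(max_disparity + 1):
        cost = 0
        for offset in range(-half_window, half_window + 1):
            cost += abs(left_row[x + offset] - right_row[x - d + offset])
        costs.append(cost)
    return costs


if __name__ == "__main__":
    # Textured surface: the left row is the right row shifted by 2 pixels.
    textured_right = [10, 80, 30, 200, 50, 120, 90, 160, 20, 140, 60, 220]
    textured_left = [textured_right[0]] * 2 + textured_right[:-2]
    print("textured:", matching_costs(textured_left, textured_right, 7, 1, 3))

    # Plain white wall: every pixel has the same brightness in both rows.
    flat = [128] * 12
    print("flat:    ", matching_costs(flat, flat, 7, 1, 3))
```

On the textured rows, exactly one candidate shift has zero cost, so the matcher can pick it with confidence. On the flat rows, every candidate has the same cost, so there is no way to tell which shift is correct.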
Structured Light: Projecting Patterns
Structured light sensors solve the textureless-surface problem by projecting their own pattern, typically a grid of dots or stripes, onto the scene using an infrared projector. A camera then observes how that pattern deforms as it lands on objects of different depths. A flat wall produces a regular grid; a curved vase warps the pattern into curves. By analyzing the deformation, a processor computes depth at each projected point. The original Microsoft Kinect, released in 2010 and popular in research labs and classrooms, used structured infrared light to generate depth images at 30 frames per second. Structured light sensors are widely used in robot arms for tabletop manipulation tasks because they provide high-resolution depth even on objects with no natural texture.
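One heavily simplified way to picture the depth computation is to treat the projector as a second "camera" whose image is already known. If each projected dot can be identified, the sensor knows which column it left the projector from and sees which column it lands on in the camera, and the shift between the two gives depth by the same triangulation used in stereo. The sketch below assumes an idealized, aligned projector and camera with the same focal length, and a sign convention in which the shift is positive. It is not how any specific product such as the Kinect works internally, and all names and numbers are hypothetical.

```python
def depths_from_pattern(projected_cols, observed_cols, focal_length_px, baseline_m):
    """Estimate depth in meters for each identified dot of a projected pattern.

    projected_cols: dict of dot id -> column at which the projector emitted the dot
    observed_cols:  dict of dot id -> column at which the camera saw the dot
    """
    depths = {}
    for dot_id, x_proj in projected_cols.items():
        if dot_id not in observed_cols:
            continue  # dot was occluded or not detected by the camera
        shift = observed_cols[dot_id] - x_proj
        if shift > 0:
            # Same triangulation as stereo: larger shift means a closer surface.
            depths[dot_id] = focal_length_px * baseline_m / shift
    return depths


if __name__ == "__main__":
    projected = {"a": 100.0, "b": 200.0, "c": 300.0}
    observed = {"a": 143.5, "b": 229.0}  # dot "c" was not seen
    print(depths_from_pattern(projected, observed, focal_length_px=580.0, baseline_m=0.075))
```

The key difference from passive stereo is that the sensor does not have to search for matching features: the pattern itself supplies them, which is why a blank wall is no longer a problem.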
Choosing the Right Sensor
No single 3D sensing technology is best for every situation. Lidar gives accurate long-range depth but is expensive, heavy, and struggles in heavy rain or fog because water droplets scatter the laser pulses. Stereo cameras are cheap and lightweight but need textured surfaces and good lighting. Structured light works beautifully indoors on cluttered tabletops but fails outdoors because sunlight overwhelms the projected infrared pattern. Real robots often combine multiple sensing technologies to cover each other's weaknesses — a strategy explored in depth in the Sensor Fusion lesson. For now, understanding what each sensor measures and where it struggles is the foundation for making good engineering choices.
How does a lidar sensor measure distance?
Why does stereo vision struggle with plain, untextured surfaces?