Robotics & Embodied AI

⏱ About 15 min · 15 XP

Handling the Unexpected

No matter how well a robot is engineered, the real world will eventually throw something at it that nobody planned for. A sensor fails. A wheel hits an unexpected patch of gravel. A child places a toy in the exact spot the robot needs to occupy. A power surge corrupts a computation mid-task. The ability to handle these surprises gracefully — rather than freezing, crashing, or causing harm — is called robustness, and it is one of the hardest properties to engineer into an autonomous system.

Fault Detection: Knowing Something Is Wrong

The first step in handling the unexpected is noticing that something unexpected has happened. This sounds obvious, but it is a non-trivial engineering problem. A robot must have enough self-awareness to recognize when its sensors are giving implausible readings, when its actions are not having their expected effect, or when its hardware is degrading. One technique is consistency checking: comparing multiple independent sources of information to detect contradictions. If the wheel odometer says the robot is moving forward but the camera shows a stationary scene, something is wrong — either the wheel is slipping or the camera has failed. Another technique is expectation monitoring: before taking an action, the robot predicts what the next sensor reading should be. If reality does not match the prediction, a fault is flagged.
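Both checks can be sketched in a few lines of Python. This is a minimal illustration rather than a production fault detector: the function names, the tolerance values, and the idea of a camera-derived speed estimate are all assumptions made for the example.

```python
def consistency_check(odometer_speed, camera_speed, tolerance=0.05):
    """Return True if two independent speed estimates (m/s) agree.

    The tolerance is an assumed value chosen for illustration.
    """
    return abs(odometer_speed - camera_speed) <= tolerance


def predict_range(range_before, distance_driven):
    """Predict the forward range (m) after driving straight toward a wall."""
    return range_before - distance_driven


def expectation_check(predicted_reading, actual_reading, tolerance=0.3):
    """Return True if the actual reading (m) matches the prediction."""
    return abs(predicted_reading - actual_reading) <= tolerance


# Consistency checking: the wheels report forward motion,
# but the camera sees a stationary scene.
if not consistency_check(odometer_speed=0.4, camera_speed=0.0):
    print("Fault flagged: odometer and camera disagree")

# Expectation monitoring: predict the reading first, then compare.
expected = predict_range(range_before=2.0, distance_driven=0.5)
if not expectation_check(expected, actual_reading=0.9):
    print("Fault flagged: reading does not match prediction")
```

Note that neither check says which component is at fault. A disagreement only tells the robot that something needs further diagnosis.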

Fault vs. Failure

Roboticists distinguish between a fault (a component malfunctioning) and a failure (the overall system no longer meeting its requirements). A faulty sensor might not cause a system failure if a backup sensor covers the same information. Good robustness engineering catches faults early, before they cascade into failures.
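The distinction can be made concrete with a small sketch. Here each sensor is modeled as a hypothetical function that returns a reading, or None when it is faulty. The sensor names and the `read_with_backup` helper are assumptions for illustration only.

```python
def read_with_backup(sensors):
    """Try each (name, read_function) pair in order.

    Returns (reading, faults, system_failed). A fault is recorded for every
    sensor that returns None; the system only fails if all of them do.
    """
    faults = []
    for name, read in sensors:
        value = read()
        if value is not None:
            return value, faults, False
        faults.append(name)
    return None, faults, True


# The primary sensor is faulty, but the backup still provides a reading.
sensors = [("lidar", lambda: None), ("sonar", lambda: 1.8)]
reading, faults, failed = read_with_backup(sensors)
print(reading, faults, failed)  # 1.8 ['lidar'] False
```

In this run there is one fault but no failure, because the backup sensor covers the same information.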

Recovery Behaviors: Getting Back on Track

Once a fault is detected, the robot needs a recovery strategy. Recovery behaviors are pre-designed fallback plans for specific types of faults. Common examples include: if a path is blocked, attempt a different route; if a grasp fails three times, ask for human help; if the battery drops below 15%, abandon the current task and return to the charging station. More advanced systems use exception handling inspired by software engineering: a hierarchy of responses ranging from local fixes (retry the action) to moderate interventions (switch to a backup sensor) to last-resort responses (stop everything and signal for human assistance).
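One way to picture this hierarchy is as an escalation ladder. The sketch below is an assumption-laden illustration, not a standard robotics API: `action` and `switch_to_backup` are hypothetical callables supplied by the caller, and the retry limit of three simply mirrors the grasp example above.

```python
def run_with_recovery(action, switch_to_backup, max_retries=3):
    """Try an action, escalating through recovery levels on failure.

    `action` returns True on success. `switch_to_backup` returns True if a
    backup sensor was available and has been switched in.
    """
    # Level 1: local fix, retry the action.
    for attempt in range(1, max_retries + 1):
        if action():
            return f"succeeded on attempt {attempt}"

    # Level 2: moderate intervention, switch to a backup and try once more.
    if switch_to_backup() and action():
        return "succeeded after switching to backup"

    # Level 3: last resort, stop and signal for human assistance.
    return "stopped: human assistance requested"


# A grasp that only succeeds on the third try.
attempts = []

def flaky_grasp():
    attempts.append(1)
    return len(attempts) >= 3

print(run_with_recovery(flaky_grasp, lambda: False))
# succeeded on attempt 3
```

Ordering the levels from cheapest to most disruptive means the robot only interrupts a human when the lighter options have been exhausted.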

Graceful Degradation

Graceful degradation means that as components fail, the robot continues to function — perhaps with reduced capability — rather than failing completely. A robot that loses one of its four cameras does not stop moving; it continues with reduced spatial awareness and moves more slowly. A robot whose left arm malfunctions may complete a two-handed task awkwardly with just the right arm. Graceful degradation requires redundancy — having backup components or alternative ways to achieve the same goal — and adaptive behavior — the ability to re-plan around the degraded capability rather than rigidly following the original plan that assumed full hardware function.
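The camera example can be sketched as a speed limit that shrinks as cameras drop out. The linear scaling rule, the 1.0 m/s full speed, and the function name are assumptions chosen for illustration; a real robot would derive its limits from its own safety analysis.

```python
def degraded_speed_limit(working_cameras, total_cameras=4, full_speed=1.0):
    """Scale the speed limit (m/s) by the fraction of cameras still working.

    Linear scaling is an assumption for illustration. Returns 0.0 (a full
    stop) when no camera is left.
    """
    if working_cameras <= 0:
        return 0.0
    usable = min(working_cameras, total_cameras)
    return full_speed * usable / total_cameras


for cams in (4, 3, 0):
    print(cams, degraded_speed_limit(cams))
# 4 1.0
# 3 0.75
# 0 0.0
```

The robot keeps moving with three cameras, only more slowly, and stops completely only when it has no visual input at all.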

Match each robustness concept to its correct description.

Terms

Fault detection
Consistency checking
Recovery behavior
Graceful degradation
Expectation monitoring

Definitions

Predicting what the next sensor reading should be and flagging a fault if reality does not match the prediction
Comparing multiple independent data sources to spot contradictions that reveal a malfunction
A pre-designed fallback strategy the robot executes when a specific type of fault is detected
Identifying that a sensor reading, actuator response, or computation has produced an implausible or inconsistent result
Continuing to operate at reduced capability after a component fails rather than shutting down entirely

Edge Cases: The Long Tail of Failure

Engineers test robots against a wide range of scenarios, but the real world has an enormous number of rare situations — edge cases — that are hard to anticipate. A robot trained in a well-lit warehouse may fail when a single fluorescent bulb flickers. The long tail of edge cases is one reason fully autonomous systems are so difficult to deploy safely in open, uncontrolled environments.

_____ degradation means a robot keeps working at reduced capability when parts fail, rather than stopping entirely. _____ checking compares multiple sensors to detect contradictions.

A warehouse robot's left wheel motor loses power. Instead of stopping, it continues its delivery at reduced speed using only its right wheel and adjusting its path. What robustness property is this robot demonstrating?

A robot predicts that after driving forward 0.5 meters its front sensor should read approximately 2.0 meters to the nearest wall. Instead it reads 0.2 meters. What technique is the robot using, and what might it conclude?

Failure Mode Analysis

  1. Describe a simple robot task: a robot that collects dirty dishes from restaurant tables and carries them to the dishwasher.
  2. List five things that could go wrong: hardware failures, sensor errors, unexpected environmental events, or software issues.
  3. For each failure, write: (a) how the robot might detect that this failure has occurred, and (b) what recovery behavior it should perform.
  4. Identify which of your failures would lead to graceful degradation and which would require a full stop and human assistance. Justify each choice.
  5. Discuss: is it possible to plan for every failure in advance? What is the argument for keeping a human in the loop for autonomous robots in public spaces?