Robotics & Embodied AI


Robots That Learn

Most traditional robots run exactly the code their engineers wrote. They are fast and precise, but they cannot improve. If the factory changes its process, a programmer must update the code. A robot that learns is different: it adjusts its own behavior based on experience, getting better at tasks over time just as an athlete improves through practice. Learning robots are still an active research frontier, but they are already appearing in real warehouses, labs, and homes.

Reinforcement Learning: Learning Through Trial and Error

Reinforcement learning (RL) is a training approach inspired by how animals learn. The robot (called the agent in RL terminology) tries an action, receives feedback about whether that action was good or bad (a reward or a penalty), and gradually adjusts its strategy to earn more reward over time. No one explicitly tells the robot what to do; it figures it out by trial and error.

Imagine teaching a robot arm to stack blocks. The robot flails randomly at first, knocking blocks everywhere. Each time it successfully places a block, it gets a positive reward signal. Each time it knocks a block off, it gets a negative signal. Over thousands or millions of attempts (mostly in simulation), the robot's strategy gradually shifts toward the movements that stack blocks reliably.
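The try, get feedback, adjust loop can be sketched in a few lines of Python. This is a deliberately tiny illustration, not how a real robot arm is trained: it assumes the robot chooses among three hypothetical stacking motions, each with a hidden chance of success, and it uses a simple epsilon-greedy rule (mostly repeat the best-known motion, occasionally try a random one). The function name, the success probabilities, and the +1/-1 rewards are all assumptions made for this example.

```python
import random


def train_by_trial_and_error(success_prob, trials=5000, epsilon=0.1, seed=0):
    """Learn which motion works best purely from reward feedback.

    success_prob[i] is the hidden chance that motion i stacks the block.
    The agent never sees these numbers; it only sees rewards.
    """
    rng = random.Random(seed)
    n_actions = len(success_prob)
    value = [0.0] * n_actions  # running average reward for each motion
    count = [0] * n_actions    # how often each motion has been tried

    for _ in range(trials):
        # Explore a random motion sometimes, otherwise exploit the best so far.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: value[a])

        # The simulated world answers with a reward (+1) or a penalty (-1).
        reward = 1.0 if rng.random() < success_prob[action] else -1.0

        # Nudge the estimate for this motion toward the observed reward.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]

    return value


values = train_by_trial_and_error([0.2, 0.5, 0.9])
best_action = max(range(len(values)), key=lambda a: values[a])
print("Learned value of each motion:", [round(v, 2) for v in values])
print("Motion the robot now prefers:", best_action)
```

Real robot tasks involve long sequences of actions and much richer sensor input, but the core loop has the same shape: act, observe the reward, update the strategy.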

The Reward Function

In reinforcement learning, the reward function is the mathematical expression of what the designer wants the robot to achieve. Getting the reward function right is tricky: robots famously find unexpected shortcuts. In one experiment, a simulated robot rewarded for moving fast learned to grow extremely tall and fall forward — technically fast, but completely useless. Designing a good reward function is an art as much as a science.
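To make the shortcut problem concrete, here is a minimal sketch of two reward functions for a "move fast" task. The function names, the `upright` flag, and the numbers are hypothetical choices for illustration and are not taken from the experiment described above. The first version rewards speed alone; the second only pays out for speed while the robot is still standing.

```python
def naive_speed_reward(forward_velocity, upright):
    """Reward speed and nothing else. Falling forward counts as fast."""
    return forward_velocity


def gated_speed_reward(forward_velocity, upright, fall_penalty=5.0):
    """Reward speed only while the robot stays upright (assumed fix)."""
    if not upright:
        return -fall_penalty
    return forward_velocity


# Two hypothetical moments from simulation:
toppling = {"forward_velocity": 4.0, "upright": False}  # tall body tipping over
walking = {"forward_velocity": 1.2, "upright": True}    # slow but real walking

print("naive:", naive_speed_reward(**toppling), "vs", naive_speed_reward(**walking))
print("gated:", gated_speed_reward(**toppling), "vs", gated_speed_reward(**walking))
```

Under the naive reward, toppling scores higher than walking, so that is what a learner drifts toward. The gated version reverses the ranking, although it may well open other loopholes of its own.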

Imitation Learning: Learning by Watching

Reinforcement learning can require millions of trials, which is impractical when training in the real world takes real time and risks real damage. Imitation learning offers a shortcut: instead of starting from random behavior, the robot watches a human demonstrate the task and tries to copy what the human did. A human operator might physically guide a robot arm through the correct motion of picking up a cup, or demonstrate the task via teleoperation. The robot records the sensor inputs and the actions taken, then trains a neural network to map similar inputs to similar actions. The result is a robot that starts out reasonably capable at the task and can then be fine-tuned with reinforcement learning.
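The record-then-copy idea can be sketched as follows. To keep the example short and dependency-free, a straight-line fit stands in for the neural network mentioned above; that substitution is an assumption of this sketch. The demonstration data are also made up: each pair is (distance from gripper to cup in meters, speed the human commanded in meters per second).

```python
def fit_linear_policy(demos):
    """Fit action = w * observation + b to demonstration pairs (least squares)."""
    n = len(demos)
    mean_obs = sum(obs for obs, _ in demos) / n
    mean_act = sum(act for _, act in demos) / n
    covariance = sum((obs - mean_obs) * (act - mean_act) for obs, act in demos)
    variance = sum((obs - mean_obs) ** 2 for obs, _ in demos)
    w = covariance / variance
    b = mean_act - w * mean_obs

    def policy(observation):
        # Similar inputs produce similar actions, as the demonstrator showed.
        return w * observation + b

    return policy


# Hypothetical recorded demonstrations: the human slows down near the cup.
demos = [(0.10, 0.05), (0.20, 0.10), (0.30, 0.15), (0.50, 0.25)]
policy = fit_linear_policy(demos)

# The robot now acts in a situation the human never demonstrated exactly.
print("Commanded speed at 0.40 m:", round(policy(0.40), 3))
```

A real system would fit a far more flexible model to camera images and joint readings, but the training signal is the same: match the demonstrator's action for each recorded input.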

Sim-to-Real Transfer

Training a robot entirely in simulation is much cheaper and faster than real-world training, but simulations are never perfect. Sim-to-real transfer is the challenge of making behavior learned in simulation work correctly on a physical robot. Techniques like domain randomization — deliberately varying physics parameters during simulation training — help the robot generalize to the messiness of the real world.
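Domain randomization is easy to picture in code: before each simulated episode, draw a fresh set of physics parameters. The sketch below shows only that sampling step. The parameter names and ranges are hypothetical, and the simulator and training loop that would consume them are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float       # surface friction coefficient
    block_mass_kg: float  # mass of the block being handled
    motor_delay_s: float  # lag between a command and the motor response


def sample_physics(rng):
    """Draw one random 'version of reality' (ranges are assumed for illustration)."""
    return PhysicsParams(
        friction=rng.uniform(0.4, 1.0),
        block_mass_kg=rng.uniform(0.05, 0.30),
        motor_delay_s=rng.uniform(0.00, 0.04),
    )


def randomized_episodes(n_episodes, seed=0):
    """Return the physics settings to use for each training episode."""
    rng = random.Random(seed)
    return [sample_physics(rng) for _ in range(n_episodes)]


episodes = randomized_episodes(1000)
for params in episodes[:3]:
    print(params)
```

Because the robot never trains against the same physics twice, it cannot rely on one exact friction or mass value, and the real world ideally looks like just one more variation.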

What Robots Can and Cannot Yet Learn

Learning-based robots have made extraordinary progress. They can now learn to walk, grasp diverse objects, play games at superhuman level, and assist in cooking tasks, all without step-by-step instructions hand-coded for each behavior. But challenges remain. Learning robots often need enormous amounts of training data or experience. Their learned behaviors can be brittle, failing in situations only slightly different from those they trained on. And it can be hard to explain why a learning-based robot does what it does, which creates safety concerns in high-stakes applications.

Match each learning concept to its correct description.

Terms

Reinforcement learning
Reward function
Imitation learning
Sim-to-real transfer
Domain randomization

Definitions

The challenge of making behavior learned in simulation work correctly on a physical robot
A training method where the robot improves by receiving rewards for good actions and penalties for bad ones
The mathematical signal that tells a learning robot how well it is achieving its goal
Training a robot by having it observe and copy human demonstrations of a task
Varying physics parameters during simulation training to help the robot generalize to the real world


Flashcards

A robot arm is trained with reinforcement learning to sort colored blocks. After millions of simulated attempts, it sorts reliably. Then engineers test it on real blocks under different lighting and it fails completely. What problem does this illustrate?

Why is imitation learning often used as a starting point before reinforcement learning?

Design a Reward Function

  1. Pick a task for a learning robot: folding a T-shirt, pouring water into a glass without spilling, or navigating a maze.
  2. Write three reward rules: situations that should give the robot a positive reward, and how large each reward is (+1, +5, +10).
  3. Write two penalty rules: situations that should give the robot a negative reward.
  4. Now try to find a "loophole": a behavior the robot could exploit to score high rewards without actually doing the task well. (Example: a robot rewarded for touching the shirt might just repeatedly poke it.)
  5. Revise your reward function to close that loophole. Discuss: why is designing the right reward function difficult?
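If you want to test your rules rather than just reason about them, one option is to write them as a small table and score a few imagined episodes. The sketch below does this for the shirt-poking example from the activity. The event names, reward sizes, and the cap used as a fix are all hypothetical; your own rules and loophole will differ.

```python
# Three reward rules and two penalty rules for folding a T-shirt (assumed values).
RULES = {
    "touch_shirt": 1,
    "fold_sleeve": 5,
    "finish_fold": 10,
    "drop_shirt": -5,
    "tear_shirt": -10,
}


def score(events, rules, max_count=None):
    """Add up rewards for a list of events.

    max_count optionally limits how many times an event can earn its reward.
    """
    total = 0
    seen = {}
    for event in events:
        seen[event] = seen.get(event, 0) + 1
        if max_count and event in max_count and seen[event] > max_count[event]:
            continue  # this event has already been paid out the allowed number of times
        total += rules.get(event, 0)
    return total


honest = ["touch_shirt", "fold_sleeve", "fold_sleeve", "finish_fold"]
poking = ["touch_shirt"] * 30  # the loophole: poke the shirt over and over

print("Original rules:", score(honest, RULES), "honest vs", score(poking, RULES), "poking")

# One possible revision: touching the shirt only earns a reward once.
caps = {"touch_shirt": 1}
print("Revised rules: ", score(honest, RULES, caps), "honest vs", score(poking, RULES, caps), "poking")
```

With the original rules, poking outscores an honest fold. Capping the touch reward closes that particular loophole, which is a good moment to ask what the robot might exploit next.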