Building a World Model
Perception produces a stream of observations. Detection outputs bounding boxes. SLAM builds a point cloud or occupancy grid. State estimation produces a pose estimate. But a robot's planner needs more than a raw stream — it needs a persistent, queryable representation of the environment that it can use to answer questions: Is that doorway passable? Where did I last see the package? Is this region of floor safe to stand on? The world model is that representation — the structured, maintained description of the environment that bridges perception and action.
Occupancy Grids: The Navigation Foundation
The occupancy grid is the simplest and most widely deployed world representation for mobile robot navigation. The environment is divided into a regular grid of cells (typically 5 cm x 5 cm in a 2D map, or 10 cm x 10 cm x 10 cm voxels in 3D). Each cell holds the probability that it is occupied by an obstacle.
The grid is updated using the inverse sensor model: given a sensor reading (e.g., a lidar return at range r, bearing theta), what is the probability that the cells along the beam are free and the cell at range r is occupied? The inverse sensor model encodes the sensor's geometry and noise characteristics. In practice each cell stores the log-odds log(p/(1-p)) rather than p itself, because a Bayesian update then reduces to a simple addition and avoids numerical issues at probabilities near 0 or 1.
OctoMap, an open-source library with ROS (Robot Operating System) integration, is a widely used 3D occupancy representation in robotics. It uses an octree data structure that stores full 3D occupancy at a configurable resolution. The octree is memory-efficient: large homogeneous regions, such as open free space, are collapsed into single coarse nodes, while detailed occupied structure is stored at full resolution.
Occupancy grids support navigation planning directly. A graph-search planner such as A* or Dijkstra's algorithm operates on the grid, finding the least-cost path from the robot's current cell to the goal cell that does not pass through occupied cells. Inflation layers around obstacles account for the robot's physical footprint.
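The log-odds update described above can be sketched in a few lines. This is a minimal illustration, not a production mapper: the probabilities `P_HIT` and `P_MISS`, the clamping bounds, and the function names are assumptions chosen for the example, and the beam is traced with Bresenham's line algorithm on integer cell indices.

```python
import math

# Assumed inverse-sensor-model values (illustrative, not from the text).
P_HIT = 0.7    # P(occupied | the beam ended in this cell)
P_MISS = 0.4   # P(occupied | the beam passed through this cell)
L_MIN, L_MAX = -4.0, 4.0  # clamp so a cell can still change state later


def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))


def prob(l):
    """Log-odds -> probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))


def cells_on_beam(x0, y0, x1, y1):
    """Integer cells from (x0, y0) to (x1, y1) inclusive (Bresenham)."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    x, y = x0, y0
    while True:
        cells.append((x, y))
        if (x, y) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x += sx
        if e2 <= dx:
            err += dx
            y += sy
    return cells


def integrate_beam(grid, origin, hit):
    """Add one lidar beam to a dict-based log-odds grid."""
    beam = cells_on_beam(*origin, *hit)
    for cell in beam[:-1]:  # cells the beam passed through: evidence for free
        grid[cell] = max(L_MIN, grid.get(cell, 0.0) + logit(P_MISS))
    # the cell where the beam ended: evidence for occupied
    grid[hit] = min(L_MAX, grid.get(hit, 0.0) + logit(P_HIT))


grid = {}  # a missing key means log-odds 0, i.e. p = 0.5 (unknown)
for _ in range(3):  # three identical returns reinforce the same belief
    integrate_beam(grid, origin=(0, 0), hit=(4, 2))

print(round(prob(grid[(4, 2)]), 3))  # about 0.927: confidently occupied
print(round(prob(grid[(2, 1)]), 3))  # about 0.229: probably free
```

Each observation only adds a constant to the affected cells, which is why the update is cheap enough to run for every beam of every scan. Cells that no beam has touched never enter the dictionary and therefore stay at p = 0.5.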
An occupancy grid marks any cell that has not been observed as 'unknown,' and a cautious planner will not route the robot through unknown cells until they have been explored. This closed world assumption (space not known to be free is treated as blocked) is appropriate for cautious navigation but too conservative for planning in large environments. Semantic and prior knowledge can relax it: if the robot knows from building floor plans that a region is likely a corridor, it can treat the region as probably free even before sensing it.
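One way to picture the difference between the cautious policy and the relaxed one is a grid planner that blocks unknown cells by default but accepts a set of cells that a prior, such as a floor plan, marks as probably free. The sketch below is an assumption-laden illustration: the cell encoding, the `prior_free` parameter, and the `unknown_penalty` cost are hypothetical names and values, and the search is plain 4-connected A* with a Manhattan heuristic.

```python
import heapq

FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def plan(grid, start, goal, prior_free=frozenset(), unknown_penalty=5.0):
    """A* over a 2D grid (list of rows). Returns a list of (row, col) or None.

    Unknown cells are blocked unless they appear in prior_free, in which
    case they are traversable at an extra cost of unknown_penalty.
    """
    rows, cols = len(grid), len(grid[0])

    def step_cost(cell):
        value = grid[cell[0]][cell[1]]
        if value == FREE:
            return 1.0
        if value == UNKNOWN and cell in prior_free:
            return 1.0 + unknown_penalty
        return None  # occupied, or unknown with no supporting prior

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            extra = step_cost(nxt)
            if extra is None:
                continue
            new_cost = cost + extra
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# A wall with one unobserved gap in the middle row.
grid = [
    [FREE, FREE, OCCUPIED, FREE, FREE],
    [FREE, FREE, UNKNOWN,  FREE, FREE],
    [FREE, FREE, OCCUPIED, FREE, FREE],
]
cautious = plan(grid, (1, 0), (1, 4))                          # no route
relaxed = plan(grid, (1, 0), (1, 4), prior_free={(1, 2)})      # uses the gap
print(cautious)
print(relaxed)
```

Charging a penalty rather than treating prior-supported cells as ordinary free space keeps the planner preferring observed routes when one exists, while still allowing it to commit to an unobserved corridor when nothing else connects start and goal.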
Occupancy grids capture only geometry: is space occupied or free? They cannot answer questions like 'which objects can be manipulated?' or 'which surface is the table I am looking for?' Semantic maps augment geometric representations with object and category labels. For each detected object, a semantic map stores its 3D location (centroid plus bounding volume), its category label and confidence, the time it was last observed, and optionally its appearance embedding (a compact descriptor that allows recognition on re-encounter). The robot can query: 'where is the nearest chair?' or 'has the door to room 3B been observed open in the last 60 seconds?'
Building a reliable semantic map requires solving the data association problem: when the robot observes a chair from a new viewpoint, does the observation correspond to an existing chair in the map, or is it a new chair? False splits (treating one object as two) and false merges (treating two objects as one) are persistent failure modes. Object identity is often maintained using persistent IDs, with appearance and position used jointly for re-identification.
Scene graphs go further, representing objects as nodes and spatial relationships as edges. A scene graph might encode: mug.on(table), table.in(kitchen), door.adjacentTo(kitchen). This relational structure supports natural language grounding ('bring me the mug on the kitchen table'), commonsense reasoning ('if the table is removed, the mug will fall'), and long-horizon planning. Recent work (ConceptFusion, 2023; SayPlan, 2023) combines neural open-vocabulary perception and scene-graph representations with language models, allowing robots to ground arbitrary language queries in the world model.
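The data association step described above can be illustrated with a deliberately simple rule: a new detection is merged into the nearest existing object of the same category if it falls within a gating radius, and otherwise receives a new persistent ID. The class names, the `gate_radius` value, and the position-averaging rule are assumptions made for this sketch; a real system would typically also compare appearance embeddings, which are omitted here.

```python
import math
from dataclasses import dataclass


@dataclass
class MapObject:
    object_id: int
    category: str
    centroid: tuple    # (x, y, z) in metres, map frame
    confidence: float
    last_seen: float   # timestamp in seconds


class SemanticMap:
    def __init__(self, gate_radius=0.5):
        self.gate_radius = gate_radius  # assumed association threshold
        self.objects = {}
        self._next_id = 0

    def observe(self, category, centroid, confidence, timestamp):
        """Associate a detection with an existing object or create a new one."""
        match, best = None, self.gate_radius
        for obj in self.objects.values():
            distance = math.dist(obj.centroid, centroid)
            if obj.category == category and distance <= best:
                match, best = obj, distance
        if match is None:
            match = MapObject(self._next_id, category, centroid,
                              confidence, timestamp)
            self.objects[match.object_id] = match
            self._next_id += 1
        else:
            # Same object seen again: average the position, refresh the rest.
            match.centroid = tuple(
                0.5 * (a + b) for a, b in zip(match.centroid, centroid))
            match.confidence = max(match.confidence, confidence)
            match.last_seen = timestamp
        return match.object_id

    def nearest(self, category, position):
        """Answer queries such as 'where is the nearest chair?'."""
        candidates = [o for o in self.objects.values()
                      if o.category == category]
        return min(candidates,
                   key=lambda o: math.dist(o.centroid, position),
                   default=None)


smap = SemanticMap(gate_radius=0.5)
a = smap.observe("chair", (1.0, 2.0, 0.4), 0.8, timestamp=0.0)
b = smap.observe("chair", (1.2, 2.1, 0.4), 0.9, timestamp=5.0)  # same chair
c = smap.observe("chair", (4.0, 2.0, 0.4), 0.7, timestamp=6.0)  # a second chair
print(a, b, c, len(smap.objects))
print(smap.nearest("chair", (3.5, 2.0, 0.0)).object_id)
```

The gating radius makes the trade-off between the two failure modes explicit: a radius that is too large produces false merges of neighbouring objects, and one that is too small produces false splits when localization drift shifts the same object between visits.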
Most world models treat the environment as static: once an obstacle is mapped, it stays there. In dynamic environments such as hospitals, offices, and streets, people move, doors open and close, and objects are rearranged. Dynamic world models explicitly track the state of movable objects over time. Distinguishing static background from dynamic foreground requires change detection, object tracking, and reasoning about persistence. A map that confidently places a person in a doorway on the basis of an observation made 30 seconds ago is a hazard, not an asset.
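Reasoning about persistence can be approximated by letting confidence in an observation decay with its age, at a rate that depends on how mobile the observed thing is. The sketch below uses exponential decay with a per-category half-life; the half-life values, the threshold, and the function names are illustrative assumptions rather than recommended settings.

```python
import math

# Assumed half-lives in seconds (illustrative values only).
HALF_LIFE_S = {
    "wall": math.inf,     # static structure never goes stale
    "mug": 300.0,         # movable object, rearranged occasionally
    "door_state": 60.0,   # doors open and close often
    "person": 2.0,        # people rarely stay put
}


def current_confidence(initial_confidence, category, age_s):
    """Exponentially decay confidence in an observation as it ages."""
    half_life = HALF_LIFE_S[category]
    if math.isinf(half_life):
        return initial_confidence
    return initial_confidence * 0.5 ** (age_s / half_life)


def usable(initial_confidence, category, age_s, threshold=0.5):
    """Should the planner still rely on this entry without re-observing it?"""
    return current_confidence(initial_confidence, category, age_s) >= threshold


print(usable(0.95, "person", 30.0))   # False: a 30-second-old person is stale
print(usable(0.95, "wall", 3600.0))   # True: walls persist
print(current_confidence(0.9, "mug", 600.0))  # two half-lives: about 0.225
```

Under these assumed settings, a planner would re-observe the doorway before trusting the entry for the person, while continuing to rely on the mapped walls indefinitely.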
A robot's occupancy grid marks a region as 'unknown' because its lidar cannot see around a corner. The path planner refuses to route the robot through this region. This behavior is:
A semantic map records that a coffee mug is located at position (2.3, 1.1, 0.8) in the kitchen. The robot was last in the kitchen 10 minutes ago. Why should the robot treat this entry with skepticism when planning a grasp?
Design a World Model for a Hospital Robot
- A robot must navigate a hospital, deliver medications to patient rooms, and find and pick up empty meal trays. Design its world model.
- Step 1: Identify the types of queries the planner will need to make of the world model. List at least six distinct query types (example: 'Is the corridor to room 204 passable right now?').
- Step 2: For each representation type — occupancy grid, semantic object map, and scene graph — describe what information it would store for this hospital environment and how it would be updated.
- Step 3: Identify three categories of dynamic elements in a hospital environment that a static world model would fail to handle correctly. For each, propose how the world model should track them.
- Step 4: The hospital has a map of all rooms and permanent walls available as a floor plan. Describe how this prior information should be incorporated into the initial world model before the robot has done any sensing.
- Step 5: A new ward has been constructed and is not on the floor plan. How should the robot's world model handle a region it encounters that contradicts its prior map?
- Submit your design as a structured technical specification.