Skip to main content
AI Agents & Automation

⏱ About 20 min20 XP

The Tool-Call Round Trip

You understand what a tool call is and how schemas describe tools. Now let us trace the complete round trip — the full sequence of events from the moment a user sends a message to the moment the model produces its final response, with one or more tool calls in between. This end-to-end picture is what you need to build, debug, and reason about real agent systems.

The Five-Phase Cycle

Every tool-use interaction passes through five phases. Understanding each phase precisely prevents the most common implementation mistakes. Phase 1 — Model receives context. The model receives the conversation history plus the tool definitions (schemas) in its context window. Before it generates a single token of response, the model has already 'read' every available tool and knows what each one does. In multi-turn conversations, the history may already contain previous tool calls and results. Phase 2 — Model decides: respond or call a tool? This decision is the model's highest-level reasoning step. For some queries the model can answer directly from its training knowledge. For others it determines that accurate answering requires live data, an external action, or a computation it cannot perform internally. That judgment is itself a form of reasoning the model has been trained to perform. Phase 3 — Model emits a tool call. If it decides to call a tool, the model outputs a structured tool call message — not a text response. Depending on the API, this might look like a JSON object with 'name' and 'arguments' fields, or a dedicated message type. The model stops generating at this point; control passes to the executor. Phase 4 — Executor runs the tool. Application code extracts the tool name and arguments, calls the corresponding function, and collects the return value. This is where real work happens: a database query runs, an API call is made, a file is read, code is executed. The executor is responsible for error handling — if the tool fails, the executor must decide whether to retry, return an error result, or abort. Phase 5 — Result is returned; model continues. The executor packages the tool result as a new message and appends it to the conversation history. The model sees: [user message] → [tool call] → [tool result] and continues generation. If another tool call is needed, it loops back to Phase 3. When the model has all the information it needs, it generates a final natural-language response to the user.

The Conversation Is the State Machine

In a tool-using agent, the conversation history is not just a log — it is the full state of the agent's current task. The model has no memory outside the context window. Every tool call, every result, every intermediate reasoning step must live in the conversation history for the model to use it. This is why managing context carefully is critical in long-running agent tasks.

Let us trace a complete, realistic round trip for a financial research assistant. The user asks: 'Compare the market cap of Apple and Microsoft, and tell me which grew more over the past year.' Phase 1: The model sees this message plus two tool schemas: get_stock_info(ticker) and calculate_percentage_change(old_value, new_value). Phase 2: The model reasons that it needs current and year-ago market cap values for both companies — it cannot answer accurately from training data because market caps change daily. Phase 3a: The model emits: call get_stock_info with ticker='AAPL'. Phase 4a: Executor calls the financial API. Returns: {current_market_cap: 3.1T, market_cap_one_year_ago: 2.8T}. Phase 5a: Result appended to history. Phase 3b: Model emits: call get_stock_info with ticker='MSFT'. Phase 4b: Executor calls the API. Returns: {current_market_cap: 3.2T, market_cap_one_year_ago: 2.5T}. Phase 5b: Result appended. Phase 3c: Model emits: call calculate_percentage_change with old_value=2.8, new_value=3.1. And: call calculate_percentage_change with old_value=2.5, new_value=3.2. Phase 4c: Executor runs both. Returns: 10.7% for AAPL, 28.0% for MSFT. Phase 5c: Both results appended. Final response: The model synthesizes: 'Apple's market cap is currently $3.1 trillion (grew ~10.7% year over year). Microsoft's is $3.2 trillion (grew ~28.0%). Microsoft grew substantially more.' Four tool calls, four round trips, one coherent answer.

Match each phase of the tool-call round trip to what the system is doing at that moment.

Terms

Phase 1: Context loading
Phase 2: Decision
Phase 3: Tool call emission
Phase 4: Execution
Phase 5: Continuation

Definitions

Model determines whether it can answer directly or must call a tool
Application code runs the real function and collects its return value
Tool result is added to history and the model resumes generation
Model outputs a structured request naming a tool and its arguments
Model reads conversation history and all available tool schemas before generating

Drag terms onto their definitions, or click a term then click a definition to match.

Parallel Tool Calls

In the financial example above, the two calculate_percentage_change calls in Phase 3c are independent — neither depends on the other's result. Some AI APIs (including OpenAI's and Anthropic's) allow the model to emit multiple tool calls simultaneously in a single response turn. The executor can then run them in parallel, cutting latency significantly. Parallel tool calls require the executor to handle concurrent execution and collect all results before returning them to the model. The model receives all results at once and can reason across them together. Not all tools are safe to parallelize — if Tool B's arguments depend on Tool A's output, they must run sequentially. Designing agent systems requires reasoning carefully about these dependencies. A useful mental model: treat tool calls like function calls in a program. Sequential calls are like sequential statements. Parallel calls are like parallel threads. The model's conversation history is the call stack. The executor is the runtime.

Latency Scales With Sequential Depth, Not Total Calls

An agent that makes 10 parallel tool calls in one round trip is much faster than one that makes 10 sequential calls. The key metric is the depth of the dependency chain — how many rounds of model inference plus tool execution must complete before the final answer. Minimize sequential depth to minimize latency.

An agent needs to: (1) fetch a user's account balance, (2) fetch the current price of a stock the user wants to buy, and (3) determine whether the user can afford 10 shares. In what order must these tool calls happen?

A developer's agent works correctly in testing but returns stale data in production. They discover the model is sometimes skipping tool calls and answering from its training knowledge. What is the most likely cause?

Map a Multi-Step Tool Chain

  1. Scenario: You are designing an AI agent that helps a logistics manager answer: 'How many of our delivery trucks are currently more than 30 minutes behind schedule, and what is the weather at their current locations?'
  2. You have access to three tools:
  3. - get_truck_statuses() → returns a list of all trucks with current position and schedule deviation in minutes
  4. - get_weather_by_coordinates(lat, lon) → returns current weather for a GPS location
  5. - calculate_eta_adjustment(weather_conditions, speed_mph) → returns estimated delay in minutes
  6. Step 1: Identify which tool calls are required and what arguments each needs.
  7. Step 2: Draw (or describe) the dependency graph: which calls depend on the outputs of other calls?
  8. Step 3: Label which calls can be parallelized and which must be sequential.
  9. Step 4: Estimate how many sequential 'rounds' of tool calls the agent needs before it has all the information for its final answer.
  10. Step 5: Identify one potential failure point in this chain and describe how the executor should handle it.
  11. Goal: develop the habit of reasoning about tool chains as dependency graphs before building them.