Skip to main content
AI Agents & Automation

⏱ About 20 min20 XP

How Agents Call Tools

A language model alone is an extraordinarily capable reasoner — but it is locked inside text. It can describe how to check the weather, calculate a mortgage payment, or query a database, yet it cannot actually do any of those things. The moment we give that model a set of tools it can invoke, the character of the system changes fundamentally. It stops being a text generator and starts being an agent: something that takes actions in the world on behalf of a user. Understanding how that transformation works at a technical level is the foundation for everything in this module.

The Basic Mechanism

Function calling — the mechanism underpinning tool use — was introduced by OpenAI in 2023 and has since become a standard feature of nearly every production AI API. The mechanism is elegantly simple in concept, even though its implications are deep. When a model is given a set of tool definitions (described in a structured format you will study in Lesson 2), it learns to do something new: instead of always generating a natural-language response, it can instead generate a structured message that says, in effect, 'I want to call this tool with these arguments.' That structured message is the tool call. The model does not execute the tool itself — it has no ability to run code or contact external services. It only emits a description of the call it wants made. A separate piece of software — the executor, or orchestrator — receives the tool call, actually runs the tool (a Python function, an API request, a database query), collects the result, and feeds that result back to the model as a new message. The model then continues its reasoning, now informed by the real data the tool returned. This request-execute-return cycle is the heartbeat of every AI agent.

The Model Never Executes

A critical architectural fact: the model itself never runs code, contacts APIs, or reads files. It only outputs a structured description of what it wants done. All actual execution happens in the application layer surrounding the model. This separation is what makes tool-calling systems inspectable and controllable.

Consider a concrete example. A user asks an AI assistant: 'What is the current price of NVIDIA stock?' The model cannot know this — its training data has a cutoff date and it has no live internet access. But if the model has been given a tool called get_stock_price that accepts a ticker symbol, here is what happens: 1. The model recognizes that answering this question requires live data it does not have. 2. Rather than guessing or refusing, it emits a tool call: call get_stock_price with argument ticker='NVDA'. 3. The executor receives this call, makes a real API request to a financial data service, and gets back the current price: $950.42. 4. The executor sends this result back to the model as a new message: 'Result of get_stock_price: $950.42'. 5. The model now composes its final response: 'NVIDIA (NVDA) is currently trading at $950.42.' From the user's perspective, they asked a question and got an accurate answer. Behind the scenes, a precise choreography of model reasoning, structured output, external execution, and result integration made it possible.

Match each step in the tool-calling sequence to what actually happens at that step.

Terms

Model recognizes a knowledge gap
Model emits a tool call
Executor runs the tool
Result is returned to model
Model generates final response

Definitions

Synthesizes tool result with its reasoning into natural language
Tool output is appended as a new message in the conversation
Outputs a structured message naming the tool and its arguments
Determines live data or action is needed to answer correctly
Application code contacts a real API, function, or database

Drag terms onto their definitions, or click a term then click a definition to match.

Why Not Just Prompt the Model to Include Tool Calls?

Before formal function-calling APIs existed, developers tried a workaround: they would instruct the model in its system prompt to format tool requests in a specific way — for example, 'If you need to search the web, write SEARCH: followed by your query.' The model would produce that text, code would detect it with a regex, and the search would run. This approach worked poorly for several reasons. The model would sometimes hallucinate the format, produce partial matches, or mix tool-call syntax with regular prose. Extracting structured arguments reliably from natural language is a parsing nightmare. And because the format was not part of the model's training, the model could not reason about tool availability — it was just mimicking a pattern. Formal function calling solves all of this. The model is trained to understand tool schemas natively, to reason about when tool use is appropriate versus when it can answer directly, and to emit tool calls in a machine-parseable format with guaranteed structure. The difference between improvised regex parsing and native function calling is the difference between a hack and an interface.

Tool Calls Are a First-Class Output Type

Modern AI APIs treat tool calls as a distinct output type alongside text. A model response can be: a text message, a tool call, or both (some APIs allow the model to reason in text and then emit a tool call in the same turn). Always check the response type before processing — do not assume every response is text.

An AI assistant has a tool called send_email. A user asks it to 'remind me to review the contract.' The model decides to call send_email. Which statement correctly describes what happens next?

Why did regex-based 'fake' function calling (instructing models to emit a special format in plain text) perform worse than formal function-calling APIs?

Flashcards — click each card to reveal the answer

Trace a Tool Call by Hand

  1. Work through this scenario step by step without writing any code.
  2. Scenario: An AI travel assistant has two tools available: search_flights(origin, destination, date) and get_weather(city, date). A user asks: 'Should I fly from Chicago to Miami next Saturday? What will the weather be like when I land?'
  3. Step 1: List every piece of information the model would need that it cannot know from training alone.
  4. Step 2: For each gap, identify which tool should be called and write out the tool call as if you were the model: tool name and all argument values.
  5. Step 3: Invent plausible tool results for each call (make up realistic flight and weather data).
  6. Step 4: Write the final response the model would give the user, incorporating both tool results naturally.
  7. Step 5: Reflect: in what order should the tool calls happen? Could they run in parallel, or must one finish before the other starts? Why does this matter for latency?
  8. Goal: internalize the full request-execute-return-reason cycle before studying the technical details of how each step is implemented.