

Tool Use in Practice

The previous lessons covered the components of tool use: schemas, structured outputs, the round trip, design principles, error handling, and security. This lesson puts them together. We will trace a realistic multi-tool agent system end to end — from the user's request through every tool call, error, and model decision, to the final response. The goal is to see how all the pieces fit together in a single coherent picture, and to develop intuition for the kind of thinking required to build these systems well.

The System: A Research and Summarization Agent

The agent is a research assistant for a high school student writing a paper on renewable energy. The student asks: 'Find me three recent, credible sources on offshore wind energy efficiency, and give me a one-paragraph summary of each with the key statistic from each source.'

The agent has four tools:

search_web(query: string, num_results: integer, recency_days: integer): Searches the web and returns a list of URLs with titles and snippets. Restricted to returning metadata only, not full content. Required: query.

fetch_page(url: string, max_chars: integer): Fetches and returns the text content of a web page, truncated to max_chars. Returns sanitized text with script tags removed. Required: url.

extract_statistic(text: string, topic: string): Uses a secondary model call to extract the single most relevant quantitative statistic from a passage of text about a given topic. Returns {statistic: string, context: string}, or an error if no statistic is found. Required: text, topic.

summarize(text: string, max_words: integer, focus: string): Generates a focused summary of a passage. Required: text, focus.

Note the security decisions already embedded here: fetch_page sanitizes scripts (prompt injection mitigation); max_chars limits how much content enters the model's context (preventing context-stuffing attacks); and the search tool returns snippets only, not full page content (limiting data exposure).
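The four tools described above could be written down as schemas along the following lines. This is a minimal sketch, not the lesson's actual definitions: the dictionary layout (name, description, parameters, required) follows common JSON Schema conventions, the description strings are paraphrased, and the helper missing_required is a hypothetical name introduced only for illustration.

```python
# Hypothetical schema layout; only the tool names, parameter names, types,
# and required lists come from the lesson text.
def _tool(name, description, properties, required):
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {p: {"type": t} for p, t in properties.items()},
            "required": required,
        },
    }


TOOLS = [
    _tool(
        "search_web",
        "Search the web; returns URLs, titles, and snippets only.",
        {"query": "string", "num_results": "integer", "recency_days": "integer"},
        ["query"],
    ),
    _tool(
        "fetch_page",
        "Fetch sanitized page text, truncated to max_chars.",
        {"url": "string", "max_chars": "integer"},
        ["url"],
    ),
    _tool(
        "extract_statistic",
        "Extract the most relevant quantitative statistic about a topic.",
        {"text": "string", "topic": "string"},
        ["text", "topic"],
    ),
    _tool(
        "summarize",
        "Generate a focused summary of a passage.",
        {"text": "string", "max_words": "integer", "focus": "string"},
        ["text", "focus"],
    ),
]


def missing_required(tool_name, args):
    """Return the required parameters that are absent from a proposed call."""
    schema = next(t for t in TOOLS if t["name"] == tool_name)
    return [p for p in schema["parameters"]["required"] if p not in args]
```

A harness could call missing_required before executing a tool call, so that a malformed call is reported back to the model instead of being executed.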

Security Is Designed In, Not Bolted On

Notice that the tools that touch external content have security considerations built into their design: content sanitization and size limits in fetch_page, and metadata-only results in search_web. These were not added after the fact; they were part of the initial design. This is the professional approach: threat-model first, design second.
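The two fetch_page safeguards, script removal and the max_chars limit, might look roughly like this. It is a sketch under stated assumptions: the lesson does not say how fetch_page sanitizes content, so the use of Python's standard html.parser module and the names _TextExtractor and sanitize_page are illustrative choices only.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips anything inside <script> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_script = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script += 1

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script > 0:
            self._in_script -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script element.
        if self._in_script == 0:
            self.parts.append(data)


def sanitize_page(html, max_chars):
    """Hypothetical helper: strip scripts, collapse whitespace, truncate."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return text[:max_chars]


page = "<p>Hello</p><script>alert('x')</script><p>world</p>"
clean = sanitize_page(page, max_chars=100)
```

Removing scripts and capping length reduces what reaches the model, but the remaining text is still untrusted input and should be treated that way.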

Here is the complete execution trace:

Turn 1: The model receives the user request. It reasons: I need recent web sources, then their full content, then statistics, then summaries. It decides to start with a search.

Tool call 1: search_web(query='offshore wind energy efficiency recent research', num_results=5, recency_days=90)
Result: 5 URLs with titles and snippets. The model scans the snippets and selects the 3 most relevant sources.

Tool calls 2-4 (parallel): fetch_page for each of the 3 selected URLs with max_chars=3000.
Result: Full text of 2 articles. The third fetch returns a 403 Forbidden error because the site blocks automated access. Structured error: {error_code: 'HTTP_403', message: 'Access denied by server', recoverable: false}.

Model reasoning: I have content from 2 sources but one fetch failed. I should search for a replacement. The model calls search_web again with a slightly different query to find a substitute third source.

Tool call 5: search_web(query='offshore wind turbine efficiency improvements 2024', num_results=3, recency_days=90)

Tool call 6: fetch_page for the top result from the backup search. It succeeds. Now the model has text from 3 sources and proceeds with statistics and summaries.

Tool calls 7-9 (parallel): extract_statistic for each source's text with topic='offshore wind energy efficiency'.
Results:
Source 1 returns {statistic: '67% capacity factor achieved by Hornsea 3', context: 'Highest recorded for any commercial offshore wind farm'}.
Source 2 returns {error_code: 'NO_STATISTIC_FOUND', message: 'No quantitative efficiency statistic found in text'}.
Source 3 returns {statistic: '14% reduction in levelized cost of energy over 5 years', context: 'Due to turbine scale increases'}.

Model reasoning: Source 2 has no extractable statistic. The model can either call extract_statistic again with a broader topic to find any relevant number, or note in the summary that no key statistic was identified. In this trace it takes the second option.

Tool calls 10-12 (parallel): summarize for each of the 3 sources with focus='offshore wind energy efficiency' and max_words=80.

Final response: The model synthesizes all results into a structured response with source titles, URLs, summaries, and statistics, noting for Source 2 that no precise efficiency statistic was found but still providing the summary.
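The structured 403 error in the trace only helps if the harness returns it to the model rather than raising or discarding it. The sketch below shows one way that could be done. The names ToolError and run_tool, the stub fetch_page, and the example.com style URLs are all hypothetical; only the error fields (error_code, message, recoverable) come from the trace.

```python
class ToolError(Exception):
    """Hypothetical exception a tool raises when it cannot complete."""

    def __init__(self, code, message, recoverable):
        super().__init__(message)
        self.code = code
        self.recoverable = recoverable


def run_tool(fn, **kwargs):
    """Run a tool; on failure return a structured error dict, never raise."""
    try:
        return fn(**kwargs)
    except ToolError as err:
        return {
            "error_code": err.code,
            "message": str(err),
            "recoverable": err.recoverable,
        }


def fetch_page(url, max_chars=3000):
    """Stub standing in for the real fetch_page; one host always refuses."""
    if "blocked.example" in url:
        raise ToolError("HTTP_403", "Access denied by server", recoverable=False)
    return {"url": url, "text": ("Article text from " + url)[:max_chars]}


urls = [
    "https://a.example/wind",
    "https://b.example/wind",
    "https://blocked.example/wind",
]
results = [run_tool(fetch_page, url=u) for u in urls]

# The harness hands every result, including the error, back to the model.
failed = [r for r in results if "error_code" in r]
```

One plausible reading of recoverable: false is that retrying the same URL will not help, which is why the model in the trace looks for a different source instead of repeating the fetch.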

This trace illustrates several important patterns:

Adaptive recovery: When a fetch failed with a 403 error, the model did not give up; it searched for a replacement. This required the error to be returned to the model (not swallowed) and the model to have enough context to reason about an alternative.

Parallel calls where possible: The three fetch_page calls, three extract_statistic calls, and three summarize calls were all parallelized within their respective rounds. This collapsed what could have been 9 sequential round trips into 3 parallel rounds, cutting the latency of those steps to roughly a third if each call takes about the same time.

Graceful handling of partial failure: Source 2 had no extractable statistic. Rather than failing the entire request, the model acknowledged the limitation and delivered the best available response. This is graceful degradation in practice.

Chained tool calls: The output of search_web fed into fetch_page; the output of fetch_page fed into both extract_statistic and summarize. This chaining is the essence of multi-step agent reasoning.
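The latency effect of parallel rounds can be sketched with asyncio. This is an illustration under assumptions, not the agent's real executor: fake_tool simulates a tool call with a fixed delay, and run_rounds and timed are hypothetical helper names. Calls within a round run concurrently; rounds run one after another because each depends on the previous round's output.

```python
import asyncio
import time


async def fake_tool(name, delay=0.05):
    """Simulated tool call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_rounds(rounds):
    """Run each round's calls concurrently, and the rounds sequentially."""
    results = []
    for calls in rounds:
        results.append(await asyncio.gather(*(fake_tool(c) for c in calls)))
    return results


def timed(rounds):
    """Return the results and the wall-clock time for all rounds."""
    start = time.perf_counter()
    results = asyncio.run(run_rounds(rounds))
    return results, time.perf_counter() - start


rounds = [
    ["fetch_page_1", "fetch_page_2", "fetch_page_3"],
    ["extract_statistic_1", "extract_statistic_2", "extract_statistic_3"],
    ["summarize_1", "summarize_2", "summarize_3"],
]
results, elapsed = timed(rounds)
```

With nine simulated calls of equal duration, the elapsed time should be close to three call durations rather than nine, since total time is governed by the number of sequential rounds and not the number of calls.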

Match each design pattern demonstrated in this trace to the lesson principle it exemplifies.

Terms

Calling search_web again after a 403 fetch failure
Parallelizing all three fetch_page calls
Returning NO_STATISTIC_FOUND rather than null
fetch_page stripping script tags before returning content
Limiting search_web to metadata snippets only

Definitions

Input sanitization as a prompt injection defense built into the tool
Minimizing sequential depth to reduce total agent latency
Explicit structured errors that enable model reasoning instead of silent failures
Least privilege applied at the tool level, limiting data exposure per call
Adaptive recovery using structured errors that the model can reason about


In the research agent trace, why were the three summarize calls placed in a separate parallel round after the extract_statistic calls, rather than being parallelized together with extract_statistic?

The research agent makes 12 total tool calls across 6 sequential rounds (search, three fetches, backup search, backup fetch, three extractions, three summaries). If each tool call takes approximately 500ms to execute, what is the approximate minimum total execution time, assuming maximum parallelization within each round?

Trace Your Own Multi-Tool Agent

  1. Design a multi-tool agent for a task you care about. This could be a homework help assistant, a local event finder, a recipe assistant that checks pantry inventory, a sports statistics tracker — your choice.
  2. Step 1: Define the agent's purpose in one sentence and the user request it will handle.
  3. Step 2: Design 3-5 tools with full schemas (name, description, parameters with types and descriptions, required list). Apply all principles from Lesson 5.
  4. Step 3: Trace the complete execution sequence: list every tool call in order, identify which can be parallelized, and show what data flows from each tool call to the next.
  5. Step 4: Introduce one deliberate failure into your trace (a tool error, a missing result, a partial failure). Show how the agent recovers.
  6. Step 5: Apply the security checklist: (a) does every tool follow least privilege? (b) is any tool non-idempotent — and if so, what is your confirmation/idempotency mechanism? (c) does any tool process untrusted external content — and if so, what sanitization happens?
  7. Goal: experience the full design cycle of a real tool-using agent from schema to execution trace to security review.