Tool Use in Practice
The previous lessons covered the components of tool use: schemas, structured outputs, the round trip, design principles, error handling, and security. This lesson puts them together. We will trace a realistic multi-tool agent system end to end — from the user's request through every tool call, error, and model decision, to the final response. The goal is to see how all the pieces fit together in a single coherent picture, and to develop intuition for the kind of thinking required to build these systems well.
The System: A Research and Summarization Agent
The agent is a research assistant for a high school student writing a paper on renewable energy. The student asks: 'Find me three recent, credible sources on offshore wind energy efficiency, and give me a one-paragraph summary of each with the key statistic from each source.' The agent has four tools:
- search_web(query: string, num_results: integer, recency_days: integer): Searches the web and returns a list of URLs with titles and snippets. Restricted to returning metadata only, not full content. Required: query.
- fetch_page(url: string, max_chars: integer): Fetches and returns the text content of a web page, truncated to max_chars. Returns sanitized text with script tags removed. Required: url.
- extract_statistic(text: string, topic: string): Uses a secondary model call to extract the single most relevant quantitative statistic from a passage of text about a given topic. Returns {statistic: string, context: string} or an error if no statistic is found. Required: text, topic.
- summarize(text: string, max_words: integer, focus: string): Generates a focused summary of a passage. Required: text, focus.
Note the security decisions already embedded here: fetch_page sanitizes scripts (prompt injection mitigation); max_chars limits how much content enters the model's context (preventing context-stuffing attacks); the search tool returns snippets only, not full page content (limiting data exposure).
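The four tool definitions above could be written down as JSON-Schema-style dictionaries. This is a minimal sketch: the dictionary layout, the `tool` helper, and `missing_required` are assumptions made for illustration, not a specific provider's format.

```python
# Hypothetical schema layout; adapt the field names to your model provider.
STRING = {"type": "string"}
INTEGER = {"type": "integer"}


def tool(name, description, properties, required):
    """Build one tool definition with an object-typed parameter schema."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


TOOLS = [
    tool(
        "search_web",
        "Search the web; returns URLs with titles and snippets (metadata only).",
        {"query": STRING, "num_results": INTEGER, "recency_days": INTEGER},
        ["query"],
    ),
    tool(
        "fetch_page",
        "Fetch sanitized page text, truncated to max_chars.",
        {"url": STRING, "max_chars": INTEGER},
        ["url"],
    ),
    tool(
        "extract_statistic",
        "Extract the most relevant quantitative statistic about a topic.",
        {"text": STRING, "topic": STRING},
        ["text", "topic"],
    ),
    tool(
        "summarize",
        "Generate a focused summary of a passage.",
        {"text": STRING, "max_words": INTEGER, "focus": STRING},
        ["text", "focus"],
    ),
]


def missing_required(tool_name: str, arguments: dict) -> list[str]:
    """Return the required parameter names absent from a proposed call."""
    schema = next(t for t in TOOLS if t["name"] == tool_name)
    return [p for p in schema["parameters"]["required"] if p not in arguments]
```

A harness could run a check like `missing_required` before executing a call, and return a structured error to the model when the list is not empty.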
None of these safeguards (content sanitization in fetch_page, size limits, metadata-only search results) was added after the fact. They were part of the initial design. This is the professional approach: threat-model first, design second.
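To make the fetch_page safeguards concrete, here is a rough sketch of the sanitize-then-truncate step. The function name `sanitize_page` and the regular expressions are assumptions; regex stripping is only an illustration, and a production tool would more likely use a real HTML parser. Removing scripts also does not neutralize injected instructions in visible text, so this is one layer of defense, not a complete one.

```python
import re

# Assumed default cap; the trace in this lesson uses max_chars=3000.
DEFAULT_MAX_CHARS = 3000

# Matches a whole <script>...</script> block, including its contents.
_SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
# Matches any remaining tag so that only text content is left.
_TAG_RE = re.compile(r"<[^>]+>")


def sanitize_page(html: str, max_chars: int = DEFAULT_MAX_CHARS) -> str:
    """Drop script blocks and other tags, collapse whitespace, then truncate."""
    without_scripts = _SCRIPT_RE.sub(" ", html)
    text = _TAG_RE.sub(" ", without_scripts)
    text = " ".join(text.split())
    return text[:max_chars]
```

Truncating after sanitization means the character budget is spent on readable text rather than on markup.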
Here is the complete execution trace:
Turn 1: The model receives the user request. It reasons: I need recent web sources, then their full content, then statistics, then summaries. It decides to start with a search.
Tool call 1: search_web(query='offshore wind energy efficiency recent research', num_results=5, recency_days=90). Result: 5 URLs with titles and snippets. The model scans the snippets and selects the 3 most relevant sources.
Tool calls 2-4 (parallel): fetch_page for each of the 3 selected URLs with max_chars=3000. Result: text from 2 of the 3 articles. The third fetch returns a 403 Forbidden error because the site blocks automated access. Structured error: {error_code: 'HTTP_403', message: 'Access denied by server', recoverable: false}.
Model reasoning: I have content from 2 sources but one fetch failed, and the error is marked as not recoverable, so retrying the same URL will not help. I should search for a replacement. The model calls search_web again with a slightly different query to find a substitute third source.
Tool call 5: search_web(query='offshore wind turbine efficiency improvements 2024', num_results=3, recency_days=90).
Tool call 6: fetch_page for the top result from the backup search. It succeeds. Now the model has text from 3 sources and proceeds with statistics and summaries.
Tool calls 7-9 (parallel): extract_statistic for each source's text with topic='offshore wind energy efficiency'. Results: Source 1 returns {statistic: '67% capacity factor achieved by Hornsea 3', context: 'Highest recorded for any commercial offshore wind farm'}. Source 2 returns {error_code: 'NO_STATISTIC_FOUND', message: 'No quantitative efficiency statistic found in text'}. Source 3 returns {statistic: '14% reduction in levelized cost of energy over 5 years', context: 'Due to turbine scale increases'}.
Model reasoning: Source 2 has no extractable statistic. The model could call extract_statistic again with a broader topic to find any relevant number, or it could note in the summary that no key statistic was identified. In this trace it takes the second option.
Tool calls 10-12 (parallel): summarize for each of the 3 sources with focus='offshore wind energy efficiency' and max_words=80.
Final response: The model synthesizes all results into a structured response with source titles, URLs, summaries, and statistics. For Source 2 it provides the summary and notes that no precise efficiency statistic was found.
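The recovery step in this trace can be imitated with stubbed tools. In the sketch below, `fake_search`, `fake_fetch`, the example URLs, and `collect_sources` are all hypothetical stand-ins, and the fallback logic is hard-coded only for illustration. In the real system the model makes that decision after seeing the structured error.

```python
def fake_search(query: str, num_results: int = 5) -> list[str]:
    """Stub for search_web: returns canned URLs for two known queries."""
    pages = {
        "primary": ["https://a.example", "https://b.example", "https://blocked.example"],
        "backup": ["https://c.example"],
    }
    return pages[query][:num_results]


def fake_fetch(url: str, max_chars: int = 3000) -> dict:
    """Stub for fetch_page: fails with a structured error for blocked sites."""
    if "blocked" in url:
        # The error is returned to the caller, not raised or swallowed.
        return {
            "error_code": "HTTP_403",
            "message": "Access denied by server",
            "recoverable": False,
        }
    return {"url": url, "text": f"article text from {url}"[:max_chars]}


def collect_sources(wanted: int, queries: list[str]) -> tuple[list[dict], list[dict]]:
    """Gather up to `wanted` pages, falling through to later queries on failure."""
    sources: list[dict] = []
    errors: list[dict] = []
    seen: set[str] = set()
    for query in queries:
        for url in fake_search(query):
            if len(sources) == wanted:
                return sources, errors
            if url in seen:
                continue
            seen.add(url)
            result = fake_fetch(url)
            if "error_code" in result:
                errors.append({"url": url, **result})
            else:
                sources.append(result)
    return sources, errors


sources, errors = collect_sources(3, ["primary", "backup"])
```

Keeping the failed fetch in `errors` instead of discarding it mirrors the trace: the failure stays visible, so whoever assembles the final answer can explain why a source was replaced.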
This trace illustrates several important patterns:
- Adaptive recovery: When a fetch failed with a 403 error, the model did not give up. It searched for a replacement. This required the error to be returned to the model (not swallowed) and the model to have enough context to reason about an alternative.
- Parallel calls where possible: The three initial fetch_page calls, the three extract_statistic calls, and the three summarize calls were each issued in parallel within their respective rounds. This collapsed what could have been 9 sequential round trips into 3 parallel rounds, cutting the latency of those steps to roughly one third if each call takes about the same time.
- Graceful handling of partial failure: Source 2 had no extractable statistic. Rather than failing the entire request, the model acknowledged the limitation and delivered the best available response. This is graceful degradation in practice.
- Chained tool calls: The output of search_web fed into fetch_page; the output of fetch_page fed into both extract_statistic and summarize. This chaining is the essence of multi-step agent reasoning.
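The latency effect of parallel rounds can be checked with a small simulation. This sketch assumes every tool call takes the same fixed time (`CALL_SECONDS`, scaled down so the demo runs quickly) and uses `asyncio.sleep` as a stand-in for real tool execution. The function names are illustrative.

```python
import asyncio
import time

CALL_SECONDS = 0.05  # Assumed per-call latency, shortened for the demo.


async def fake_tool_call(name: str) -> str:
    """Stand-in for a real tool call: just waits, then returns its name."""
    await asyncio.sleep(CALL_SECONDS)
    return name


async def run_sequential(calls: list[str]) -> float:
    """Run every call one after another and return elapsed seconds."""
    start = time.perf_counter()
    for name in calls:
        await fake_tool_call(name)
    return time.perf_counter() - start


async def run_in_rounds(rounds: list[list[str]]) -> float:
    """Run each round's calls concurrently; rounds themselves stay sequential."""
    start = time.perf_counter()
    for batch in rounds:
        await asyncio.gather(*(fake_tool_call(name) for name in batch))
    return time.perf_counter() - start


rounds = [["fetch_page"] * 3, ["extract_statistic"] * 3, ["summarize"] * 3]
flat = [name for batch in rounds for name in batch]

sequential = asyncio.run(run_sequential(flat))
parallel = asyncio.run(run_in_rounds(rounds))
print(f"sequential: {sequential:.2f}s, in rounds: {parallel:.2f}s")
```

Under these assumptions, total time scales with the number of rounds rather than the number of calls, because each round takes about as long as its slowest call.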
In the research agent trace, why were the three summarize calls placed in a separate parallel round after the extract_statistic calls, rather than being parallelized together with extract_statistic?
The research agent makes 12 total tool calls across 6 sequential rounds (search, three parallel fetches, backup search, backup fetch, three parallel extractions, three parallel summaries). If each tool call takes approximately 500ms to execute, what is the approximate minimum total execution time, assuming maximum parallelization within each round?
Trace Your Own Multi-Tool Agent
- Design a multi-tool agent for a task you care about. This could be a homework help assistant, a local event finder, a recipe assistant that checks pantry inventory, a sports statistics tracker — your choice.
- Step 1: Define the agent's purpose in one sentence and the user request it will handle.
- Step 2: Design 3-5 tools with full schemas (name, description, parameters with types and descriptions, required list). Apply all principles from Lesson 5.
- Step 3: Trace the complete execution sequence: list every tool call in order, identify which can be parallelized, and show what data flows from each tool call to the next.
- Step 4: Introduce one deliberate failure into your trace (a tool error, a missing result, a partial failure). Show how the agent recovers.
- Step 5: Apply the security checklist: (a) does every tool follow least privilege? (b) is any tool non-idempotent — and if so, what is your confirmation/idempotency mechanism? (c) does any tool process untrusted external content — and if so, what sanitization happens?
- Goal: experience the full design cycle of a real tool-using agent from schema to execution trace to security review.
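For Step 3, one way to find which calls can run in parallel is to write down each call's dependencies and group the calls into rounds. The sketch below is only an illustration: `plan_rounds` and the recipe-assistant tool names are hypothetical, and a real agent leaves this scheduling to the model.

```python
def plan_rounds(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group calls into rounds; a call is ready once all its dependencies ran."""
    done: set[str] = set()
    rounds: list[list[str]] = []
    while len(done) < len(deps):
        ready = sorted(
            call for call, needs in deps.items()
            if call not in done and needs <= done
        )
        if not ready:
            raise ValueError("circular or unsatisfiable dependency")
        rounds.append(ready)
        done.update(ready)
    return rounds


# Hypothetical trace for a recipe assistant that checks pantry inventory.
deps = {
    "check_pantry": set(),
    "search_recipes": {"check_pantry"},
    "get_nutrition": {"search_recipes"},
    "get_prices": {"search_recipes"},
    "build_plan": {"get_nutrition", "get_prices"},
}

for number, batch in enumerate(plan_rounds(deps), start=1):
    print(f"Round {number}: {', '.join(batch)}")
```

Calls that land in the same round share no data dependency, so they are candidates for parallel execution. Calls in later rounds consume the outputs of earlier ones.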