Designing Good Tools

Tool calling gives an AI agent access to the real world. But the quality of that access depends entirely on how the tools are designed. A poorly designed tool — one that does too much, has an ambiguous description, or behaves unpredictably — degrades the entire agent. A well-designed tool is a clean, reliable interface: the model knows when to use it, what to pass it, and what to expect back. Tool design is a discipline as important as prompt engineering or model selection, and it is just as often the root cause of agent failures.

Principle 1: Single Responsibility

Each tool should do one thing and do it well. This is the software engineering principle of single responsibility applied to AI tools, and it matters even more in this context because the model must select tools based on their descriptions. Consider a tool called handle_user_data with the description 'Create, read, update, or delete user records.' This tool performs four fundamentally different operations behind one name. From the model's perspective, it is a single tool that might create a record when a deletion was intended. Separating it into create_user, get_user, update_user, and delete_user makes each tool's purpose unambiguous, so the model can select the right one far more reliably. Single-responsibility tools are also safer: a model that only needs read access but is handed a tool that can also delete records is a liability. Scoping tools narrowly limits the blast radius when something goes wrong.

There is a tension here: more tools mean a longer schema list in the model's context, which consumes tokens and can overwhelm the model's tool-selection reasoning if the list is very large. The practical balance is to group actions that are genuinely inseparable (like search_and_rank, where ranking is always part of search) and to split those that are independently useful.
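As a rough illustration of the split described above, the sketch below defines the four user tools as separate entries and hands a read-only agent only the one it needs. The make_tool helper, the parameter names, and the generic JSON-Schema-style layout are assumptions made for this example; the exact envelope depends on the model provider you use.

```python
# Hypothetical tool definitions in a generic JSON-Schema style.
# Field names and nesting are assumptions, not any specific vendor's API.

def make_tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build one tool definition with a single, clearly scoped purpose."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }

# Illustrative parameter schemas shared by several tools.
USER_ID = {"type": "string", "description": "Unique ID of the user record."}
EMAIL = {"type": "string", "description": "The user's email address."}

USER_TOOLS = [
    make_tool("create_user", "Create a new user record.",
              {"email": EMAIL}, ["email"]),
    make_tool("get_user", "Read one user record by ID. Never modifies data.",
              {"user_id": USER_ID}, ["user_id"]),
    make_tool("update_user", "Update the email of an existing user.",
              {"user_id": USER_ID, "email": EMAIL}, ["user_id", "email"]),
    make_tool("delete_user", "Permanently delete one user record.",
              {"user_id": USER_ID}, ["user_id"]),
]

# A read-only agent is simply never given the write tools.
READ_ONLY_NAMES = {"get_user"}
read_only_tools = [t for t in USER_TOOLS if t["name"] in READ_ONLY_NAMES]
```

Because the destructive tools are separate entries, limiting an agent's capabilities becomes a matter of filtering a list rather than trusting the model to pass a safe action argument.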

The Model Chooses by Description, Not by Code

When you have two tools that overlap in capability — for example, search_documents and get_document_by_id — the model distinguishes them entirely through their descriptions. If the descriptions are ambiguous or similar, the model will guess. Every tool in your toolkit needs a description specific enough that there is exactly one obvious right choice for any given situation.
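One way to picture this is to write the two descriptions side by side and run a crude check over them. The wording of the descriptions, the lint_descriptions helper, and its eight-word threshold are all assumptions for illustration; a check like this only catches the most obvious problems and is no substitute for testing tool selection with the actual model.

```python
# Hypothetical descriptions for the two overlapping tools named above.
TOOL_DESCRIPTIONS = {
    "search_documents": (
        "Full-text search across all documents. Use when the user describes "
        "a topic or keywords and you do not yet know the document ID. "
        "Returns a list of matching document IDs and titles."
    ),
    "get_document_by_id": (
        "Fetch the full contents of exactly one document. Use only when you "
        "already have its document ID, for example from search_documents."
    ),
}

def lint_descriptions(descriptions: dict[str, str], min_words: int = 8) -> list[str]:
    """Return warnings for descriptions that are very short or duplicated.

    The word-count threshold is an arbitrary illustrative heuristic.
    """
    warnings = []
    seen: dict[str, str] = {}  # normalized description -> first tool using it
    for name, text in descriptions.items():
        normalized = " ".join(text.lower().split())
        if len(normalized.split()) < min_words:
            warnings.append(f"{name}: description is shorter than {min_words} words")
        if normalized in seen:
            warnings.append(f"{name}: same description as {seen[normalized]}")
        else:
            seen[normalized] = name
    return warnings
```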

Principle 2: Predictable, Consistent Behavior

A tool that returns different structures or types depending on runtime conditions creates chaos in an agent pipeline. If get_product sometimes returns {price: 29.99} and other times returns {price: '29.99 USD'} (a number versus a formatted string), any downstream logic that does arithmetic on the price field will fail intermittently. Predictability has several dimensions:

Consistent schema: the tool always returns the same fields with the same types. If a product is out of stock, return {price: 29.99, in_stock: false, quantity: 0} rather than omitting the price field. The model and the executor both need to rely on a stable contract.

Idempotency where possible: an idempotent tool produces the same result when called multiple times with the same arguments. Read-only tools (get_weather, search_products) are naturally idempotent. For write tools (send_email, charge_card), idempotency must be carefully designed, for example by including a unique request ID and checking whether the action has already been performed. This prevents double-sends and double-charges when an agent retries a tool call after a timeout.

Explicit errors: when a tool fails, return a structured error rather than raising an exception or returning null. An error like {error: 'USER_NOT_FOUND', message: 'No user with id 12345 exists'} tells the model exactly what happened and lets it reason about whether to try a different approach.
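The consistent-schema and explicit-error points can be sketched with the get_product example. The in-memory CATALOG, the product IDs, and the PRODUCT_NOT_FOUND code are hypothetical stand-ins for a real data source and error vocabulary.

```python
# Hypothetical in-memory catalog standing in for a real data source.
CATALOG = {
    "sku-1": {"price": 29.99, "quantity": 4},
    "sku-2": {"price": 12.50, "quantity": 0},
}

def get_product(product_id: str) -> dict:
    """Return a product with a fixed set of fields, or a structured error."""
    record = CATALOG.get(product_id)
    if record is None:
        # Structured error instead of None or an exception.
        return {
            "error": "PRODUCT_NOT_FOUND",
            "message": f"No product with id {product_id} exists",
        }
    quantity = int(record["quantity"])
    # Same three fields and the same types on every successful call,
    # including when the product is out of stock.
    return {
        "price": float(record["price"]),
        "in_stock": quantity > 0,
        "quantity": quantity,
    }
```

An out-of-stock product still carries a numeric price, so downstream arithmetic never has to special-case a missing field or a formatted string.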

Non-Idempotent Tools and Agent Loops

Agents that retry tool calls on failure can cause serious problems if the tools are not idempotent. An agent that sends an email, receives a timeout before the result arrives, and retries the call can send the same email twice. Design non-idempotent tools (anything that sends, charges, creates, or deletes) with idempotency keys or guard checks to prevent duplicate execution.
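A minimal sketch of an idempotency key on a write tool follows. The request_id parameter, the in-memory dictionaries, and the duplicate flag are assumptions for illustration; a real system would need a durable store shared across processes and a real mail provider in place of SENT_LOG.

```python
# Stand-ins for external systems (assumptions for this sketch).
SENT_LOG: list[dict] = []               # plays the role of the mail provider
_RESULTS_BY_KEY: dict[str, dict] = {}   # plays the role of a durable key store

def send_email(request_id: str, to: str, subject: str, body: str) -> dict:
    """Send an email at most once per request_id.

    A retry with the same request_id returns the stored result
    instead of sending the message a second time.
    """
    if request_id in _RESULTS_BY_KEY:
        # Already performed: report the earlier outcome, flagged as a replay.
        return {**_RESULTS_BY_KEY[request_id], "duplicate": True}
    SENT_LOG.append({"to": to, "subject": subject, "body": body})
    result = {"status": "sent", "request_id": request_id, "duplicate": False}
    _RESULTS_BY_KEY[request_id] = result
    return result
```

If the agent times out and retries with the same request_id, the second call is answered from the stored result and SENT_LOG still holds a single message. The caller (the executor or the agent) is responsible for reusing the same key on a retry and generating a fresh one for a genuinely new email.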

Principle 3: Descriptions That Enable Judgment

The description is not just a label. It is the model's operating manual for the tool. Good descriptions answer four questions the model implicitly asks every time it considers a tool:

  1. What does this tool return? Not just 'information about a user' but 'the user's name, email, account status, and creation date as a JSON object.'
  2. When should I use it? 'Use when you need to look up a specific user by their ID or email address.'
  3. When should I not use it? 'Do not use to search for users by name or to list all users — use search_users for those cases.'
  4. What should I know about the arguments? Inline the most important constraints into the parameter descriptions rather than relying on the model to infer them from types alone.

The negative case, 'when not to use it', is often the most powerful. If you have a fast approximate search tool and a slow exact lookup tool, telling the model 'use exact_lookup only when the user needs precise results and approximate_search would be insufficient' enables the model to make the cost-quality tradeoff correctly.
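Put together, the four questions might produce a definition like the one sketched here. The build_description helper, the identifier parameter, and its stated constraints are hypothetical, and the schema layout is a generic JSON-Schema style rather than any particular provider's format.

```python
def build_description(returns: str, use_when: str, do_not_use: str) -> str:
    """Join the three judgment-enabling parts into one description string."""
    return " ".join(part.strip() for part in (returns, use_when, do_not_use))

# Hypothetical definition of a get_user tool.
GET_USER_TOOL = {
    "name": "get_user",
    "description": build_description(
        returns=(
            "Returns the user's name, email, account status, and creation "
            "date as a JSON object."
        ),
        use_when=(
            "Use when you need to look up a specific user by their ID or "
            "email address."
        ),
        do_not_use=(
            "Do not use to search for users by name or to list all users; "
            "use search_users for those cases."
        ),
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "identifier": {
                "type": "string",
                # The key constraint is stated inline, not left to inference.
                "description": (
                    "Either the user's ID or their full email address. "
                    "Partial values and names are not accepted."
                ),
            },
        },
        "required": ["identifier"],
    },
}
```

Keeping the three parts as separate arguments makes it harder to ship a tool whose description forgets the negative case.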

Match each tool design principle to the problem it prevents.

Terms

Single responsibility per tool
Consistent return schema
Idempotency keys on write tools
Explicit negative usage guidance in description
Structured error return values

Definitions

Enables the model to reason about what went wrong and whether to try an alternative approach
Prevents the model from choosing this tool in situations where a different tool is more appropriate
Prevents downstream code from failing when a field is missing or has an unexpected type
Prevents duplicate emails, charges, or records when an agent retries a timed-out call
Prevents the model from accidentally triggering a destructive operation when only a read was needed


An agent has a tool search_all with description 'Search everything.' The agent frequently calls this tool when a more specific tool (search_products or search_customers) would be correct. What is the most direct fix?

An agent task management system has a tool create_task(title, description, assignee). A user asks the agent to create 5 similar tasks. The executor times out on the third call. The agent retries all calls from the beginning. What problem occurs if create_task is not idempotent?

Audit a Broken Tool Design

Here is a poorly designed tool schema for a note-taking assistant:

    name: 'notes'
    description: 'Does stuff with notes.'
    parameters:
      action: { type: string } -- no description, no enum
      data: { type: string } -- no description
      id: { type: string } -- no description, not marked optional or required

Step 1: List every design problem you can identify. For each problem, explain what agent behavior it would cause.
Step 2: Determine how many single-responsibility tools this should be split into. Name them.
Step 3: Rewrite the schema for one of those tools with a proper description, well-described parameters with types, enums where appropriate, and a required array.
Step 4: Write a 'when not to use it' clause for your tool's description.
Step 5: Invent a realistic failure case for your tool (e.g., note not found, permission denied) and describe the structured error response the tool should return.

Goal: build the habit of reading tool designs critically before trusting them in an agent system.