Tool Use in AI Agents: Function Calling Explained
AI Agents
Tool Use in AI Agents: Function Calling Explained
SStackviv Team
14 min read

Key takeaways

  • Tool use (function calling) lets AI agents interact with external systems, APIs, and databases instead of just generating text
  • LLMs do not execute tools directly; they generate structured JSON that your application uses to call the actual functions
  • Good tool definitions need clear names, detailed descriptions, and proper JSON schemas with type constraints
  • Parallel tool calling speeds up agent workflows by running multiple independent operations simultaneously
  • All major providers (OpenAI, Anthropic, Google) now support native function calling with strict schema validation

AI chatbots are great at generating text. But they cannot check your bank balance, send an email, or fetch the weather on their own. That is where tool use comes in.

Tool use, also called function calling, gives AI agents the ability to interact with external systems. Instead of guessing or making things up, an agent can call a weather API, query a database, or execute code to get real, accurate information.

This capability transformed LLMs from isolated text generators into practical tools that can actually do things. And if you are building AI agents, understanding function calling is essential.

Let us break down how it works, why it matters, and how to implement it effectively.

What Is Function Calling in AI Agents?

Function calling is the process where an LLM decides to use an external capability to complete a task it cannot handle on its own.

Here is a simple definition: when a user asks "What is the weather in Tokyo?", the LLM recognizes it needs real data. It generates a structured request for a weather tool with the parameter "Tokyo," your application executes that API call, and the result gets fed back to the model to generate the final response.

The key insight? LLMs do not actually execute functions. They decide which tool to use, generate the right parameters in JSON format, and leave the execution to your application.

This separation matters for security and control. Your code handles authentication, validation, rate limits, and error handling. The model just tells you what it wants to do.

Function calling enables several agent capabilities that pure text generation cannot provide:

Real-time data access. Models can fetch current stock prices, weather conditions, or database records instead of relying on potentially outdated training data.

Action execution. Agents can create calendar events, send messages, update CRM records, or trigger any API-accessible action.

Computation. Complex calculations, code execution, and data transformations become possible through dedicated tools.

System integration. Agents can connect to your existing business systems, databases, and third-party services.

Without function calling, LLMs would remain brilliant but isolated. With it, they become genuinely useful for automating real workflows.

How Does Tool Calling Work?

The function calling process follows a predictable pattern across all major LLM providers. Understanding this flow helps you build more reliable agent systems.

Step 1: Define your tools

Before sending any request, you define which tools the model can access. Each tool definition includes a name, description, and JSON schema specifying expected parameters.

Here is what a basic tool definition looks like:

{"name": "get_weather", "description": "Get the current weather for a specified location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "City and state, e.g., San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["location"]}}

This schema tells the model exactly what the tool does and how to structure its request.

Step 2: Send the prompt with tool definitions

Your API request includes both the user message and the available tools. The model analyzes the query and decides whether any tools would help generate a better response.

Step 3: Model returns tool call (if needed)

If the model determines a tool is necessary, instead of returning a text response, it returns a structured function call object:

{"name": "get_weather", "arguments": {"location": "San Francisco, CA", "unit": "fahrenheit"}}

The model does not execute anything. It just tells you which function to call with which parameters.

Step 4: Execute the tool in your application

Your code receives this response, validates the arguments, calls the actual weather API, and captures the result. This is where authentication, error handling, and security checks happen.

Step 5: Return results to the model

You send the tool output back to the model as a tool result message. The model uses this real data to generate its final response to the user.

Step 6: Model generates final response

With actual weather data in hand, the model can now provide an accurate, grounded answer: "It is currently 68°F and sunny in San Francisco."

This multi-step dance might seem complex, but it is what makes tool calling secure and reliable. The model proposes actions; your application decides whether to execute them.

The think act observe pattern commonly used in AI agents follows this same logic: reason about what to do, take action, observe results, and continue.

Function Calling Across Major LLM Providers

Each major LLM provider implements function calling slightly differently, though the core concepts remain consistent.

OpenAI Approach

OpenAI pioneered widespread function calling adoption with their Chat Completions API. Their latest Responses API makes tool orchestration even smoother by handling multi-step workflows automatically.

Key features include:

Strict mode. Setting strict: true guarantees outputs match your JSON schema exactly. No more parsing errors from unexpected formats.

Parallel tool calls. Models like GPT-4 and newer can request multiple tool calls in a single turn, letting you execute them simultaneously.

Built-in tools. OpenAI offers managed tools for web search, code execution, and file operations that run on their servers without requiring your implementation.

The Responses API represents OpenAI agent-native approach. Rather than managing conversation state manually, it handles the back-and-forth automatically.

Claude and Anthropic

Anthropic Claude models support robust tool calling through their Messages API. You define tools with input schemas, and Claude returns tool_use content blocks when it wants to invoke them.

Claude tool calling strengths include:

Programmatic tool calling. A newer capability where Claude writes code to orchestrate multiple tools, processes their outputs, and controls what information enters its context. This reduces token usage dramatically for complex workflows.

Tool search. For agents with hundreds of tools, you can mark definitions with defer_loading: true. Claude then searches for relevant tools rather than loading everything upfront.

Fine-grained streaming. Stream tool parameters as they generate, reducing perceived latency for tools with large inputs.

The MCP protocol for integrations that Anthropic released provides a standardized way to connect Claude to external tools and data sources.

Google Gemini

Gemini function calling follows similar patterns with some distinctive features:

Tool configuration modes. You can set AUTO (model decides), ANY (forces tool use), or NONE (disables tools) to control behavior precisely.

MCP integration. Gemini SDKs have built-in support for Model Context Protocol, automatically handling tool execution for MCP-connected tools.

Streaming arguments. For Gemini 3 Pro and later, you can stream function call arguments as they generate, useful for tools that can begin processing before receiving all parameters.

All three providers are converging on similar patterns: JSON schema definitions, structured outputs, and automatic tool execution loops. The differences are mostly in API ergonomics and advanced features.

Types of Tools and When to Use Them

Not all tools are created equal. Understanding the different categories helps you design better agent systems.

Client Tools vs Server Tools

Client tools (or custom tools) are functions you define and execute yourself. When the model calls get_customer_data, your code handles the database query, authentication, and response formatting. You have complete control.

Server tools (or built-in tools) run on the provider infrastructure. OpenAI web search, code interpreter, and file search are examples. You include them in your request, and the provider executes them automatically without additional implementation on your part.

Server tools reduce development time but limit customization. Client tools require more work but offer complete flexibility.

Read-Only vs Action Tools

Read-only tools fetch information without changing anything. Search tools, data lookups, and calculations fall into this category. They are generally safe to run automatically or in parallel.

Action tools modify state. Sending emails, creating records, making purchases, or deleting files are actions with real consequences. These typically require more careful handling, including user confirmation for critical operations.

This distinction matters for browser automation agents and computer control agents where a single wrong action could cause serious problems.

Built-in Tool Examples

Modern LLM providers offer several commonly needed tools out of the box:

Web search. Retrieves current information from the internet, grounding responses in up-to-date data.

Code execution. Runs Python or JavaScript in a sandboxed environment for calculations, data analysis, and file manipulation.

File search. Queries documents and knowledge bases using vector similarity.

Text editor and bash. For coding agent capabilities, tools that read and write files or execute shell commands enable full development workflows.

You can combine built-in tools with custom functions to create powerful hybrid systems.

Writing Effective Tool Definitions

The quality of your tool definitions directly impacts how well the model uses them. Vague or incomplete definitions lead to incorrect parameters, wrong tool selection, or hallucinated function calls.

Name Your Tools Clearly

Function names should describe exactly what the tool does using verb_noun patterns:

Good: get_customer_orders, send_email_notification, search_knowledge_base

Bad: process_data, helper_function, api_call

Clear names help the model select the right tool without ambiguity.

Write Detailed Descriptions

The description is your primary way to guide the model tool selection. Include:

What the tool does. Explain the core functionality in plain language.

When to use it. Specify the scenarios where this tool applies.

What it returns. Describe the output format so the model knows what to expect.

For example:

"description": "Retrieves order history for a customer. Use this when users ask about their past purchases, order status, or transaction history. Returns a list of order objects containing order_id, date, total, and status fields."

This gives the model everything it needs to use the tool correctly.

Define Parameters Precisely

Use JSON Schema to constrain parameters tightly:

Specify types. String, number, boolean, array, or object. Do not leave types ambiguous.

Set enums for limited choices. If a parameter only accepts certain values, enumerate them.

Include descriptions for each property. Explain what each parameter means and how to format it.

Mark required fields. Distinguish mandatory parameters from optional ones.

"parameters": {"type": "object", "properties": {"customer_id": {"type": "string", "description": "Unique customer identifier, format: CUST-XXXXX"}, "date_range": {"type": "string", "enum": ["last_7_days", "last_30_days", "last_year", "all_time"]}, "include_canceled": {"type": "boolean", "description": "Whether to include canceled orders in results"}}, "required": ["customer_id"]}

The more specific your schema, the fewer errors you will encounter.

Use Strict Mode

Both OpenAI and Anthropic support strict schema enforcement. When enabled, the model output is guaranteed to match your schema exactly. This eliminates entire categories of parsing bugs.

For OpenAI, set strict: true in your function definition. For Anthropic, structured output and JSON features provide similar guarantees.

Production systems should always use strict mode.

Understanding JSON Schema for Tools

JSON Schema is the standard format for defining tool parameters. If you are not familiar with it, here is what you need to know.

JSON Schema describes the structure, types, and constraints of JSON data. Think of it as a blueprint that specifies what valid input looks like.

Core Type Definitions

String: Text values. You can add patterns, min/max length, and formats.

"email": {"type": "string", "format": "email", "description": "Customer email address"}

Number/Integer: Numeric values with optional min/max constraints.

"quantity": {"type": "integer", "minimum": 1, "maximum": 100}

Boolean: True/false values.

"urgent": {"type": "boolean", "description": "Mark as high priority"}

Array: Lists of items with specified item types.

"tags": {"type": "array", "items": {"type": "string"}, "maxItems": 10}

Object: Nested structures with their own properties.

"address": {"type": "object", "properties": {"street": {"type": "string"}, "city": {"type": "string"}, "zip": {"type": "string"}}}

Best Practices for Schemas

Keep structures flat when possible. Deeply nested objects are harder for models to reason about correctly.

Use enums liberally. When there is a limited set of valid values, enumerate them explicitly.

Document everything. Every property should have a description explaining its purpose and format.

Set reasonable limits. Max lengths, value ranges, and array sizes prevent runaway inputs.

For detailed guidance on working with LLM outputs, check out API parameters for LLMs.

Parallel Tool Calling: Making Agents Faster

Sequential tool calling works fine for simple tasks. But when an agent needs to gather information from multiple sources, waiting for each call to complete creates unnecessary delays.

Parallel tool calling lets models request multiple independent operations in a single turn. Your application executes them concurrently, dramatically reducing total latency.

When to Use Parallel Calls

Parallel execution works well for:

Independent data gathering. Fetching customer profile, order history, and support tickets simultaneously.

Multi-source research. Searching different databases or APIs for related information.

Batch operations. Applying the same operation across multiple items.

It does not work well for:

Dependent operations. When tool B needs the result from tool A.

Write operations with ordering requirements. Transactions that must happen sequentially.

Rate-limited APIs. Where concurrent calls would exceed quotas.

Implementation Patterns

When the model returns multiple tool calls in one response, execute them concurrently:

import asyncio

async def execute_tools(tool_calls):
    tasks = [execute_single_tool(call) for call in tool_calls]
    results = await asyncio.gather(*tasks)
    return results

Return all results together, tagged with their corresponding tool call IDs so the model can correlate responses.

Managing Parallel Complexity

Parallel execution adds complexity:

Error handling. What happens when one tool fails but others succeed?

Rate limiting. You may need concurrency caps to protect downstream services.

Result aggregation. Combining results deterministically while preserving traceability.

Build these concerns into your orchestration layer from the start.

Common Challenges and How to Solve Them

Tool calling is not always smooth. Here are frequent issues and practical solutions.

Tool Selection Errors

Problem: The model picks the wrong tool for the task.

Solution: Improve descriptions. Make tool purposes explicit and non-overlapping. If two tools could apply, add guidance in the description about which to prefer.

Invalid Parameters

Problem: The model generates parameters that do not match your schema.

Solution: Enable strict mode. Add examples to parameter descriptions. Consider providing input examples in your tool definition.

Too Many Tools

Problem: With dozens of tools available, accuracy degrades and token usage explodes.

Solution: Use tool search features where available. Group tools by category. Consider using a meta-tool that routes to specialized sub-tools based on the request.

OpenAI and Anthropic both found that tool selection accuracy drops noticeably beyond 50 tools. Anthropic tool search reduced token usage by 85% in some tests while improving accuracy.

Hallucinated Tool Calls

Problem: The model invents tools that do not exist or calls tools with fabricated parameters.

Solution: Validate everything. Check that requested tool names exist. Validate all parameters before execution. Return clear error messages when validation fails.

Latency Issues

Problem: Multi-step tool workflows feel slow.

Solution: Parallelize independent calls. Cache frequently accessed, non-volatile data. Stream partial results where possible. Use smaller, faster models for routing decisions.

Security Concerns

Problem: Untrusted input could manipulate tool behavior.

Solution: Treat all model-generated parameters as untrusted. Validate and sanitize inputs. Use least-privilege access for tools. Implement approval workflows for high-risk actions.

These challenges apply across all agent types, including web scraping solutions where external data introduces additional unpredictability.

Real-World Use Cases for Function Calling

Function calling powers diverse applications across industries.

Customer Support Automation

Agents can look up order status, check account information, initiate refunds, and create support tickets. A single conversation might involve multiple tool calls to different backend systems.

Research and Analysis

Research agents use web search tools to gather current information, then synthesize findings into comprehensive reports. Multi-agent systems distribute work across specialists, each with their own tool access.

Data Processing Pipelines

Code execution tools enable agents to transform datasets, generate visualizations, and produce formatted outputs. Combined with file tools, they can process uploaded documents and return results.

Workflow Automation

Agents integrated with business systems can create calendar events, send notifications, update CRMs, and trigger downstream processes. This is the "agentic AI" vision becoming reality.

Development Assistance

Coding agents use text editors, bash tools, and debuggers to write, test, and fix code. The most advanced can complete entire features with minimal human intervention.

The common thread: tools transform LLMs from impressive demos into production-ready systems that handle real work.

Getting Started with Tool Use

Ready to implement function calling in your own agents? Here is a practical path forward.

Start simple. Build one tool that does something useful. Get the end-to-end flow working before adding complexity.

Define clear schemas. Invest time in good tool definitions. They are the foundation everything else builds on.

Enable strict mode. Eliminate parsing errors from day one.

Add observability. Log every tool call with inputs, outputs, latency, and errors. You will need this for debugging.

Iterate on descriptions. If the model misuses a tool, refine its description rather than adding workarounds.

Consider security. Think about what happens if the model generates malicious parameters. Validate everything.

The agent architecture components you choose will depend on your specific use case, but tool calling is the common foundation.

Want to explore more AI agent tools for your workflows? Browse our AI tools directory to discover specialized solutions for every use case.

Conclusion

Tool use transformed AI agents from clever text generators into systems that can actually accomplish tasks. Function calling provides the bridge between LLM reasoning and real-world action.

The mechanics are straightforward: define tools with JSON schemas, let the model decide when to use them, execute the calls in your application, and return results for final response generation.

But the impact is profound. With proper tool integration, agents can access current data, perform calculations, interact with APIs, and automate complex workflows that would be impossible with pure text generation.

Whether you are building customer support bots, research assistants, or workflow automation systems, mastering function calling is essential. The technology is mature, well-documented across all major providers, and ready for production use.

Start with a single useful tool. Get the fundamentals right. Then expand from there.

Frequently Asked Questions

What is the difference between tool use and function calling?

Tool use and function calling are essentially the same concept with different names. Both refer to the process where an LLM generates structured requests for external capabilities rather than executing them directly. Different providers use different terminology: OpenAI calls it "function calling," while Anthropic often uses "tool use." The underlying mechanism is identical.

How many tools can an AI agent use effectively?

Most LLMs perform well with up to 20-50 tools. Beyond that, tool selection accuracy tends to degrade and token usage increases significantly. For agents requiring hundreds of tools, use tool search features (available in Claude and other providers) that load only relevant tools based on the query context.

Do LLMs actually execute the functions they call?

No. LLMs never execute functions directly. They generate structured JSON requests specifying which tool to use and what parameters to pass. Your application code is responsible for validating these requests, executing the actual function calls, and returning results to the model. This separation is intentional for security and control.

What is parallel tool calling and when should I use it?

Parallel tool calling allows an LLM to request multiple independent tool executions in a single turn. Your application then runs these concurrently rather than sequentially. Use parallel calling when gathering independent data (like fetching multiple API results) to reduce latency. Avoid it for dependent operations where one result feeds into another.

How do I prevent hallucinated tool calls?

Validate everything the model generates. Check that requested tool names exist in your registered set, validate all parameters against your JSON schema before execution, and return clear error messages when validation fails. Enable strict mode where available to guarantee schema compliance. Never trust model-generated parameters without validation.
Stackviv Team

Stackviv Team

Author

Stackviv Team is our editorial crew of AI enthusiasts and tech researchers dedicated to helping you discover the best AI tools. We test, compare, and review AI software across every category to bring you honest insights and practical guides. Our mission: make AI accessible and useful for everyone - from beginners to professionals.

Related Articles

View All

What is Agentic AI? Beyond Simple Chatbots

AI Agents

What is Agentic AI? Beyond Simple Chatbots

Agentic AI represents a fundamental shift from passive AI systems that wait for your commands to autonomous agents that set goals, plan multi-step tasks, and act independently. Unlike traditional chatbots, agentic AI systems perceive their environment, reason about complex problems, and take purposeful action with minimal supervision.

SStackviv Team
1 min
Read: What is Agentic AI? Beyond Simple Chatbots

Agentic AI & Multi-Agent Systems: Advanced Guide

AI Agents

Agentic AI & Multi-Agent Systems: Advanced Guide

Multi-agent systems represent the next evolution in enterprise AI, where specialized agents work together to handle complex workflows. This advanced guide covers everything you need to understand agentic AI, from foundational concepts to production deployment with leading frameworks.

SStackviv Team
1 min
Read: Agentic AI & Multi-Agent Systems: Advanced Guide

AI Agent Memory: Short-term vs Long-term

AI Agents

AI Agent Memory: Short-term vs Long-term

Learn how agent memory works in AI systems. This guide covers short-term vs long-term memory types, persistent storage approaches, episodic, semantic, and procedural memory, plus the leading tools and frameworks for building agents that actually remember.

SStackviv Team
1 min
Read: AI Agent Memory: Short-term vs Long-term