
Vercel AI SDK Tool Calling: A Real-World Tutorial

By Ergini, Software & AI Developer in Pristina, Kosovo

TL;DR

Tool calling looks simple until you ship it. This is the production walkthrough using the Vercel AI SDK: Zod schemas, parallel calls, streaming UI updates, retries, and the testing pattern that catches regressions.

Most tool-calling tutorials are toys. You get a weather function, a single model call, a console.log, and a smug screenshot. None of that survives contact with a real product. The tools shadow each other, the model loops, the streaming breaks, the parallel calls race a shared database, and the first error in production is a 3 AM wakeup. This tutorial is the version I would give a junior engineer on day one - a complete, runnable walkthrough of Vercel AI SDK tool calling, from the first tool() definition to a three-tool agent with evals and observability.

Why most tool-calling tutorials are toys

Toy tutorials skip everything that bites in production. They define a single tool with three string parameters, run one generateText call, log the result, and stop. The model is well-behaved because it only has one option. The tool always succeeds because it returns a hard-coded value. There is no streaming, no UI, no error path, no retries, no step limit, no observability, and no test. None of those things are optional once you have users.

Real tool-calling work spends 20% of its time on the tool definitions and 80% on the loop: how many steps, what happens when a tool errors, how to stream partial UI, when to escalate to a human, when to stop the agent and ask for clarification. Get the loop right and the model feels magical. Get it wrong and you have a chatbot that loops three times, calls the wrong tool, and bills you $4 to tell the user it could not help. This post is structured around the loop, not the tool, because that is where the leverage is.

The Vercel AI SDK is the right substrate for this in 2026. It abstracts the wire protocol differences between OpenAI and Anthropic, gives you Zod-typed tools that compile both the JSON Schema and the TypeScript types, and ships a useChat hook that handles streaming UI updates with almost no boilerplate. The docs at ai-sdk.dev are good - this tutorial fills the gap between "hello world" and "works in production."

Setup

You need Node 20+, a Next.js 16 app (or any TypeScript project with Node runtime support), and an API key for at least one model provider. Install the SDK and one provider - pick OpenAI or Anthropic; both work the same way for nearly everything in this post, and the few provider-specific options are called out where they come up.

npm install ai@^6 zod
npm install @ai-sdk/openai      # or @ai-sdk/anthropic
# Optional helpers used later
npm install -D vitest @ai-sdk/provider-utils

Set OPENAI_API_KEY (or ANTHROPIC_API_KEY) in your .env.local. If you are deploying to Vercel, add the same key to the project environment variables - the SDK reads the env directly, no client init required for the default provider settings.

Treat Zod as required for this tutorial. The SDK uses it to derive both the JSON Schema the model sees and the TypeScript types your execute function receives. The docs at zod.dev cover the schema primitives - for tool calling you mostly need z.string(), z.number(), z.enum(), z.array(), and .describe(). The descriptions matter: the model reads them as parameter docs.

Hello-world tool

Smallest useful tool. Twenty-five lines, one server file, no UI. This is the shape every tool in your app will take - a Zod input schema, an async execute function, and a description that tells the model when to call it.

// src/tools/weather.ts
import { tool } from "ai";
import { z } from "zod";

export const getWeather = tool({
  description:
    "Get the current weather in a named city. Use this whenever the user asks about weather, temperature, rain, snow, or what to wear.",
  inputSchema: z.object({
    city: z.string().describe("City name, e.g. 'Pristina' or 'Berlin'."),
    units: z
      .enum(["metric", "imperial"])
      .default("metric")
      .describe("Temperature units. Default metric."),
  }),
  execute: async ({ city, units }) => {
    const res = await fetch(
      `https://api.example.com/weather?city=${encodeURIComponent(city)}&units=${units}`
    );
    if (!res.ok) {
      return { ok: false as const, error: `Weather API returned ${res.status}` };
    }
    const data = (await res.json()) as { tempC: number; conditions: string };
    return { ok: true as const, data };
  },
});

Three things to notice. The description tells the model when to call the tool, not just what it does - that is the single biggest lever on tool-calling reliability. The input schema uses .describe() on every parameter because those descriptions flow into the JSON Schema the model sees. And the execute function returns a discriminated union ( { ok: true, data } or { ok: false, error }) instead of throwing. That gives the model a structured signal it can reason about, which we will lean on in the error-handling section.
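The discriminated-union return shape can be factored into a small shared type. The sketch below is illustrative only: ToolResult, ok, fail, and describeResult are hypothetical names, not part of the AI SDK.

```typescript
// Hypothetical shared result type for tools; not an AI SDK export.
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

// Small constructors so every tool returns exactly the same shape.
const ok = <T>(data: T): ToolResult<T> => ({ ok: true, data });
const fail = <T = never>(error: string): ToolResult<T> => ({ ok: false, error });

// Narrowing on `ok` gives access to `data` or `error`, never both.
function describeResult(result: ToolResult<{ tempC: number }>): string {
  if (result.ok) {
    return `It is ${result.data.tempC} degrees.`;
  }
  return `Lookup failed: ${result.error}`;
}
```

Because the compiler forces a check on ok before data is readable, a forgotten error path shows up as a type error instead of a runtime surprise.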

Calling tools from generateText

The simplest way to wire a tool into a model call is generateText. You pass the model, the user prompt, your tool map, and a step limit. The SDK runs the model, executes any tool calls it emits, feeds the results back to the model, and loops until the model produces text or hits the step limit.

// src/server/ask.ts
import { generateText, stepCountIs } from "ai";
import { openai } from "@ai-sdk/openai";
import { getWeather } from "../tools/weather";

export async function ask(userMessage: string) {
  const result = await generateText({
    model: openai("gpt-5"),
    tools: { getWeather },
    stopWhen: stepCountIs(5),
    prompt: userMessage,
  });

  return {
    text: result.text,
    steps: result.steps.length,
    toolCalls: result.steps.flatMap((s) => s.toolCalls),
    usage: result.totalUsage,
  };
}

The shape of result is the useful part. text is the final assistant message. steps is an array of every round-trip the SDK made: each step contains the tool calls the model emitted, the tool results the SDK got back, and the model's intermediate reasoning. In AI SDK 5 and later, usage covers only the final step; totalUsage is the cumulative token count across every step, and it is the one to use for cost tracking, since a tool-calling loop can easily cost 5x a single-shot completion.
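To make the cost point concrete, here is a minimal sketch that turns per-step token counts into a dollar estimate. StepUsage is a stand-in type and the prices are made-up placeholders; substitute your provider's real rates.

```typescript
// Stand-in for the token counts reported per step; an assumption, not an SDK type.
interface StepUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical prices in USD per million tokens. Replace with real rates.
const PRICE_PER_MILLION = { input: 1.25, output: 10 };

// Sum tokens across every step, then price the totals.
function estimateCostUsd(steps: StepUsage[]): number {
  const totals = steps.reduce(
    (acc, step) => ({
      inputTokens: acc.inputTokens + step.inputTokens,
      outputTokens: acc.outputTokens + step.outputTokens,
    }),
    { inputTokens: 0, outputTokens: 0 },
  );
  return (
    (totals.inputTokens * PRICE_PER_MILLION.input +
      totals.outputTokens * PRICE_PER_MILLION.output) /
    1_000_000
  );
}
```

Input tokens grow on every step because each round-trip resends the conversation so far, which is why a three-step loop costs noticeably more than three times the first call.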

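Always pass stopWhen. In AI SDK 5 and later the default is a single step (stepCountIs(1)): the SDK executes the tool calls from that step but never sends the results back to the model, so you get no final answer. Check the docs for the version you have installed. Once you raise the limit, pick it deliberately, because a misbehaving model that keeps calling tools will use every step you allow. Five is a reasonable default for most assistants; raise it for research-style agents that actually need long chains.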

Streaming tool calls with streamText

generateText is fine for background jobs but useless for a chat UI. For anything user-facing, use streamText on the server and useChat on the client. The SDK speaks a custom UI message protocol over the wire that handles partial text, tool calls, tool results, and arbitrary data parts in a single stream.

Server route in Next.js App Router:

// app/api/chat/route.ts
import { streamText, stepCountIs, convertToModelMessages } from "ai";
import { openai } from "@ai-sdk/openai";
import { getWeather } from "@/lib/tools/weather";

export const maxDuration = 60;

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai("gpt-5"),
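    // Awaiting is harmless where the helper is synchronous and required where it returns a promise.
    messages: await convertToModelMessages(messages),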
    tools: { getWeather },
    stopWhen: stepCountIs(5),
  });

  return result.toUIMessageStreamResponse();
}

Client component:

// app/chat/page.tsx
"use client";
import { useChat } from "@ai-sdk/react";

export default function Chat() {
  const { messages, sendMessage, status } = useChat();

  return (
    <div className="space-y-4">
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong>
          {m.parts.map((part, i) => {
            if (part.type === "text") return <span key={i}>{part.text}</span>;
            if (part.type === "tool-getWeather") {
              return part.state === "output-available" ? (
                <pre key={i}>{JSON.stringify(part.output, null, 2)}</pre>
              ) : (
                <em key={i}>calling weather...</em>
              );
            }
            return null;
          })}
        </div>
      ))}
      <form
        onSubmit={(e) => {
          e.preventDefault();
          const input = (e.currentTarget.elements.namedItem(
            "msg"
          ) as HTMLInputElement).value;
          sendMessage({ text: input });
        }}
      >
        <input name="msg" disabled={status !== "ready"} />
      </form>
    </div>
  );
}

The interesting line is the rendering loop over m.parts. The AI SDK 5+ UI protocol gives you typed parts per tool - a part of type tool-getWeather is statically known to have the input and output shape of your getWeather tool. That is what lets the chat UI render "calling weather..." while the tool is in flight and the actual result the moment it returns. The pattern generalizes to every tool in the app. For more on streaming patterns, see the streaming OpenAI responses in Next.js tutorial.

Parallel tool calls

Modern models emit parallel tool calls when the user request is independent across tools. "What is the weather in Berlin and Pristina?" will trigger two simultaneous getWeather calls. The SDK runs the executes in parallel under the hood and bundles both results into the follow-up model call, so the two lookups overlap instead of running back to back and the tool phase of that step takes about as long as the slower call.

The race condition to watch for: tools that touch shared mutable state. If you have a createOrder tool and the model decides to call it twice in parallel on an ambiguous user request, you have two orders. Two defenses:

  • Idempotency keys in the tool input. Make the model pass an explicit idempotencyKey: z.string().uuid() on every mutating tool and dedupe on the server side. If the description tells the model to reuse one key per user intent, it will usually send the same key on both parallel calls; the server-side dedup is what makes that safe when it does.
  • Disable parallel calls for mutators. OpenAI lets you set parallelToolCalls: false at the call level. Anthropic's API has a comparable flag (disable_parallel_tool_use on tool_choice); check the provider docs for whether your version of @ai-sdk/anthropic exposes it. Where no switch is available, mark tool descriptions as "serial - do not call in parallel with itself." The model respects it most of the time but not always, so combine with server-side dedup.

const result = await generateText({
  model: openai("gpt-5"),
  tools: { getWeather, createOrder },
  providerOptions: {
    openai: { parallelToolCalls: false },
  },
  stopWhen: stepCountIs(8),
  prompt: userMessage,
});

For read-only tools, leave parallelism on. It is one of the biggest latency wins the SDK gives you and the failure mode is benign.
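The server-side dedup mentioned in the list above can be sketched with an in-memory map keyed by idempotency key. This is a single-process illustration only: createOrderOnce and insertOrder are hypothetical names, and a real service would rely on a unique constraint or a shared store instead.

```typescript
// In-memory dedupe keyed by idempotency key. Sketch only: use a unique
// constraint or shared store once you run more than one server process.
const inFlight = new Map<string, Promise<{ orderId: string; sku: string }>>();
let ordersCreated = 0;

// Hypothetical stand-in for the real order write.
async function insertOrder(sku: string): Promise<{ orderId: string; sku: string }> {
  ordersCreated += 1;
  return { orderId: `order-${ordersCreated}`, sku };
}

// Two parallel calls with the same key share one promise, so only one
// order is written and both callers see the same result.
function createOrderOnce(
  idempotencyKey: string,
  sku: string,
): Promise<{ orderId: string; sku: string }> {
  const existing = inFlight.get(idempotencyKey);
  if (existing) return existing;
  const pending = insertOrder(sku);
  inFlight.set(idempotencyKey, pending);
  return pending;
}
```

Storing the promise rather than the resolved value is the important detail: the second parallel call arrives before the first write finishes, so a cache of completed results would miss it.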

Multi-step tool sequences

The step limit you pass to stopWhen controls how many round-trips the SDK will make. Each step is one model call plus any tool executions that came out of it. A research agent might need 10 to 15 steps; a simple assistant should cap at 5. The loop pattern is the same in either case.

// src/agents/research.ts
import { generateText, stepCountIs, hasToolCall } from "ai";
import { openai } from "@ai-sdk/openai";
import { searchDocs } from "../tools/search";
import { fetchUrl } from "../tools/fetch";
import { summarize } from "../tools/summarize";

export async function research(question: string) {
  const result = await generateText({
    model: openai("gpt-5"),
    system:
      "You are a research assistant. Use searchDocs to find relevant sources, fetchUrl to read them in full, and summarize to produce the final answer. Always finish with summarize.",
    tools: { searchDocs, fetchUrl, summarize },
    stopWhen: [stepCountIs(12), hasToolCall("summarize")],
    prompt: question,
  });

  return {
    answer: result.text,
    sources: result.steps
      .flatMap((s) => s.toolResults)
      .filter((r) => r.toolName === "searchDocs"),
  };
}

Two stop conditions composed with an array: stop at 12 steps, OR stop the moment the model calls summarize. The combined condition is "whichever comes first." That is the loop pattern for any agent that has a terminal tool - a tool whose call means "I am done." Without it, the model will keep researching after it has the answer. With it, the agent exits as soon as it commits to a summary. For deeper coverage of the loop patterns themselves, see the AI agent design patterns post.
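The "whichever comes first" semantics can be modeled in a few lines. This is a simplified stand-in for intuition, not the SDK's implementation: Step, StopCondition, stepCountAtLeast, calledTool, and runLoop are all hypothetical names, and the scripted tool names replace real model calls.

```typescript
// Simplified model of an agent loop with composable stop conditions.
interface Step {
  toolCalls: { toolName: string }[];
}
type StopCondition = (steps: Step[]) => boolean;

// Stop once the loop has taken n steps.
const stepCountAtLeast = (n: number): StopCondition => (steps) => steps.length >= n;

// Stop once the most recent step called the named tool.
const calledTool = (name: string): StopCondition => (steps) => {
  const last = steps[steps.length - 1];
  return last !== undefined && last.toolCalls.some((call) => call.toolName === name);
};

// Each script entry plays the role of one model step that calls one tool.
// The loop ends as soon as ANY condition is true.
function runLoop(script: string[], conditions: StopCondition[]): Step[] {
  const steps: Step[] = [];
  for (const toolName of script) {
    steps.push({ toolCalls: [{ toolName }] });
    if (conditions.some((stop) => stop(steps))) break;
  }
  return steps;
}

const script = ["searchDocs", "fetchUrl", "summarize", "searchDocs"];
const steps = runLoop(script, [stepCountAtLeast(12), calledTool("summarize")]);
```

With this script the loop stops at the third step, when summarize is called, and the trailing searchDocs never runs.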

Streaming UI updates from tools

The basic streaming pattern shows tool inputs and outputs. The production pattern streams arbitrary state from inside a long-running tool - a progress bar, a partial table of intermediate results, a list of sources as they are discovered. The SDK exposes a writer on the streaming context that you can pass into the tool execute function to push data parts to the client.

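// app/api/chat/route.ts
import {
  streamText,
  stepCountIs,
  convertToModelMessages,
  createUIMessageStream,
  createUIMessageStreamResponse,
} from "ai";
import { openai } from "@ai-sdk/openai";
import { researchTool } from "@/lib/tools/research";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = createUIMessageStream({
    execute: async ({ writer }) => {
      const result = streamText({
        model: openai("gpt-5"),
        // UI messages from useChat must be converted before they reach the model.
        messages: await convertToModelMessages(messages),
        tools: {
          // Each request gets a tool instance bound to this stream's writer.
          research: researchTool({ writer }),
        },
        stopWhen: stepCountIs(8),
      });
      // Forward the model's text and tool parts into the same stream.
      writer.merge(result.toUIMessageStream());
    },
  });

  // createUIMessageStream returns a stream, so wrap it in a Response here.
  return createUIMessageStreamResponse({ stream });
}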

Inside the tool, the writer lets you push custom data parts as the tool runs:

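// src/tools/research.ts
import { tool, type UIMessageStreamWriter } from "ai";
import { z } from "zod";
// Hypothetical helper that returns a list of source URLs for a query.
import { search } from "../lib/search";

export const researchTool = ({ writer }: { writer: UIMessageStreamWriter }) =>
  tool({
    description: "Multi-source research over the knowledge base.",
    inputSchema: z.object({ query: z.string() }),
    execute: async ({ query }) => {
      writer.write({ type: "data-progress", data: { status: "searching" } });
      const sources: string[] = await search(query);
      writer.write({ type: "data-progress", data: { status: "fetching", count: sources.length } });
      // Wrap fetch in a lambda: passing it straight to map would forward the
      // array index as its second argument. Read the body so the result is serializable.
      const docs = await Promise.all(
        sources.map(async (url) => (await fetch(url)).text())
      );
      writer.write({ type: "data-progress", data: { status: "synthesizing" } });
      return { docs };
    },
  });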

On the client, those data parts show up in m.parts with type data-progress. You render them with their own component - a spinner, a counter, whatever fits the UX. The user sees the tool thinking out loud instead of staring at a blank loading state.
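Before rendering, the client usually needs only the most recent progress update. A minimal sketch of that selection, kept free of React so it can be tested on its own; the Part type is a trimmed-down assumption about the message parts, and latestProgress is a hypothetical helper.

```typescript
// Trimmed-down view of the parts a message can carry; an assumption for
// illustration, not the SDK's full type.
type Part =
  | { type: "text"; text: string }
  | { type: "data-progress"; data: { status: string; count?: number } };

// Walk the parts from newest to oldest and return a label for the first
// progress part found, or null when the tool has reported nothing yet.
function latestProgress(parts: Part[]): string | null {
  for (const part of [...parts].reverse()) {
    if (part.type === "data-progress") {
      const { status, count } = part.data;
      return count === undefined ? status : `${status} (${count})`;
    }
  }
  return null;
}
```

A progress component can then render latestProgress(m.parts) and hide itself when the value is null.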

Error handling in tools

Two failure modes, two strategies. Bugs in your tool code (a null reference, a misspelled field, a broken SQL query) should throw and surface as errors. The SDK catches the throw, marks the tool result as an error, and either propagates to the caller or lets the model try to recover, depending on how you configure it. Expected failures the model can reason about (rate limit, no results, permission denied) should be returned as structured objects, not thrown.

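// src/tools/search.ts
import { tool } from "ai";
import { z } from "zod";
// Hypothetical app modules: a search-capable db client and a rate-limit check.
import { db } from "../db";
import { isRateLimit } from "../lib/errors";
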
export const searchDocs = tool({
  description: "Search the knowledge base.",
  inputSchema: z.object({ query: z.string(), limit: z.number().int().max(20).default(5) }),
  execute: async ({ query, limit }) => {
    try {
      const results = await db.search(query, limit);
      if (results.length === 0) {
        return {
          ok: false as const,
          error: "no_results",
          message: `No documents matched "${query}". Try a broader query.`,
        };
      }
      return { ok: true as const, data: results };
    } catch (err) {
      if (isRateLimit(err)) {
        return {
          ok: false as const,
          error: "rate_limited",
          message: "Rate limit hit. Retry in 30 seconds.",
          retryAfterMs: 30000,
        };
      }
      throw err; // Real bug - let it propagate.
    }
  },
});

The pattern is: discriminated union returns for known errors, throws for unknown errors. The model reads the structured error and decides what to do - usually it apologizes to the user, picks a different tool, or asks for clarification. Throws bubble up to the SDK. How they surface depends on the SDK version and your configuration: they can fail the request, or they can appear as tool-error parts the model can also see. I default to the latter in user-facing agents because a polite recovery beats a 500. For the broader checklist on robust tool design, see the tool calling best practices guide.
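A structured rate_limited result also makes retries straightforward to handle inside the tool, before the model ever sees the failure. A minimal sketch, assuming a result shape like the one above; withRetry, its attempt limit, and the wait cap are hypothetical choices, not SDK features.

```typescript
// Result shape modeled on the search tool above.
type SearchResult =
  | { ok: true; data: string[] }
  | { ok: false; error: "rate_limited"; retryAfterMs: number }
  | { ok: false; error: "no_results" };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry only on rate limits, waiting the hinted delay (capped) between
// attempts. Every other result, success or failure, is returned as-is.
async function withRetry(
  call: () => Promise<SearchResult>,
  maxAttempts = 3,
  maxWaitMs = 1000,
): Promise<SearchResult> {
  let result = await call();
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    if (result.ok || result.error !== "rate_limited") return result;
    await sleep(Math.min(result.retryAfterMs, maxWaitMs));
    result = await call();
  }
  return result;
}
```

If the last attempt is still rate limited, the structured error goes back to the model unchanged, so it can tell the user to try again later instead of looping.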

Testing tool implementations

Two layers of testing matter. First, unit test your execute functions directly - they are just async functions with typed inputs and outputs. Second, test the loop behavior using a mock language model so you can assert that a given user message produces the expected sequence of tool calls without spending a cent on real tokens.

Vitest pattern with a mock model:

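// test/research.test.ts
import { describe, it, expect, vi } from "vitest";
import { MockLanguageModelV2 } from "ai/test";
import { generateText, stepCountIs } from "ai";
import { searchDocs } from "../src/tools/search";

// Placeholder token counts; the mock has to report usage on every call.
const usage = { inputTokens: 1, outputTokens: 1, totalTokens: 2 };

describe("research agent", () => {
  it("calls searchDocs then returns a summary", async () => {
    // The real execute still runs; stub your db layer if it would hit a live database.
    const executeSpy = vi.spyOn(searchDocs, "execute");

    // Result shapes follow the LanguageModelV2 spec. If your installed version
    // exports a differently numbered mock, use that one and its result shape.
    const model = new MockLanguageModelV2({
      // generateText calls doGenerate; doStream is only used by streamText.
      doGenerate: async ({ prompt }) => {
        // First turn: the prompt holds only the user message, so emit a tool call.
        if (prompt.length === 1) {
          return {
            finishReason: "tool-calls" as const,
            usage,
            warnings: [],
            content: [
              {
                type: "tool-call" as const,
                toolCallId: "call-1",
                toolName: "searchDocs",
                input: JSON.stringify({ query: "RAG" }),
              },
            ],
          };
        }
        // Later turns: the tool result is in the prompt, so answer in text.
        return {
          finishReason: "stop" as const,
          usage,
          warnings: [],
          content: [{ type: "text" as const, text: "Top result: ..." }],
        };
      },
    });

    const result = await generateText({
      model,
      tools: { searchDocs },
      stopWhen: stepCountIs(5),
      prompt: "Tell me about RAG.",
    });

    expect(executeSpy).toHaveBeenCalledWith(
      expect.objectContaining({ query: "RAG" }),
      expect.anything()
    );
    expect(result.text).toContain("Top result");
    expect(result.steps).toHaveLength(2);
  });
});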

The point is not to test the model - that is the provider's job. The point is to test your loop: given a model that says "call searchDocs," does your code call searchDocs with the right arguments, pass the result back correctly, and terminate when the model produces text? Those are the contract tests that catch regressions in the SDK upgrade, in your tool schema, and in your prompt - without burning real API budget. vitest.dev covers the spying, mocking, and snapshot helpers I lean on.
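The first layer, calling execute directly, gets easier when the function receives its dependencies instead of importing them. A minimal sketch under that assumption; SearchDb, makeSearchExecute, and fakeDb are hypothetical names, and the tutorial's own tools import db directly.

```typescript
// Hypothetical dependency the execute function needs.
interface SearchDb {
  search(query: string, limit: number): Promise<string[]>;
}

// Factory that builds the execute function around an injected db, so a
// test can pass a fake and production code can pass the real client.
function makeSearchExecute(db: SearchDb) {
  return async ({ query, limit }: { query: string; limit: number }) => {
    const results = await db.search(query, limit);
    return results.length === 0
      ? { ok: false as const, error: "no_results" }
      : { ok: true as const, data: results };
  };
}

// In-memory fake: matches documents by substring and honors the limit.
const fakeDb: SearchDb = {
  search: async (query, limit) =>
    ["RAG basics", "RAG evals", "Tool calling"]
      .filter((title) => title.includes(query))
      .slice(0, limit),
};
```

A unit test then calls makeSearchExecute(fakeDb) with plain inputs and asserts on the returned union, with no model and no network involved.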

Real-world example: a search-and-book agent

Putting it together - a three-tool agent that searches venues, checks availability, and books a reservation. Around 60 lines, every pattern from above in one place.

// src/agents/booking.ts
import { generateText, stepCountIs, hasToolCall, tool } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";
// Hypothetical data layer; swap in your own client.
import { db } from "../db";

const searchVenues = tool({
  description: "Search venues by city and cuisine. Returns up to 10 matches.",
  inputSchema: z.object({
    city: z.string(),
    cuisine: z.string().optional(),
    partySize: z.number().int().min(1).max(20),
  }),
  execute: async ({ city, cuisine, partySize }) => {
    const venues = await db.venues.find({ city, cuisine, capacity: { gte: partySize } });
    return { ok: true as const, data: venues.slice(0, 10) };
  },
});

const checkAvailability = tool({
  description:
    "Check whether a venue can seat a party at a given date and time. Call this for each candidate venue before attempting to book.",
  inputSchema: z.object({
    venueId: z.string(),
    isoDateTime: z.string().describe("ISO 8601 timestamp in the venue's local timezone."),
    partySize: z.number().int().min(1).max(20),
  }),
  execute: async ({ venueId, isoDateTime, partySize }) => {
    const slot = await db.availability.find({ venueId, time: isoDateTime });
    if (!slot || slot.remaining < partySize) {
      return { ok: false as const, error: "no_availability" };
    }
    return { ok: true as const, data: { holdToken: slot.holdToken } };
  },
});

const bookReservation = tool({
  description:
    "Confirm a reservation. Only call this after checkAvailability returned a holdToken for the chosen venue. This is the terminal tool - once it succeeds, the conversation is done.",
  inputSchema: z.object({
    holdToken: z.string(),
    guestName: z.string(),
    guestEmail: z.string().email(),
    idempotencyKey: z.string().uuid().describe("Stable UUID for this booking intent."),
  }),
  execute: async ({ holdToken, guestName, guestEmail, idempotencyKey }) => {
    const booking = await db.bookings.upsert({ idempotencyKey, holdToken, guestName, guestEmail });
    return { ok: true as const, data: { confirmation: booking.code } };
  },
});

export async function book(userRequest: string, guest: { name: string; email: string }) {
  return generateText({
    model: anthropic("claude-opus-4-7"),
    system: `You are a booking concierge. Guest: ${guest.name} <${guest.email}>. Always check availability before booking. Stop after bookReservation succeeds.`,
    tools: { searchVenues, checkAvailability, bookReservation },
    stopWhen: [stepCountIs(10), hasToolCall("bookReservation")],
    prompt: userRequest,
  });
}

Every pattern from this post is in there. Discriminated-union returns for expected failure modes. A terminal tool (bookReservation) wired into a composite stop condition. An idempotency key on the mutating tool so parallel calls cannot double-book. Descriptions that tell the model the call order. A system prompt that names the user so the model does not invent details. Run this against a real model and the agent searches, checks two or three candidate venues in parallel, commits to one, and exits. Run it against the mock model in a test and you assert the exact sequence on every CI run.
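"Assert the exact sequence" can be as simple as flattening the steps into tool names and checking their order. A minimal sketch; Step here is a trimmed-down assumption about the shape of result.steps, and toolSequence and calledInOrder are hypothetical helpers.

```typescript
// Trimmed-down view of a step; only the tool names matter here.
interface Step {
  toolCalls: { toolName: string }[];
}

// Flatten every step's calls into one ordered list of tool names.
function toolSequence(steps: Step[]): string[] {
  return steps.flatMap((step) => step.toolCalls.map((call) => call.toolName));
}

// True when `later` never ran, or when its first call comes after the
// first call to `earlier`.
function calledInOrder(sequence: string[], earlier: string, later: string): boolean {
  const firstEarlier = sequence.indexOf(earlier);
  const firstLater = sequence.indexOf(later);
  if (firstLater === -1) return true;
  return firstEarlier !== -1 && firstEarlier < firstLater;
}
```

In a test, calledInOrder(toolSequence(result.steps), "checkAvailability", "bookReservation") guards the rule that the agent never books before it has checked availability.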

Production observability

Tool-calling agents fail in ways single-shot completions do not. A prompt regression might shift the model from calling the right tool 70% of the time to 40% - and you will never notice in code review. Two observability options that wire into the AI SDK in 2026 with almost zero code: Helicone and Langfuse.

Helicone is a one-line change - set baseURL on your provider to the Helicone proxy and every call gets logged, traced, and cost-tracked. Langfuse is heavier but gives you per-step breakdowns, custom evals, and prompt versioning. I cover both, plus LangSmith, in the LLM observability comparison.

// src/providers/observed.ts
import { createOpenAI } from "@ai-sdk/openai";

export const openai = createOpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  headers: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    "Helicone-Property-Environment": process.env.NODE_ENV ?? "development",
  },
});

That one change gives you a dashboard with every request, every tool call, the cost per conversation, and a searchable history of every prompt that hit the model. The first time you ship a tool-calling agent without observability and have to debug a user complaint, you will install it within an hour. Save yourself the hour.
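Once traces are being logged, the tool-selection regression described at the top of this section becomes a number you can track. A minimal sketch over exported traces; the Trace shape and toolHitRate are hypothetical and do not correspond to a Helicone or Langfuse API.

```typescript
// Hypothetical shape for an exported trace: which tool a labeled prompt
// should trigger, and which tools the agent actually called.
interface Trace {
  expectedTool: string;
  calledTools: string[];
}

// Fraction of traces in which the expected tool was called at least once.
function toolHitRate(traces: Trace[]): number {
  if (traces.length === 0) return 0;
  const hits = traces.filter((trace) =>
    trace.calledTools.includes(trace.expectedTool),
  ).length;
  return hits / traces.length;
}
```

Run it over a fixed set of labeled prompts after every prompt change and fail the build when the rate falls below a threshold you choose.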

For the agent-side companion to this client tutorial - how to expose the same tools as a portable MCP server so Claude Code, Cursor, and Claude Desktop can call them too - see the MCP server tutorial. For the structured output side, where the model returns a typed JSON object instead of calling a tool, see the OpenAI structured outputs guide. And for the retrieval architecture that wraps around tool calling in research-style agents, see the agentic RAG post.

If you want a senior engineer who has shipped tool-calling agents in production - typed tools, parallel calls, streaming UI, evals, observability, the full stack - my AI agent development practice covers exactly this scope, and AI integration when the agent needs to wire into existing internal systems. I work with teams worldwide, and you can also hire an AI developer in Kosovo directly. Same person behind OmniAPI, the universal API gateway that ships its own tool-calling layer out of the box.

Frequently asked questions

Which Vercel AI SDK version should I use for tool calling in 2026?

Use AI SDK 6 or newer. The tool() helper, stepCountIs stop condition, typed onToolCall callback in useChat, and the streamlined UI message protocol all stabilized in the 5.x line and got cleaner in 6.x. If you are still on 3.x or 4.x, the tool API looks similar, but tools take parameters instead of inputSchema, and the streaming protocol changed enough that the client hook code in this tutorial will not compile against the older versions. Upgrade before you copy any of this.

Do I have to use Zod, or can I pass JSON Schema directly?

You can pass either. Zod is the default because the SDK can derive both the JSON Schema it sends to the model and the TypeScript types for your handler from a single source. If you have an existing JSON Schema document (from an OpenAPI spec, a database schema, or a shared contract), wrap it with the jsonSchema() helper from the ai package and pass the result as inputSchema; the SDK then sends it to the model without a Zod conversion step. The cost is that nothing infers the TypeScript types for the execute function, so you have to supply them yourself, for example through the helper's type parameter.

What is the difference between maxSteps and stepCountIs?

maxSteps was the AI SDK 4.x option. stepCountIs is the AI SDK 5+ replacement and it is composable with other stop conditions. stepCountIs(8) means stop after eight steps total, where a step is one model call plus any parallel tool executions that came out of it. You can compose stop conditions with hasToolCall("finish") or your own predicate, which is more flexible than the old single-number cap. Always set one explicitly, so the number of steps a confused model can burn is a decision you made rather than a default you inherited.

How do I stream partial UI from inside a tool execution?

Three pieces. First, on the server, give your tool an async execute that returns the final value but writes intermediate state through the writer you get from createUIMessageStream. Second, on the client, use useChat and read the parts array on each message - it holds text parts, typed tool parts (tool-getWeather, for example) whose state field tells you whether the output is available yet, and any custom data parts you wrote. Third, render each part with its own component so a partial tool call shows a loading state while a finished one shows the result. The user then sees each step as the model takes it.

Can the model call tools in parallel with the Vercel AI SDK?

Yes, and it does so by default whenever the underlying model supports it. GPT-5, GPT-5-mini, Claude Opus 4.7, and Claude Sonnet 4.6 all emit parallel tool calls when the user request is independent across tools. The SDK runs the execute functions in parallel with Promise.all under the hood and bundles every result into a single follow-up model call. If you do not want parallel calls (for example because the tools share a database transaction), turn them off where the provider exposes a switch - for OpenAI that is providerOptions: { openai: { parallelToolCalls: false } } - and keep server-side deduplication as the backstop.

What happens when a tool throws?

The error surfaces as a tool-error part in the message stream. The model sees a tool result with an error flag and decides what to do - usually it tries to recover by either calling a different tool or asking the user for clarification. You can shape that behavior by returning a discriminated union from your execute function ({ ok: true, data } or { ok: false, error, retryable }) instead of throwing, which gives the model a structured signal. The pattern that ships in production is: throw for bugs you want to surface, return error objects for expected failure modes the model should reason about.

How do I test tool implementations without burning real model tokens?

Use the MockLanguageModelV2 helper exported from ai/test (or write a tiny mock that conforms to the LanguageModelV2 interface). It lets you script the model response - including tool calls - so you can assert that given a user message, generateText invoked your tool with the expected arguments, and the loop terminated at the expected step. Pair that with calling your execute functions directly in unit tests, and you get full coverage with zero API spend. The Vitest pattern is in the testing section earlier in this post.

Do I need an agent framework like LangGraph on top of the AI SDK?

Probably not. For 80% of tool-calling work - a few tools, a step limit, maybe a routing layer - the AI SDK alone is enough. Reach for a framework when you need durable execution (resume a stuck workflow after a restart), explicit state machines (every step is a node, the model picks the edge), or multi-agent handoff with shared memory. For everything else, the extra dependency and bundle weight buy you nothing. I covered the broader pick in the LangChain vs Vercel AI SDK post.