Section 2 — Streaming, Tool Calling & Extended Thinking

Summary — what this page covers

Three capabilities that turn a chat call into an agent: streaming responses to the client token-by-token, letting Claude call your C# functions (tool calling) to act on real data, and using extended thinking for harder reasoning. Attendees build a streaming endpoint and a multi-tool agent loop. Pair with Lab 2.

10:45 AM – 12:00 PM · 75 min — 40 min lecture/demo + 35 min lab

Learning objectives

  • Stream responses with Server-Sent Events (SSE) from ASP.NET Core
  • Define tools (functions) Claude can call, with typed inputs
  • Implement the agent loop: model → tool call → tool result → model → … → end turn
  • Wire tools to real BookTracker services/data
  • Apply extended thinking for complex reasoning and know its cost trade-offs

Content

Block 2A — Streaming chat with Server-Sent Events (≈25 min)

Why streaming matters. A non-streaming chat call sits silent for seconds, then dumps the whole answer. Streaming surfaces tokens as they're generated, so the UI feels responsive — the perceived latency drops even though total time is the same.

SSE in ASP.NET Core. The SDK's CreateStreaming returns an IAsyncEnumerable of stream events; you pick out the text deltas and write each one to the response as a data: frame:

using System.Runtime.CompilerServices;   // needed for [EnumeratorCancellation]
using Anthropic.Models.Messages;

// In StreamingService: surface Claude's text deltas as IAsyncEnumerable<string>
public async IAsyncEnumerable<string> StreamAsync(string prompt,
    [EnumeratorCancellation] CancellationToken ct)
{
    var parameters = new MessageCreateParams {
        Model = Model.ClaudeSonnet4_6,
        MaxTokens = 1024,
        Messages = [new() { Role = Role.User, Content = prompt }],
    };
    await foreach (var ev in client.Messages.CreateStreaming(parameters, ct))
        if (ev.TryPickContentBlockDelta(out var delta) && delta.Delta.TryPickText(out var text))
            yield return text.Text;
}

// In the Minimal API endpoint: write SSE frames and flush per chunk
app.MapGet("/api/chat/stream", async (string q, StreamingService svc, HttpResponse res, CancellationToken ct) =>
{
    res.Headers.ContentType = "text/event-stream";
    await foreach (var chunk in svc.StreamAsync(q, ct))
    {
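        // A delta can contain newlines; every payload line needs its own "data:" prefix
        var payload = chunk.Replace("\n", "\ndata: ");
        await res.WriteAsync($"data: {payload}\n\n", ct);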
        await res.Body.FlushAsync(ct);   // flush so the client sees tokens immediately
    }
});

Consume it with curl -N or a browser EventSource so attendees watch tokens arrive live.
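
The framing rule behind those data: lines can be sketched in isolation. The helpers below are hypothetical (ToSseFrame and ParseSseData are not part of ASP.NET Core or the Anthropic SDK); they only illustrate, under the SSE convention that an event ends at a blank line and multiple data: lines are rejoined with newlines, why a chunk that contains a newline should be split across several data: fields.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: format one text chunk as a single SSE event.
static string ToSseFrame(string chunk)
{
    var sb = new StringBuilder();
    // Each line of the payload gets its own "data:" field.
    foreach (var line in chunk.Replace("\r\n", "\n").Split('\n'))
        sb.Append("data: ").Append(line).Append('\n');
    sb.Append('\n'); // a blank line terminates the event
    return sb.ToString();
}

// Hypothetical mirror of what an SSE client does with one event:
// drop the "data:" prefix (plus one optional space) and rejoin lines with "\n".
static string ParseSseData(string frame)
{
    var lines = frame.Split('\n')
        .Where(l => l.StartsWith("data:", StringComparison.Ordinal))
        .Select(l => l.Length > 5 && l[5] == ' ' ? l[6..] : l[5..]);
    return string.Join("\n", lines);
}

var frame = ToSseFrame("Chapter 1\nIt was a dark night.");
Console.Write(frame);
// Prints True when the round trip preserves the original text.
Console.WriteLine(ParseSseData(frame) == "Chapter 1\nIt was a dark night.");
```

Splitting on newlines keeps the endpoint compatible with a plain EventSource client; JSON-encoding each chunk is another common choice if the client is willing to decode it.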

Block 2B — Tool calling & the agent loop (multi-step workflows) (≈30 min)

Define a tool — name, description, and a JSON input schema. The description is what makes the tool reliable: design it like a skill description (say when to use it, not just what it does).

using System.Text.Json;
using Anthropic.Models.Messages;

var findBook = new Tool {
    Name = "find_book",
    Description = "Look up a book in BookTracker by title. Use when the user names a specific book.",
    InputSchema = new() {
        Properties = new Dictionary<string, JsonElement> {
            ["title"] = JsonSerializer.SerializeToElement(new { type = "string", description = "Book title" }),
        },
        Required = ["title"],
    },
};
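
When Claude calls find_book, the input arrives as a JSON object shaped by that schema, and your code still has to validate it before touching real data. A minimal sketch, assuming the raw input is available as a JsonElement: HandleFindBook and the in-memory catalog are hypothetical stand-ins for a BookTracker service, and the (Content, IsError) tuple is just one convenient way to carry what later becomes a tool_result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical stand-in for a BookTracker lookup service.
var catalog = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Dune"] = "Dune (Frank Herbert)",
};

// Check the required "title" field, then call the service.
// Returns (content, isError) so the caller can build a tool_result either way.
(string Content, bool IsError) HandleFindBook(JsonElement input)
{
    if (input.ValueKind != JsonValueKind.Object ||
        !input.TryGetProperty("title", out var titleProp) ||
        titleProp.ValueKind != JsonValueKind.String)
        return ("Missing required string field 'title'.", true);

    var title = titleProp.GetString()!;
    return catalog.TryGetValue(title, out var book)
        ? (book, false)
        : ($"No book titled '{title}' found.", false); // a miss is a valid answer, not an error
}

var ok = HandleFindBook(JsonDocument.Parse("{\"title\":\"dune\"}").RootElement);
Console.WriteLine($"{ok.IsError}: {ok.Content}");
var bad = HandleFindBook(JsonDocument.Parse("{}").RootElement);
Console.WriteLine($"{bad.IsError}: {bad.Content}");
```

Treating "not found" as a normal result and a malformed input as an error gives the model different signals: the first invites it to ask the user for another title, the second to retry with corrected arguments.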

The agent loop — the reusable pattern for multi-step AI workflows: Claude requests a tool → you execute the C# method → you return a tool_result → Claude continues, possibly calling more tools, until StopReason is EndTurn. Implement it as a loop, not a single call. The SDK's BetaToolRunner drives this loop for you:

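// betaParams (built elsewhere) is assumed to hold the model, messages, and tool definitions such as findBook
var runner = client.Beta.Messages.ToolRunner(betaParams);   // handles call → execute → feed-back → repeat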
await foreach (var message in runner)
    foreach (var block in message.Content)
        if (block.TryPickText(out var text)) Console.WriteLine(text.Text);

Wire each tool to a real BookTracker service (find a book, update reading progress) so the agent acts on the actual database. Guard the loop: cap max iterations and handle per-tool-call errors (return a tool_result with is_error rather than throwing).
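
The two guards can be seen in a hand-rolled version of the loop. This is a sketch under stated assumptions, not the SDK's implementation: the model is replaced by a scripted callModel delegate, the tool table and its handlers are hypothetical, and stop reasons are plain strings ("tool_use", "end_turn") rather than SDK types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tool table: tool name -> C# handler. Real handlers would call BookTracker services.
var tools = new Dictionary<string, Func<string, string>>
{
    ["find_book"] = title => $"Found '{title}' (id 7)",
    ["update_progress"] = _ => throw new InvalidOperationException("database unavailable"),
};

// Runs the loop: model turn -> tool call -> tool result -> next model turn, until end_turn.
string RunAgent(
    Func<(string StopReason, string Tool, string Input, string Text)> callModel,
    int maxIterations,
    List<string> transcript)
{
    for (var i = 0; i < maxIterations; i++)
    {
        var turn = callModel();                    // stands in for one model request
        if (turn.StopReason == "end_turn") return turn.Text;

        string result;
        var isError = false;
        try
        {
            result = tools.TryGetValue(turn.Tool, out var handler)
                ? handler(turn.Input)
                : throw new KeyNotFoundException($"unknown tool {turn.Tool}");
        }
        catch (Exception ex)
        {
            // Report the failure back to the model instead of aborting the loop.
            result = ex.Message;
            isError = true;
        }
        transcript.Add($"tool_result({turn.Tool}, is_error={isError}): {result}");
    }
    return "Stopped: iteration cap reached.";      // guard against a model that never ends its turn
}

// Scripted stand-in for the model: two tool requests, then a final answer.
var script = new Queue<(string, string, string, string)>();
script.Enqueue(("tool_use", "find_book", "Dune", ""));
script.Enqueue(("tool_use", "update_progress", "7:120", ""));
script.Enqueue(("end_turn", "", "", "I found Dune but could not save your progress."));

var log = new List<string>();
Console.WriteLine(RunAgent(script.Dequeue, 5, log));
log.ForEach(Console.WriteLine);
```

In this run the second tool throws, the loop records it with is_error=True, and the scripted model still gets to produce a final answer, which is the behavior the guards are meant to preserve.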

Block 2C — Extended thinking (≈20 min)

For genuinely hard reasoning, enable adaptive thinking — Claude decides how much to think, and the reasoning is returned as thinking blocks ahead of the answer:

Thinking = new ThinkingConfigAdaptive(),
// optional: OutputConfig = new OutputConfig { Effort = Effort.High },

It costs more tokens and adds latency, so reserve it for multi-step problems where the quality lift is worth it — not routine chat.

Tie-in to Day 1: this is the same host/tool idea as MCP — here you implement tools in-process; in Section 4 you expose them over MCP so any host (including Claude Code) can call them.

Demos referenced here

  • Live streaming endpoint (tokens arriving via curl -N)
  • A 2+ tool agent run visible in logs before the final answer

[Scripts in _instructor/.]

→ Continue to Lab 2.