# Section 2 — Streaming, Tool Calling & Extended Thinking

> **Summary — what this page covers**
> Three capabilities that turn a chat call into an agent: streaming responses to the client
> token-by-token, letting Claude call your C# functions (tool calling) to act on real data, and
> using extended thinking for harder reasoning. Attendees build a streaming endpoint and a
> multi-tool agent loop. Pair with **Lab 2**.

**10:45 AM – 12:00 PM · 75 min** — 40 min lecture/demo + 35 min lab

## Learning objectives

- Stream responses with Server-Sent Events (SSE) from ASP.NET Core
- Define **tools** (functions) Claude can call, with typed inputs
- Implement the **agent loop**: model → tool call → tool result → model → … → end turn
- Wire tools to real BookTracker services/data
- Apply **extended thinking** for complex reasoning and know its cost trade-offs

## Content

### Block 2A — Streaming chat with Server-Sent Events (≈25 min)

**Why streaming matters.** A non-streaming chat call sits silent for seconds, then dumps the whole
answer. Streaming surfaces tokens *as they're generated*, so the UI feels responsive — the perceived
latency drops even though total time is the same.

**SSE in ASP.NET Core.** The SDK's `CreateStreaming` returns an `IAsyncEnumerable` of stream events;
you pick out the text deltas and write each one to the response as a `data:` frame:

```csharp
using System.Runtime.CompilerServices;   // for [EnumeratorCancellation]
using Anthropic.Models.Messages;

// In StreamingService: surface Claude's text deltas as IAsyncEnumerable<string>
public async IAsyncEnumerable<string> StreamAsync(string prompt,
    [EnumeratorCancellation] CancellationToken ct)
{
    var parameters = new MessageCreateParams {
        Model = Model.ClaudeSonnet4_6,
        MaxTokens = 1024,
        Messages = [new() { Role = Role.User, Content = prompt }],
    };
    await foreach (var ev in client.Messages.CreateStreaming(parameters, ct))
        if (ev.TryPickContentBlockDelta(out var delta) && delta.Delta.TryPickText(out var text))
            yield return text.Text;
}
```

```csharp
// In the Minimal API endpoint: write SSE frames and flush per chunk
app.MapGet("/api/chat/stream", async (string q, StreamingService svc, HttpResponse res, CancellationToken ct) =>
{
    res.Headers.ContentType = "text/event-stream";
    await foreach (var chunk in svc.StreamAsync(q, ct))
    {
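        // A delta can contain newlines; prefix every line with data: so the frame stays valid
        var payload = chunk.Replace("\n", "\ndata: ");
        await res.WriteAsync($"data: {payload}\n\n", ct);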
        await res.Body.FlushAsync(ct);   // flush so the client sees tokens immediately
    }
});
```

Consume it with `curl -N` or a browser `EventSource` so attendees watch tokens arrive live.
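
For a consumer written in C# instead of `curl` or `EventSource`, the frames can be decoded by hand. The following is a minimal sketch, assuming the endpoint emits only `data:` fields. `SseReader` and `ReadEventsAsync` are hypothetical names, not part of ASP.NET Core or the Anthropic SDK. It applies the SSE rules that a blank line ends an event and that consecutive `data:` lines within one event are joined with a newline.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Usage: decode two frames from an in-memory string. A real client would wrap
// the HTTP response stream in a StreamReader instead (assumption).
var sample = "data: Hello\n\ndata: line one\ndata: line two\n\n";
await foreach (var payload in SseReader.ReadEventsAsync(new StringReader(sample)))
    Console.WriteLine($"[{payload}]");

// Hypothetical helper, for illustration only.
public static class SseReader
{
    public static async IAsyncEnumerable<string> ReadEventsAsync(TextReader reader)
    {
        var data = new List<string>();
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            if (line.Length == 0)
            {
                // A blank line terminates the event; emit it if any data was collected.
                if (data.Count > 0)
                {
                    yield return string.Join("\n", data);
                    data.Clear();
                }
                continue;
            }

            if (line.StartsWith("data:", StringComparison.Ordinal))
            {
                var value = line.Substring(5);
                // Only a single leading space after the colon is stripped.
                if (value.StartsWith(' ')) value = value.Substring(1);
                data.Add(value);
            }
            // Other fields (event:, id:, retry:) and comment lines are ignored in this sketch.
        }
    }
}
```

Because only one leading space is stripped, a delta that itself begins with a space (common for word tokens) survives the round trip intact.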

### Block 2B — Tool calling & the agent loop (multi-step workflows) (≈30 min)

**Define a tool** — name, description, and a JSON input schema. **The description is what makes the
tool reliable**: design it like a skill description (say *when* to use it, not just what it does).

```csharp
using System.Text.Json;
using Anthropic.Models.Messages;

var findBook = new Tool {
    Name = "find_book",
    Description = "Look up a book in BookTracker by title. Use when the user names a specific book.",
    InputSchema = new() {
        Properties = new Dictionary<string, JsonElement> {
            ["title"] = JsonSerializer.SerializeToElement(new { type = "string", description = "Book title" }),
        },
        Required = ["title"],
    },
};
```
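
When Claude calls `find_book`, its arguments arrive as a JSON object shaped by the input schema. Here is a minimal sketch of binding that object to a typed C# record before calling a service, assuming the raw input is available as a JSON string (how the SDK exposes it is not shown here). `FindBookInput` and its `Parse` method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Usage: the JSON below stands in for the arguments of a find_book tool call.
var input = FindBookInput.Parse("{\"title\":\"Dune\"}");
Console.WriteLine(input.Title);   // prints "Dune"

// Hypothetical typed mirror of the find_book input schema.
public sealed record FindBookInput([property: JsonPropertyName("title")] string Title)
{
    public static FindBookInput Parse(string json)
    {
        // Malformed JSON throws JsonException; a missing field deserializes to null.
        var input = JsonSerializer.Deserialize<FindBookInput>(json);
        if (input is null || string.IsNullOrWhiteSpace(input.Title))
            throw new ArgumentException("find_book requires a non-empty 'title'.");
        return input;
    }
}
```

Validating here keeps the service method free of JSON concerns, and the exception message can be handed back to the model as an error result so it can retry with corrected arguments.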

**The agent loop** — the reusable pattern for *multi-step AI workflows*: Claude requests a tool →
you execute the C# method → you return a `tool_result` → Claude continues, possibly calling more
tools, until `StopReason` is `EndTurn`. Implement it as a **loop**, not a single call. The SDK's
**`BetaToolRunner`** drives this loop for you:

```csharp
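// betaParams is assumed to be built beforehand with the model, messages, and tools (e.g. findBook)
var runner = client.Beta.Messages.ToolRunner(betaParams);   // handles call → execute → feed-back → repeat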
await foreach (var message in runner)
    foreach (var block in message.Content)
        if (block.TryPickText(out var text)) Console.WriteLine(text.Text);
```

Wire each tool to a **real BookTracker service** (find a book, update reading progress) so the agent
acts on the actual database. **Guard the loop:** cap max iterations and handle per-tool-call errors
(return a `tool_result` with `is_error` rather than throwing).
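
To make the loop and its guards concrete, here is a hand-rolled sketch that does not use the SDK at all. Every type in it (`ToolCall`, `ToolResult`, `ModelTurn`, `AgentLoop`) is hypothetical, and the model is a scripted stand-in for Claude, so treat it as an illustration of the control flow under those assumptions and not as the runner's actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Usage: a scripted stand-in for Claude that requests one tool call, then answers.
var tools = new Dictionary<string, Func<string, string>>
{
    ["find_book"] = title => title == "Dune"
        ? "Dune by Frank Herbert"
        : throw new KeyNotFoundException($"No book titled '{title}'."),
};

ModelTurn FakeModel(IReadOnlyList<ToolResult> results) =>
    results.Count == 0
        ? new ModelTurn(null, new ToolCall("find_book", "Dune"))
        : new ModelTurn($"Found: {results[results.Count - 1].Content}", null);

Console.WriteLine(AgentLoop.Run(FakeModel, tools));

// Hypothetical types, for illustration only.
public sealed record ToolCall(string Name, string Input);
public sealed record ToolResult(string Content, bool IsError);
public sealed record ModelTurn(string? Text, ToolCall? ToolCall);   // no ToolCall means end of turn

public static class AgentLoop
{
    public static string Run(
        Func<IReadOnlyList<ToolResult>, ModelTurn> model,
        IReadOnlyDictionary<string, Func<string, string>> tools,
        int maxIterations = 5)
    {
        var results = new List<ToolResult>();
        for (var i = 0; i < maxIterations; i++)          // guard 1: cap the iterations
        {
            var turn = model(results);
            if (turn.ToolCall is null) return turn.Text ?? "";   // model ended its turn
            results.Add(Execute(turn.ToolCall, tools));  // feed the result back
        }
        throw new InvalidOperationException(
            $"Agent did not finish within {maxIterations} iterations.");
    }

    private static ToolResult Execute(
        ToolCall call, IReadOnlyDictionary<string, Func<string, string>> tools)
    {
        if (!tools.TryGetValue(call.Name, out var tool))
            return new ToolResult($"Unknown tool '{call.Name}'.", IsError: true);
        try
        {
            return new ToolResult(tool(call.Input), IsError: false);
        }
        catch (Exception ex)                             // guard 2: report, don't throw
        {
            return new ToolResult(ex.Message, IsError: true);
        }
    }
}
```

Returning the failure as an error result lets the model see what went wrong and choose a different tool or input, while the iteration cap stops a model that keeps requesting tools without converging.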

### Block 2C — Extended thinking (≈20 min)

For genuinely hard reasoning, enable extended thinking in its **adaptive** mode: Claude decides how
much to think, and the reasoning is returned as thinking blocks ahead of the answer:

```csharp
// Set on the same MessageCreateParams initializer used for the request:
Thinking = new ThinkingConfigAdaptive(),
// optional: OutputConfig = new OutputConfig { Effort = Effort.High },
```

It costs more tokens and adds latency, so reserve it for multi-step problems where the quality lift
is worth it — not routine chat.

> **Tie-in to Day 1:** this is the same host/tool idea as MCP — here you implement tools
> **in-process**; in [Section 4](09-section-4-mcp.md) you expose them over MCP so *any* host
> (including Claude Code) can call them.

## Demos referenced here

- **Live streaming endpoint** (tokens arriving via `curl -N`)
- **A 2+ tool agent run** visible in logs before the final answer

[Scripts in `_instructor/`.]

→ Continue to [**Lab 2**](06-lab-2-agent.md).
