Section 2 — Streaming, Tool Calling & Extended Thinking
Summary: what this page covers
Three capabilities that turn a chat call into an agent: streaming responses to the client token by token, letting Claude call your C# functions (tool calling) to act on real data, and using extended thinking for harder reasoning. Attendees build a streaming endpoint and a multi-tool agent loop. Pair with Lab 2.
10:45 AM – 12:00 PM · 75 min — 40 min lecture/demo + 35 min lab
Learning objectives
- Stream responses with Server-Sent Events (SSE) from ASP.NET Core
- Define tools (functions) Claude can call, with typed inputs
- Implement the agent loop: model → tool call → tool result → model → … → end turn
- Wire tools to real BookTracker services/data
- Apply extended thinking for complex reasoning and know its cost trade-offs
Content
Block 2A — Streaming chat with Server-Sent Events (≈25 min)
Why streaming matters. A non-streaming chat call sits silent for seconds, then dumps the whole answer. Streaming surfaces tokens as they're generated, so the UI feels responsive — the perceived latency drops even though total time is the same.
SSE in ASP.NET Core. The SDK's CreateStreaming returns an IAsyncEnumerable of stream events;
you pick out the text deltas and write each one to the response as a data: frame:
using System.Runtime.CompilerServices; // needed for [EnumeratorCancellation]
using Anthropic.Models.Messages;

// In StreamingService: surface Claude's text deltas as IAsyncEnumerable<string>
public async IAsyncEnumerable<string> StreamAsync(string prompt,
    [EnumeratorCancellation] CancellationToken ct)
{
    var parameters = new MessageCreateParams {
        Model = Model.ClaudeSonnet4_6,
        MaxTokens = 1024,
        Messages = [new() { Role = Role.User, Content = prompt }],
    };
    // Keep only text deltas; every other stream event is skipped
    await foreach (var ev in client.Messages.CreateStreaming(parameters, ct))
    {
        if (ev.TryPickContentBlockDelta(out var delta) && delta.Delta.TryPickText(out var text))
            yield return text.Text;
    }
}
// In the Minimal API endpoint: write SSE frames and flush per chunk
app.MapGet("/api/chat/stream", async (string q, StreamingService svc, HttpResponse res, CancellationToken ct) =>
{
    res.Headers.ContentType = "text/event-stream";
    await foreach (var chunk in svc.StreamAsync(q, ct))
    {
        // A raw newline inside a chunk would end the SSE frame early,
        // so each line of the chunk is sent as its own data: field
        var payload = chunk.Replace("\n", "\ndata: ");
        await res.WriteAsync($"data: {payload}\n\n", ct);
        await res.Body.FlushAsync(ct); // flush so the client sees tokens immediately
    }
});
Consume it with curl -N or a browser EventSource so attendees watch tokens arrive live.
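To make the wire format concrete, here is a minimal sketch of what a client does with those frames, assuming the endpoint emits only data: fields separated by blank lines. SseReader and ReadEventsAsync are hypothetical names used for illustration, not part of any SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: parses the data: frames of a Server-Sent Events stream.
public static class SseReader
{
    // Yields one string per event. Several data: lines inside one event are
    // joined with '\n'; a blank line marks the end of the event.
    public static async IAsyncEnumerable<string> ReadEventsAsync(TextReader reader)
    {
        var data = new StringBuilder();
        var hasData = false;

        while (await reader.ReadLineAsync() is { } line)
        {
            if (line.Length == 0)
            {
                // Blank line: dispatch the buffered event, if any
                if (hasData) yield return data.ToString();
                data.Clear();
                hasData = false;
            }
            else if (line.StartsWith("data:", StringComparison.Ordinal))
            {
                var value = line.Substring(5);
                // Exactly one leading space after the colon is optional and dropped
                if (value.StartsWith(' ')) value = value.Substring(1);
                if (hasData) data.Append('\n');
                data.Append(value);
                hasData = true;
            }
            // Other fields (event:, id:, retry:) and comments are ignored in this sketch
        }
        // An event not terminated by a blank line is discarded
    }
}
```

To try it against the running endpoint, wrap the stream from HttpClient.GetStreamAsync in a StreamReader and pass it to ReadEventsAsync. Only one space after the colon is stripped, so a delta such as " world" keeps its own leading space.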
Block 2B — Tool calling & the agent loop (multi-step workflows) (≈30 min)
Define a tool — name, description, and a JSON input schema. The description is what makes the tool reliable: design it like a skill description (say when to use it, not just what it does).
using System.Text.Json;
using Anthropic.Models.Messages;

var findBook = new Tool {
    Name = "find_book",
    Description = "Look up a book in BookTracker by title. Use when the user names a specific book.",
    InputSchema = new() {
        Properties = new Dictionary<string, JsonElement> {
            ["title"] = JsonSerializer.SerializeToElement(new { type = "string", description = "Book title" }),
        },
        Required = ["title"],
    },
};
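When Claude calls find_book, the input arrives as JSON shaped by the schema above. The following sketch shows one way to turn that JSON into a typed C# call. FindBookInput, BookToolHandlers, and the in-memory catalog are hypothetical stand-ins for a real BookTracker service, and the catalog entry is made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical typed view of the find_book input schema (one required string: title).
public sealed record FindBookInput([property: JsonPropertyName("title")] string Title);

public static class BookToolHandlers
{
    // Stand-in for a real BookTracker service; the entry below is illustrative only.
    private static readonly Dictionary<string, string> Catalog =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["Dune"] = "Dune (status: reading, page 120)",
        };

    // Deserializes the raw tool input, validates it, and returns the text
    // that would be sent back to the model as the tool result.
    public static string FindBook(JsonElement rawInput)
    {
        var input = rawInput.Deserialize<FindBookInput>();
        if (input is null || string.IsNullOrWhiteSpace(input.Title))
            throw new ArgumentException("find_book requires a non-empty 'title'.");

        return Catalog.TryGetValue(input.Title.Trim(), out var book)
            ? book
            : $"No book titled '{input.Title}' was found.";
    }
}
```

Validating the input in the handler matters because the schema guides the model but does not guarantee well-formed arguments. A "not found" case is returned as ordinary text so the model can react to it, while a malformed input throws and can be reported as a tool error.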
The agent loop — the reusable pattern for multi-step AI workflows: Claude requests a tool →
you execute the C# method → you return a tool_result → Claude continues, possibly calling more
tools, until StopReason is EndTurn. Implement it as a loop, not a single call. The SDK's
BetaToolRunner drives this loop for you:
// betaParams is assumed to be the request built earlier (model, messages, and the tools to expose)
var runner = client.Beta.Messages.ToolRunner(betaParams); // handles call → execute → feed-back → repeat
await foreach (var message in runner)
{
    foreach (var block in message.Content)
    {
        if (block.TryPickText(out var text)) Console.WriteLine(text.Text);
    }
}
Wire each tool to a real BookTracker service (find a book, update reading progress) so the agent
acts on the actual database. Guard the loop: cap max iterations and handle per-tool-call errors
(return a tool_result with is_error rather than throwing).
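To show what the loop and its guards look like when written by hand, here is an SDK-free sketch. Every type in it (ToolCall, ToolResult, ModelTurn, AgentLoop) is a hypothetical stand-in rather than an SDK type, and callModel is assumed to wrap the real Messages call and map its tool-use blocks, which is omitted here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical stand-ins for the SDK's tool-use and tool-result blocks.
public sealed record ToolCall(string Id, string Name, JsonElement Input);
public sealed record ToolResult(string ToolUseId, string Content, bool IsError);
// A turn either requests tools (ToolCalls non-empty) or ends with final text.
public sealed record ModelTurn(string Text, IReadOnlyList<ToolCall> ToolCalls);

public static class AgentLoop
{
    public static async Task<string> RunAsync(
        Func<IReadOnlyList<ToolResult>, Task<ModelTurn>> callModel,
        IReadOnlyDictionary<string, Func<JsonElement, Task<string>>> tools,
        int maxIterations = 8)
    {
        IReadOnlyList<ToolResult> results = Array.Empty<ToolResult>();

        // Guard 1: cap the number of model round-trips
        for (var i = 0; i < maxIterations; i++)
        {
            var turn = await callModel(results);
            if (turn.ToolCalls.Count == 0) return turn.Text; // end of turn: final answer

            var next = new List<ToolResult>();
            foreach (var call in turn.ToolCalls)
            {
                try
                {
                    if (!tools.TryGetValue(call.Name, out var handler))
                        throw new InvalidOperationException($"Unknown tool '{call.Name}'.");
                    next.Add(new ToolResult(call.Id, await handler(call.Input), IsError: false));
                }
                catch (Exception ex)
                {
                    // Guard 2: report the failure to the model instead of crashing the loop
                    next.Add(new ToolResult(call.Id, ex.Message, IsError: true));
                }
            }
            results = next;
        }

        throw new InvalidOperationException(
            $"Agent did not finish within {maxIterations} iterations.");
    }
}
```

Returning the error as a result gives the model a chance to retry with different arguments or explain the failure to the user. The iteration cap bounds cost if the model keeps requesting tools; the default of 8 is an arbitrary choice for this sketch.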
Block 2C — Extended thinking (≈20 min)
For genuinely hard reasoning, enable adaptive thinking — Claude decides how much to think, and the reasoning is returned as thinking blocks ahead of the answer:
Thinking = new ThinkingConfigAdaptive(),
// optional: OutputConfig = new OutputConfig { Effort = Effort.High },
It costs more tokens and adds latency, so reserve it for multi-step problems where the quality lift is worth it — not routine chat.
Tie-in to Day 1: this is the same host/tool idea as MCP — here you implement tools in-process; in Section 4 you expose them over MCP so any host (including Claude Code) can call them.
Demos referenced here
- Live streaming endpoint (tokens arriving via curl -N)
- An agent run with 2+ tool calls visible in the logs before the final answer
[Scripts in _instructor/.]
→ Continue to Lab 2.