C9 — AI Tests + CI/CD
Summary: what this page covers
Day 2's testing-and-pipeline lab. You make the agent loop unit-testable by introducing an SDK-free LLM seam, draft AgentService unit tests (happy path, multi-tool, error, cancellation) plus an adversarial edge-case pass, add TestContainers integration tests (real Qdrant, plus SQL Server as a stretch), then promote your Day 1 code-review skill into the CI pipeline with two GitHub Actions workflows: AI code review on every PR and AI release notes on every v* tag.
Time: ~60 minutes · Format: hands-on, solo · You start from: checkpoint/c8-mcp-server · You end at: checkpoint/c9-tests-cicd
C6 gave you a working tool-using agent — but its loop called the concrete AnthropicClient directly, so
the loop couldn't be tested without the SDK and the network. C9 fixes that, then builds out the test
suite the workshop has been deferring, and finally turns the C2 code-review discipline into automation
that guards every future change. By the end the pipeline is your safety net.
You need Docker running (TestContainers spins up real dependencies), a demo PR to review, GitHub
Actions enabled on your fork, and ANTHROPIC_API_KEY added as a repo Actions secret (the CI jobs
call the API). The workflows use Haiku to keep CI cheap.
The matching tag,
checkpoint/c9-tests-cicd, is the answer key — peek there if you get stuck, but build it yourself first.
1. Start from the C8 checkpoint
Each lab starts from the previous checkpoint. Make sure your branch is at the C8 state, or branch from the tag:
git switch -c my-c9 checkpoint/c8-mcp-server
cd src/BookTracker
dotnet build BookTracker.sln
The Actions workflows you add later live in .github/workflows/ at the git repository root (above
src/BookTracker/), alongside the existing retype-action.yml. GitHub Actions only discovers workflows
in the root-level .github/workflows/ directory, so don't put them under src/BookTracker/.
2. Make the agent testable — add the LLM seam
The C6 AgentService is tied to the concrete AnthropicClient, so it can't be mocked. Introduce a
neutral, SDK-free seam in Core so the loop is testable without the SDK:
- In BookTracker.Core/Services/IAgentLlm.cs, define neutral turn types and the seam interfaces:
  - AgentToolRequest(Id, Name, Input), AgentToolOutcome(Id, Result, IsError), and AgentTurn (carrying FinalText, ToolRequests, and input/output token counts, with a WantsTools helper).
  - IAgentConversation with SendUserMessageAsync(ct) (first turn) and SubmitToolOutcomesAsync(outcomes, ct) (each subsequent turn), both returning an AgentTurn.
  - IAgentLlm.StartConversation(userMessage) returning an IAgentConversation.
- Move all SDK message/block handling into an adapter in BookTracker.Api/Services/AnthropicAgentLlm.cs (with a private AnthropicAgentConversation). Core stays SDK-free.
- Refactor AgentService.RunAsync to drive the seam (StartConversation → SendUserMessageAsync → loop over SubmitToolOutcomesAsync) instead of touching AnthropicClient.
Now the loop can be exercised with a scripted fake IAgentLlm — no SDK, no network.
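To make the seam concrete, here is a minimal sketch of how the pieces could fit together. Only the type and member names listed above come from the lab. The record layouts, the string-typed Input, the AgentLoopSketch class (a stand-in for AgentService with tool dispatch injected as a delegate), the MaxIterations value, and the guard message are assumptions for illustration, and ScriptedLlm is just one possible shape for a scripted fake. The checkpoint's real code may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Neutral turn types: nothing here references the Anthropic SDK.
// Input is modelled as a raw JSON string (an assumption, to keep the sketch small).
public sealed record AgentToolRequest(string Id, string Name, string Input);
public sealed record AgentToolOutcome(string Id, string Result, bool IsError);
public sealed record AgentTurn(
    string? FinalText,
    IReadOnlyList<AgentToolRequest> ToolRequests,
    int InputTokens,
    int OutputTokens)
{
    // True when the model asked for at least one tool call.
    public bool WantsTools => ToolRequests.Count > 0;
}

public interface IAgentConversation
{
    Task<AgentTurn> SendUserMessageAsync(CancellationToken ct);
    Task<AgentTurn> SubmitToolOutcomesAsync(
        IReadOnlyList<AgentToolOutcome> outcomes, CancellationToken ct);
}

public interface IAgentLlm
{
    IAgentConversation StartConversation(string userMessage);
}

// Hypothetical stand-in for AgentService: tool dispatch is injected as a delegate.
public sealed class AgentLoopSketch
{
    private const int MaxIterations = 8; // assumed cap, not the lab's value
    private readonly IAgentLlm _llm;
    private readonly Func<AgentToolRequest, CancellationToken, Task<string>> _dispatch;

    public AgentLoopSketch(
        IAgentLlm llm, Func<AgentToolRequest, CancellationToken, Task<string>> dispatch)
    {
        _llm = llm;
        _dispatch = dispatch;
    }

    public async Task<string> RunAsync(string userMessage, CancellationToken ct = default)
    {
        var conversation = _llm.StartConversation(userMessage);
        var turn = await conversation.SendUserMessageAsync(ct);

        for (var i = 0; turn.WantsTools; i++)
        {
            if (i >= MaxIterations) return "Stopped: iteration cap reached.";
            ct.ThrowIfCancellationRequested();

            var outcomes = new List<AgentToolOutcome>();
            foreach (var request in turn.ToolRequests)
            {
                try
                {
                    var result = await _dispatch(request, ct);
                    outcomes.Add(new AgentToolOutcome(request.Id, result, IsError: false));
                }
                catch (OperationCanceledException) { throw; } // cancellation must propagate
                catch (Exception ex)
                {
                    // A failing tool becomes an error outcome instead of crashing the loop.
                    outcomes.Add(new AgentToolOutcome(request.Id, ex.Message, IsError: true));
                }
            }
            turn = await conversation.SubmitToolOutcomesAsync(outcomes, ct);
        }
        return turn.FinalText ?? string.Empty;
    }
}

// Scripted fake: replays pre-built turns in order, one per LLM call.
public sealed class ScriptedLlm : IAgentLlm, IAgentConversation
{
    private readonly Queue<AgentTurn> _turns;
    public ScriptedLlm(params AgentTurn[] turns) => _turns = new Queue<AgentTurn>(turns);

    public IAgentConversation StartConversation(string userMessage) => this;
    public Task<AgentTurn> SendUserMessageAsync(CancellationToken ct) => Next(ct);
    public Task<AgentTurn> SubmitToolOutcomesAsync(
        IReadOnlyList<AgentToolOutcome> outcomes, CancellationToken ct) => Next(ct);

    private Task<AgentTurn> Next(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return Task.FromResult(_turns.Dequeue());
    }
}
```

With this shape, a test can script two turns (a find_book tool request, then a final answer), pass a dispatch delegate that returns a canned result, and assert on the string RunAsync returns, all without the SDK or the network.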
3. Unit-test the loop with NSubstitute
Mock the Core services the agent's tools call (IBookService, IReadingProgressService) so there's
no real DB or API. Build a small harness of scripted seam doubles, then write the tests. This is the
payoff for the C2 test-writer Haiku subagent — point it at the loop and have it draft these.
# from src/BookTracker — packages this lab adds to BookTracker.Tests
dotnet add BookTracker.Tests package NSubstitute
Create the files:
- BookTracker.Tests/Agent/AgentTestHarness.cs: scripted seam doubles (ScriptedLlm replays a list of AgentTurns; AlwaysCallsToolLlm never stops) plus small helpers (Final, ToolTurn, Args).
- BookTracker.Tests/Agent/AgentServiceTests.cs: the deterministic behaviors:
  - happy path: one tool call → final answer;
  - multi-tool: two tool calls in order before the final (e.g. find_book → update_reading_progress);
  - tool error: a mocked service throws → the loop returns a tool_result with is_error, no crash;
  - cancellation: a cancelled CancellationToken propagates and stops the loop.
- BookTracker.Tests/Agent/AgentEdgeCaseTests.cs: the adversarial pass:
  - the iteration-cap guard (the model never stops → guard message after the cap);
  - an unknown tool name (dispatch returns a message, not an exception);
  - a tool that returns no data but still completes.
dotnet test BookTracker.sln
4. Integration tests with TestContainers
Real dependencies in Docker, no mocks. Docker must be running. Add the packages:
dotnet add BookTracker.Tests package Testcontainers.Qdrant
dotnet add BookTracker.Tests package Testcontainers.MsSql
dotnet add BookTracker.Tests package Microsoft.EntityFrameworkCore.SqlServer
First, a guard so the suite stays green without Docker:
- BookTracker.Tests/Integration/DockerFactAttribute.cs: a [DockerFact] deriving from xUnit v2's FactAttribute that auto-skips when Docker isn't available (and accepts an env-var name, e.g. [DockerFact("RUN_MSSQL_TESTS")], to gate expensive tests). In CI, the Ubuntu runner has Docker, so these run automatically.
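One way to write the guard, as a sketch: it relies on xUnit v2 letting a FactAttribute subclass set Skip in its constructor. The docker info probe, the 10-second timeout, and the skip messages are assumptions, not the checkpoint's implementation.

```csharp
using System;
using System.Diagnostics;
using Xunit;

// Sketch of a [DockerFact]: a FactAttribute that sets Skip when Docker is
// unreachable or when the named environment variable is not set.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public sealed class DockerFactAttribute : FactAttribute
{
    // Probe once per test run; starting a process for every test would be slow.
    private static readonly Lazy<bool> DockerAvailable = new(ProbeDocker);

    public DockerFactAttribute(string? requiredEnvVar = null)
    {
        if (requiredEnvVar is not null &&
            string.IsNullOrEmpty(Environment.GetEnvironmentVariable(requiredEnvVar)))
        {
            Skip = $"Set {requiredEnvVar}=1 to run this test.";
        }
        else if (!DockerAvailable.Value)
        {
            Skip = "Docker is not available.";
        }
    }

    // Assumption: "docker info" exits with 0 only when the daemon is reachable.
    private static bool ProbeDocker()
    {
        try
        {
            using var process = Process.Start(
                new ProcessStartInfo("docker", "info --format {{.ServerVersion}}")
                {
                    RedirectStandardOutput = true,
                    RedirectStandardError = true,
                    UseShellExecute = false,
                });
            if (process is null) return false;

            // Drain both streams so the child never blocks on a full pipe.
            _ = process.StandardOutput.ReadToEndAsync();
            _ = process.StandardError.ReadToEndAsync();

            if (!process.WaitForExit(10_000))
            {
                process.Kill();
                return false;
            }
            return process.ExitCode == 0;
        }
        catch (Exception)
        {
            // The docker CLI is not installed or not on PATH.
            return false;
        }
    }
}
```

Because xUnit reports a test whose attribute has a non-null Skip as skipped rather than failed, a plain [DockerFact] test and a [DockerFact("RUN_MSSQL_TESTS")] test both leave the suite green on a machine without Docker.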
Then the tests:
- BookTracker.Tests/Integration/QdrantRagTests.cs: spin up a real Qdrant container and drive the C7 QdrantVectorStore directly with hand-built vectors (no Ollama needed). Create the collection, upsert three records, assert the count, run a top-K SearchAsync, and assert the nearest-neighbour book comes back. Mark it [DockerFact].
- BookTracker.Tests/Integration/SqlServerTests.cs (stretch): spin up a real SQL Server container, point a BookTrackerDbContext at it via UseSqlServer, build the schema, and assert the HasData seed. Mark it [DockerFact("RUN_MSSQL_TESTS")] because the SQL Server image is large.
⚠️ DB-provider caveat. Dev/run uses SQLite, and EF Core migrations are provider-specific: the SQLite InitialCreate/AddReadingProgress migrations won't apply to SQL Server. So the SQL Server test builds its schema with Database.EnsureCreatedAsync() (model-driven, provider-agnostic), not Migrate(). EnsureCreated also applies the HasData seed, which the test asserts.
# Qdrant test runs whenever Docker is up; the MsSql stretch is gated:
RUN_MSSQL_TESTS=1 dotnet test BookTracker.sln
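To illustrate the EnsureCreated approach from the caveat above, a test could look roughly like this. DemoBook and DemoDbContext are hypothetical stand-ins for the lab's entity and BookTrackerDbContext, and the image tag and database name are assumptions. The Testcontainers and EF Core calls are the parts to carry over.

```csharp
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using Testcontainers.MsSql;
using Xunit;

// Hypothetical stand-ins for the lab's Book entity and BookTrackerDbContext.
public sealed class DemoBook
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
}

public sealed class DemoDbContext : DbContext
{
    public DemoDbContext(DbContextOptions<DemoDbContext> options) : base(options) { }
    public DbSet<DemoBook> Books => Set<DemoBook>();

    protected override void OnModelCreating(ModelBuilder modelBuilder) =>
        modelBuilder.Entity<DemoBook>().HasData(new DemoBook { Id = 1, Title = "Dune" });
}

public sealed class SqlServerSketchTests : IAsyncLifetime
{
    // Image tag is an assumption; pin whichever tag your team uses.
    private readonly MsSqlContainer _sql = new MsSqlBuilder()
        .WithImage("mcr.microsoft.com/mssql/server:2022-latest")
        .Build();

    public Task InitializeAsync() => _sql.StartAsync();
    public Task DisposeAsync() => _sql.DisposeAsync().AsTask();

    [Fact] // in the lab this would be [DockerFact("RUN_MSSQL_TESTS")]
    public async Task EnsureCreated_builds_schema_and_applies_seed()
    {
        // Target a fresh database so EnsureCreated has to create everything.
        var connectionString = new SqlConnectionStringBuilder(_sql.GetConnectionString())
        {
            InitialCatalog = "BookTrackerTests", // hypothetical database name
        }.ConnectionString;

        var options = new DbContextOptionsBuilder<DemoDbContext>()
            .UseSqlServer(connectionString)
            .Options;

        await using var db = new DemoDbContext(options);

        // Model-driven schema creation: no SQLite-specific migrations involved.
        var created = await db.Database.EnsureCreatedAsync();

        Assert.True(created);
        Assert.Equal("Dune", (await db.Books.SingleAsync()).Title);
    }
}
```

The seed assertion is what shows that EnsureCreated applied HasData. A Migrate() call in the same place would try to run the SQLite-generated migrations against SQL Server.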
5. CI — AI code review on every PR
Now promote the C2 code-review discipline into the pipeline. At the repository root, create
.github/workflows/ai-code-review.yml that, on pull_request to main:
- checks out with fetch-depth: 0 and installs Claude Code (npm install -g @anthropic-ai/claude-code);
- diffs the base against HEAD into pr.diff;
- runs Claude headless (claude -p "$PROMPT" --model claude-haiku-4-5 < pr.diff > review.md) with a prompt covering correctness, security (injection / missing auth/validation / leaked secrets), missing error handling, and CLAUDE.md + .claude/rules/ violations (DTOs not entities, thin async endpoints, parameterized queries, migrations in Data);
- posts the result with gh pr comment.
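Give the job the pull-requests: write permission and pass ANTHROPIC_API_KEY from secrets. gh pr comment also needs a token in its environment, typically GH_TOKEN set from the workflow's built-in GITHUB_TOKEN.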
The headless flag is claude -p (print mode), not --no-interactive. Haiku keeps each review inexpensive. This workflow is the C2 /code-review skill: the same discipline, now automated.
6. CI — AI release notes on v* tags
Add .github/workflows/release-notes.yml at the repo root, triggered on push: tags: ['v*']:
- collect commits since the previous v* tag (git describe --tags --abbrev=0 --match 'v*' "${TAG}^", then git log "$RANGE" … > commits.txt);
- summarize them headless with Claude into grouped notes (claude -p "$PROMPT" --model claude-haiku-4-5 < commits.txt > notes.md, with headings What's New / Bug Fixes / Breaking Changes / Developer Notes);
- publish with gh release create / gh release edit --notes-file notes.md (job needs permissions: contents: write).
Fire only on v* release tags, not the checkpoint/c* workshop tags, so cutting a checkpoint doesn't trigger a release.
7. Wire up secrets and verify the pipeline
# add the API key as a repo Actions secret (never commit it)
gh secret set ANTHROPIC_API_KEY
Confirm Actions is enabled on your fork, then:
- open a demo PR and confirm the AI code review workflow runs and a comment appears;
- push a v* tag and confirm AI release notes are generated on the Release.
✅ Checkpoint — you're done when:
- dotnet test is green: AgentService unit tests (happy / multi-tool / error / cancellation) and the edge-case pass (iteration cap / unknown tool / empty result).
- The IAgentLlm / IAgentConversation seam lives in Core, the AnthropicAgentLlm adapter holds all SDK handling in Api, and AgentService.RunAsync drives the seam.
- At least one TestContainers test passes with Docker running (Qdrant; SQL Server too if you did the stretch with EnsureCreated()), and the suite still passes with Docker stopped (auto-skip).
- A PR triggers the AI code-review workflow and a comment appears.
- A v* tag push produces AI-generated release notes on the Release.
- ANTHROPIC_API_KEY is an Actions secret (not committed) and both workflows use Haiku.
When all the boxes are checked, your branch matches checkpoint/c9-tests-cicd.
What's next
C10 (Section 6) — responsible-AI hardening: the final pass adds prompt-injection defense, token budgets, and an audit log — and the quality gate can fail the build on critical review findings. The CI review and tests you built here become the safety net guarding every later change.