C9 — AI Tests + CI/CD

Summary: this page covers Day 2's testing-and-pipeline lab. You make the agent loop unit-testable by introducing an SDK-free LLM seam, draft AgentService unit tests (happy path, multi-tool, error, cancellation) plus an adversarial edge-case pass, add TestContainers integration tests (real Qdrant, with SQL Server as a stretch), then promote your Day 1 code-review skill into the CI pipeline with two GitHub Actions workflows: AI code review on every PR and AI release notes on every v* tag.

Time: ~60 minutes · Format: hands-on, solo · You start from: checkpoint/c8-mcp-server · You end at: checkpoint/c9-tests-cicd

C6 gave you a working tool-using agent — but its loop called the concrete AnthropicClient directly, so the loop couldn't be tested without the SDK and the network. C9 fixes that, then builds out the test suite the workshop has been deferring, and finally turns the C2 code-review discipline into automation that guards every future change. By the end the pipeline is your safety net.

You need Docker running (TestContainers spins up real dependencies), a demo PR to review, GitHub Actions enabled on your fork, and ANTHROPIC_API_KEY added as a repo Actions secret (the CI jobs call the API). The workflows use Haiku to keep CI cheap.

The matching tag, checkpoint/c9-tests-cicd, is the answer key — peek there if you get stuck, but build it yourself first.


1. Start from the C8 checkpoint

Each lab starts from the previous checkpoint. Make sure your branch is at the C8 state, or branch from the tag:

git switch -c my-c9 checkpoint/c8-mcp-server
cd src/BookTracker
dotnet build BookTracker.sln

The Actions workflows you add later live in .github/workflows/ at the git repo root (above src/BookTracker/), alongside the existing retype-action.yml. GitHub Actions only discovers workflows in the repo-root .github/workflows/ directory, so don't put them under src/BookTracker/.


2. Make the agent testable — add the LLM seam

The C6 AgentService is tied to the concrete AnthropicClient, so it can't be mocked. Introduce a neutral, SDK-free seam in Core so the loop is testable without the SDK:

  • In BookTracker.Core/Services/IAgentLlm.cs, define neutral turn types and the seam interfaces:
    • AgentToolRequest(Id, Name, Input), AgentToolOutcome(Id, Result, IsError), and AgentTurn (carrying FinalText, ToolRequests, and input/output token counts, with a WantsTools helper).

    • IAgentConversation with SendUserMessageAsync(ct) (first turn) and SubmitToolOutcomesAsync(outcomes, ct) (each subsequent turn), both returning an AgentTurn.

    • IAgentLlm.StartConversation(userMessage) returning an IAgentConversation.

  • Move all SDK message/block handling into an adapter in BookTracker.Api/Services/AnthropicAgentLlm.cs (with a private AnthropicAgentConversation). Core stays SDK-free.
  • Refactor AgentService.RunAsync to drive the seam (StartConversation → SendUserMessageAsync → loop over SubmitToolOutcomesAsync) instead of touching AnthropicClient.

Now the loop can be exercised with a scripted fake IAgentLlm — no SDK, no network.


3. Unit-test the loop with NSubstitute

Mock the Core services the agent's tools call (IBookService, IReadingProgressService) so there's no real DB or API. Build a small harness of scripted seam doubles, then write the tests. This is the payoff for the C2 test-writer Haiku subagent — point it at the loop and have it draft these.

# from src/BookTracker — packages this lab adds to BookTracker.Tests
dotnet add BookTracker.Tests package NSubstitute

Create the files:

  • BookTracker.Tests/Agent/AgentTestHarness.cs — scripted seam doubles (ScriptedLlm replays a list of AgentTurns; AlwaysCallsToolLlm never stops) plus small helpers (Final, ToolTurn, Args).
  • BookTracker.Tests/Agent/AgentServiceTests.cs — the deterministic behaviors:
    • happy path — one tool call → final answer;
    • multi-tool — two tool calls in order before the final (e.g. find_book → update_reading_progress);
    • tool error — a mocked service throws → the loop returns a tool_result with is_error, no crash;
    • cancellation — a cancelled CancellationToken propagates and stops the loop.
  • BookTracker.Tests/Agent/AgentEdgeCaseTests.cs — the adversarial pass:
    • the iteration-cap guard (the model never stops → guard message after the cap);
    • an unknown tool name (dispatch returns a message, not an exception);
    • a tool that returns no data but still completes.

Run the suite:

dotnet test BookTracker.sln

4. Integration tests with TestContainers

Real dependencies in Docker, no mocks. Docker must be running. Add the packages:

dotnet add BookTracker.Tests package Testcontainers.Qdrant
dotnet add BookTracker.Tests package Testcontainers.MsSql
dotnet add BookTracker.Tests package Microsoft.EntityFrameworkCore.SqlServer

First, a guard so the suite stays green without Docker:

  • BookTracker.Tests/Integration/DockerFactAttribute.cs — a [DockerFact] attribute deriving from xUnit v2's FactAttribute that auto-skips when Docker isn't available (and accepts an env-var name, e.g. [DockerFact("RUN_MSSQL_TESTS")], to gate expensive tests). In CI, GitHub's Ubuntu runners have Docker preinstalled, so the ungated tests run automatically.

Then the tests:

  • BookTracker.Tests/Integration/QdrantRagTests.cs — spin up a real Qdrant container and drive the C7 QdrantVectorStore directly with hand-built vectors (no Ollama needed): create the collection, upsert three records, assert the count, run a top-K SearchAsync, and assert the nearest-neighbour book comes back. Mark it [DockerFact].

  • BookTracker.Tests/Integration/SqlServerTests.cs (stretch) — spin a real SQL Server container, point a BookTrackerDbContext at it via UseSqlServer, build the schema, and assert the HasData seed. Mark it [DockerFact("RUN_MSSQL_TESTS")] because the SQL Server image is large.

⚠️ DB-provider caveat. Dev/run uses SQLite, and EF Core migrations are provider-specific — the SQLite InitialCreate/AddReadingProgress migrations won't apply to SQL Server. So the SQL Server test builds its schema with Database.EnsureCreatedAsync() (model-driven, provider-agnostic), not Migrate(). EnsureCreated also applies the HasData seed, which the test asserts.

# Qdrant test runs whenever Docker is up; the MsSql stretch is gated:
RUN_MSSQL_TESTS=1 dotnet test BookTracker.sln
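
The gating described above can be summarised as a small decision table. The sketch below restates it in shell purely for illustration; it is not part of the lab, and the real [DockerFact] attribute is C#. The function names are hypothetical, and it assumes the gate counts as "on" whenever the variable is non-empty, which the checkpoint code may decide differently.

```shell
#!/bin/sh
# Illustrative sketch only: mirrors the [DockerFact] skip rules in shell.
# Assumption: a gate variable is "on" when it is non-empty.

# docker_available: succeeds when the docker CLI exists and a daemon answers.
docker_available() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

# integration_plan DOCKER_OK GATE_VALUE
# DOCKER_OK is "yes" or "no"; GATE_VALUE is the value of RUN_MSSQL_TESTS.
# Prints which integration tests would run and which would be skipped.
integration_plan() {
  docker_ok=$1
  gate=$2
  if [ "$docker_ok" != "yes" ]; then
    # No Docker: every container-backed test skips, so the suite stays green.
    echo "qdrant=skip mssql=skip"
  elif [ -n "$gate" ]; then
    # Docker is up and the expensive gate is set: both run.
    echo "qdrant=run mssql=run"
  else
    # Docker is up but the gate is unset: only the cheap Qdrant test runs.
    echo "qdrant=run mssql=skip"
  fi
}
```

To see what a local run would do, set ok to yes or no from docker_available and call integration_plan "$ok" "${RUN_MSSQL_TESTS:-}".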

5. CI — AI code review on every PR

Now promote the C2 code-review discipline into the pipeline. At the repository root, create .github/workflows/ai-code-review.yml that, on pull_request to main:

  • checks out with fetch-depth: 0, installs Claude Code (npm install -g @anthropic-ai/claude-code);

  • diffs the base against HEAD into pr.diff;

  • runs Claude headless (claude -p "$PROMPT" --model claude-haiku-4-5 < pr.diff > review.md) with a prompt covering correctness, security (injection, missing auth or validation, leaked secrets), missing error handling, and CLAUDE.md + .claude/rules/ violations (DTOs not entities, thin async endpoints, parameterized queries, migrations in Data);

  • posts the result with gh pr comment.

Give the job permissions: pull-requests: write, pass ANTHROPIC_API_KEY from secrets, and set GH_TOKEN (for example to the job's built-in github.token) so the gh step can authenticate.

The headless flag is claude -p (print mode), not --no-interactive. Haiku keeps each review inexpensive. This workflow is the C2 /code-review skill — the same discipline, now automated.
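
The shell side of that workflow could look roughly like the sketch below. It is not the checkpoint's actual workflow: the function names review_prompt and run_review, the prompt wording, and the two inputs (a base commit SHA and a PR number, which a workflow would presumably take from the pull_request event) are assumptions for illustration. The claude and gh invocations follow the commands given above, and gh expects a token in GH_TOKEN when it runs in Actions.

```shell
#!/bin/sh
# Sketch of the review job's shell steps (hypothetical helper names).

# review_prompt: prints the instructions handed to Claude.
# The wording is illustrative; it covers the four areas listed in the lab.
review_prompt() {
  cat <<'EOF'
Review the unified diff on stdin. Report only concrete problems:
- correctness bugs
- security: injection, missing auth or validation, leaked secrets
- missing error handling
- violations of CLAUDE.md and .claude/rules/
Reply in Markdown.
EOF
}

# run_review BASE_SHA PR_NUMBER
# Writes pr.diff and review.md in the current directory, then comments on the PR.
run_review() {
  base_sha=$1
  pr_number=$2

  # Diff the PR base against the checked-out HEAD (needs full history).
  git diff "$base_sha" HEAD > pr.diff

  # Print mode: prompt as the argument, diff on stdin, review on stdout.
  claude -p "$(review_prompt)" --model claude-haiku-4-5 < pr.diff > review.md

  # Post the review as a PR comment.
  gh pr comment "$pr_number" --body-file review.md
}
```

A workflow step would source this file and call run_review with the base SHA and PR number. Keeping the prompt in its own function makes it easy to tune the review criteria without touching the plumbing.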


6. CI — AI release notes on v* tags

Add .github/workflows/release-notes.yml at the repo root, triggered on push: tags: ['v*']:

  • collect commits since the previous v* tag (git describe --tags --abbrev=0 --match 'v*' "${TAG}^", then git log "$RANGE" … > commits.txt);

  • summarize them headless with Claude into grouped notes (claude -p "$PROMPT" --model claude-haiku-4-5 < commits.txt > notes.md, with headings What's New / Bug Fixes / Breaking Changes / Developer Notes);

  • publish with gh release create / gh release edit --notes-file notes.md (job needs permissions: contents: write).

Fire only on v* release tags — not the checkpoint/c* workshop tags — so cutting a checkpoint doesn't trigger a release.
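
One way to compute the commit range is sketched below, under stated assumptions. The helper names are hypothetical, the fallback for a first release (no earlier v* tag) is a guess at sensible behaviour rather than something the lab specifies, and the git log options are an illustrative choice because the lab text elides them.

```shell
#!/bin/sh
# Sketch: work out which commits belong to a release tag (hypothetical helpers).

# previous_tag TAG
# Prints the nearest earlier v* tag, or nothing when there is none.
# --match 'v*' ignores the checkpoint/c* workshop tags.
previous_tag() {
  git describe --tags --abbrev=0 --match 'v*' "$1^" 2>/dev/null || true
}

# commit_range TAG PREV
# Prints the revision range to hand to git log.
commit_range() {
  if [ -n "$2" ]; then
    echo "$2..$1"
  else
    # Assumed fallback for a first release: everything reachable from the tag.
    echo "$1"
  fi
}

# collect_commits TAG
# Writes one line per commit to commits.txt in the current directory.
collect_commits() {
  tag=$1
  range=$(commit_range "$tag" "$(previous_tag "$tag")")
  # The --no-merges and --pretty choices are illustrative, not the lab's.
  git log "$range" --no-merges --pretty='format:- %s (%h)' > commits.txt
}
```

In the workflow, the tag name could come from GITHUB_REF_NAME, so the step would call collect_commits "$GITHUB_REF_NAME" before piping commits.txt into Claude.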


7. Wire up secrets and verify the pipeline

# add the API key as a repo Actions secret (never commit it)
gh secret set ANTHROPIC_API_KEY

Confirm Actions is enabled on your fork, then:

  • open a demo PR and confirm the AI code review workflow runs and a comment appears;
  • push a v* tag and confirm AI release notes are generated on the Release.
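
These checks can also be done from the terminal with the GitHub CLI. The helpers below are a hedged sketch with hypothetical names. They assume gh is authenticated against your fork and that gh secret list, when piped, prints one tab-separated line per secret with the name in the first column.

```shell
#!/bin/sh
# Sketch: confirm the secret exists and look at the latest workflow run.

# secret_is_set NAME
# Succeeds when a repo Actions secret with exactly that name is listed.
secret_is_set() {
  gh secret list | cut -f1 | grep -qxF "$1"
}

# latest_run WORKFLOW_FILE
# Shows the most recent run of the given workflow file.
latest_run() {
  gh run list --workflow "$1" --limit 1
}
```

For example, secret_is_set ANTHROPIC_API_KEY should succeed after the gh secret set step, and latest_run ai-code-review.yml or latest_run release-notes.yml shows whether the PR or tag push actually triggered a run.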

Checkpoint — you're done when:

  • dotnet test is green: AgentService unit tests (happy / multi-tool / error / cancellation) and the edge-case pass (iteration cap / unknown tool / empty result).

  • The IAgentLlm / IAgentConversation seam lives in Core, the AnthropicAgentLlm adapter holds all SDK handling in Api, and AgentService.RunAsync drives the seam.

  • At least one TestContainers test passes with Docker running (Qdrant; SQL Server too if you did the stretch with EnsureCreatedAsync()), and the suite still passes with Docker stopped (auto-skip).

  • A PR triggers the AI code-review workflow and a comment appears.

  • A v* tag push produces AI-generated release notes on the Release.

  • ANTHROPIC_API_KEY is an Actions secret (not committed) and both workflows use Haiku.

When all the boxes are checked, your branch matches checkpoint/c9-tests-cicd.


What's next

C10 (Section 6) — responsible-AI hardening: the final pass adds prompt-injection defense, token budgets, and an audit log — and the quality gate can fail the build on critical review findings. The CI review and tests you built here become the safety net guarding every later change.