# C9 — AI Tests + CI/CD

> **Summary — what this page covers**
> Day 2's testing-and-pipeline lab. You make the **agent loop unit-testable** by introducing an
> SDK-free LLM **seam**, draft **`AgentService` unit tests** (happy path, multi-tool, error, cancellation)
> plus an **adversarial edge-case pass**, add **TestContainers** integration tests (real Qdrant, plus SQL
> Server as a stretch), then promote your Day 1 **code-review skill into the CI pipeline** with two
> **GitHub Actions** workflows: AI code review on every PR and AI release notes on every `v*` tag.
>
> **Time:** ~60 minutes · **Format:** hands-on, solo · **You start from:** `checkpoint/c8-mcp-server` · **You end at:** `checkpoint/c9-tests-cicd`

C6 gave you a working tool-using agent — but its loop called the concrete `AnthropicClient` directly, so
the *loop* couldn't be tested without the SDK and the network. C9 fixes that, then builds out the test
suite the workshop has been deferring, and finally turns the C2 code-review discipline into automation
that guards every future change. By the end, the pipeline is your safety net.

You need **Docker** running (TestContainers spins up real dependencies), a **demo PR** to review, **GitHub
Actions enabled** on your fork, and **`ANTHROPIC_API_KEY` added as a repo Actions secret** (the CI jobs
call the API). The workflows use **Haiku** to keep CI cheap.

> The matching tag, `checkpoint/c9-tests-cicd`, is the answer key — peek there if you get stuck, but
> build it yourself first.

---

## 1. Start from the C8 checkpoint

Each lab starts from the previous checkpoint. Make sure your branch is at the C8 state, or branch from
the tag:

```bash
git switch -c my-c9 checkpoint/c8-mcp-server
cd src/BookTracker
dotnet build BookTracker.sln
```

The Actions workflows you add later live in `.github/workflows/` at the **git repo root** (the parent of
`src/BookTracker/`), alongside the existing `retype-action.yml`. GitHub Actions only discovers workflow
files in that root-level directory, so don't put them under `src/BookTracker/`.

---

## 2. Make the agent testable — add the LLM seam

The C6 `AgentService` is tied to the concrete `AnthropicClient`, so it can't be mocked. Introduce a
neutral, **SDK-free** seam in **Core** so a test double can drive the loop:

- In `BookTracker.Core/Services/IAgentLlm.cs`, define neutral turn types and the seam interfaces:
  - `AgentToolRequest(Id, Name, Input)`, `AgentToolOutcome(Id, Result, IsError)`, and `AgentTurn`
    (carrying `FinalText`, `ToolRequests`, and input/output token counts, with a `WantsTools` helper).
  - `IAgentConversation` with `SendUserMessageAsync(ct)` (first turn) and
    `SubmitToolOutcomesAsync(outcomes, ct)` (each subsequent turn), both returning an `AgentTurn`.
  - `IAgentLlm.StartConversation(userMessage)` returning an `IAgentConversation`.
- Move all SDK message/block handling into an **adapter** in `BookTracker.Api/Services/AnthropicAgentLlm.cs`
  (with a private `AnthropicAgentConversation`). Core stays SDK-free.
- Refactor `AgentService.RunAsync` to drive the seam (`StartConversation` → `SendUserMessageAsync` →
  loop over `SubmitToolOutcomesAsync`) instead of touching `AnthropicClient`.

Now the loop can be exercised with a scripted fake `IAgentLlm` — no SDK, no network.

---

## 3. Unit-test the loop with NSubstitute

Mock the **Core services** the agent's tools call (`IBookService`, `IReadingProgressService`) so there's
no real DB or API. Build a small harness of scripted seam doubles, then write the tests. This is the
payoff for the C2 **`test-writer` Haiku subagent** — point it at the loop and have it draft these.

```bash
# from src/BookTracker — packages this lab adds to BookTracker.Tests
dotnet add BookTracker.Tests package NSubstitute
```

Create the files:

- **`BookTracker.Tests/Agent/AgentTestHarness.cs`** — scripted seam doubles (`ScriptedLlm` replays a
  list of `AgentTurn`s; `AlwaysCallsToolLlm` never stops) plus small helpers (`Final`, `ToolTurn`, `Args`).
- **`BookTracker.Tests/Agent/AgentServiceTests.cs`** — the deterministic behaviors:
  - **happy path** — one tool call → final answer;
  - **multi-tool** — two tool calls in order before the final (e.g. `find_book` → `update_reading_progress`);
  - **tool error** — a mocked service throws → the loop hands the failure back as an error outcome
    (`IsError` on `AgentToolOutcome`, i.e. a `tool_result` with `is_error` at the SDK level), no crash;
  - **cancellation** — a cancelled `CancellationToken` propagates and stops the loop.
- **`BookTracker.Tests/Agent/AgentEdgeCaseTests.cs`** — the adversarial pass:
  - the **iteration-cap** guard (the model never stops → guard message after the cap);
  - an **unknown tool** name (dispatch returns a message, not an exception);
  - a tool that returns **no data** but still completes.

```bash
dotnet test BookTracker.sln
```

---

## 4. Integration tests with TestContainers

Real dependencies in Docker, no mocks. **Docker must be running.** Add the packages:

```bash
dotnet add BookTracker.Tests package Testcontainers.Qdrant
dotnet add BookTracker.Tests package Testcontainers.MsSql
dotnet add BookTracker.Tests package Microsoft.EntityFrameworkCore.SqlServer
```

First, a guard so the suite stays green without Docker:

- **`BookTracker.Tests/Integration/DockerFactAttribute.cs`** — a `[DockerFact]` deriving from xUnit
  **v2**'s `FactAttribute` that **auto-skips when Docker isn't available** (and accepts an env-var name,
  e.g. `[DockerFact("RUN_MSSQL_TESTS")]`, to gate expensive tests). GitHub-hosted Ubuntu runners have
  Docker preinstalled, so in CI the Docker-only tests run without extra setup.
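
Before running the suite locally, it can help to predict what `[DockerFact]` is likely to do. The sketch below is only an approximation: it assumes the attribute's availability probe behaves roughly like `docker info` succeeding and that the env-var gate counts as open when the variable is non-empty, neither of which this page specifies. `docker_available` and `describe_docker_tests` are hypothetical helper names.

```bash
# Hypothetical pre-flight helpers; not part of the checkpoint.

# Succeeds only when the docker CLI exists and a daemon answers.
# Assumption: this approximates whatever probe [DockerFact] uses internally.
docker_available() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

# Print which integration tests to expect for the current environment.
# Assumption: a non-empty RUN_MSSQL_TESTS opens the SQL Server gate.
describe_docker_tests() {
  if ! docker_available; then
    echo "Docker unavailable: [DockerFact] tests should be skipped"
  elif [ -n "${RUN_MSSQL_TESTS:-}" ]; then
    echo "Docker available: Qdrant and SQL Server tests should run"
  else
    echo "Docker available: Qdrant tests should run, SQL Server stays gated"
  fi
}
```

Calling `describe_docker_tests` before `dotnet test` gives a quick expectation to compare against the skipped/passed counts in the test output.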

Then the tests:

- **`BookTracker.Tests/Integration/QdrantRagTests.cs`** — spin up a real Qdrant container and drive the C7
  `QdrantVectorStore` directly with **hand-built vectors (no Ollama needed)**: create the collection,
  upsert three records, assert the count, `SearchAsync` top-K, and assert the nearest-neighbour book
  comes back. Mark it `[DockerFact]`.
- **`BookTracker.Tests/Integration/SqlServerTests.cs`** *(stretch)* — spin up a real SQL Server container,
  point a `BookTrackerDbContext` at it via `UseSqlServer`, build the schema, and assert the `HasData`
  seed. Mark it `[DockerFact("RUN_MSSQL_TESTS")]` because the SQL Server image is large.

> **⚠️ DB-provider caveat.** Dev/run uses **SQLite**, and EF Core **migrations are provider-specific** —
> the SQLite `InitialCreate`/`AddReadingProgress` migrations won't apply to SQL Server. So the SQL Server
> test builds its schema with **`Database.EnsureCreatedAsync()`** (model-driven, provider-agnostic), not
> `Migrate()`. `EnsureCreated` also applies the `HasData` seed, which the test asserts.

```bash
# Qdrant test runs whenever Docker is up; the MsSql stretch is gated:
RUN_MSSQL_TESTS=1 dotnet test BookTracker.sln
```

---

## 5. CI — AI code review on every PR

Now promote the C2 code-review discipline into the pipeline. At the **repository root**, create
**`.github/workflows/ai-code-review.yml`** that, on `pull_request` to `main`:

- checks out with `fetch-depth: 0`, installs Claude Code (`npm install -g @anthropic-ai/claude-code`);
- diffs the base against `HEAD` into `pr.diff`;
- runs Claude **headless** — `claude -p "$PROMPT" --model claude-haiku-4-5 < pr.diff > review.md` — with a
  prompt covering correctness, security (injection / missing auth/validation / leaked secrets), missing
  error handling, and `CLAUDE.md` + `.claude/rules/` violations (DTOs not entities, thin async endpoints,
  parameterized queries, migrations in Data);
- posts the result with `gh pr comment`.

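Give the job `permissions: pull-requests: write` and pass `ANTHROPIC_API_KEY` from secrets. Declaring any
permission sets the unlisted ones to `none`, so add `contents: read` as well if the checkout needs it
(private repos do).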

> The headless flag is **`claude -p`** (print mode), *not* `--no-interactive`. Haiku keeps each review
> inexpensive. This workflow **is** the C2 `/code-review` skill — the same discipline, now automated.
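
The diff, review, and comment steps above could be sketched as shell functions for the workflow's `run:` steps. This is a minimal sketch, not the checkpoint's workflow: the helper names are hypothetical, the three-dot `git diff` form is one assumed reading of "diffs the base against `HEAD`", and the empty-diff guard is an addition. `PROMPT` is assumed to be set by the step, and `gh` in Actions reads its token from `GH_TOKEN`.

```bash
# Sketch only: helper names are hypothetical, not from the checkpoint.

# Write the PR's changes to a diff file. The three-dot form diffs HEAD against
# its merge base with base_ref (an assumed reading of the step above).
# base_ref, e.g. origin/main, must be available locally, hence fetch-depth: 0.
write_pr_diff() {
  base_ref=$1
  out_file=${2:-pr.diff}
  git diff "${base_ref}...HEAD" > "$out_file"
}

# Review pr.diff headlessly and post the result on the PR.
# Expects PROMPT and ANTHROPIC_API_KEY in the environment, plus GH_TOKEN for gh.
review_and_comment() {
  pr_number=$1
  if [ ! -s pr.diff ]; then
    echo "pr.diff is empty, nothing to review"
    return 0
  fi
  # Stop before commenting if the review itself failed.
  claude -p "$PROMPT" --model claude-haiku-4-5 < pr.diff > review.md || return 1
  gh pr comment "$pr_number" --body-file review.md
}
```

In a workflow step these might be called with `origin/${{ github.base_ref }}` as the base and `${{ github.event.pull_request.number }}` as the PR number.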

---

## 6. CI — AI release notes on `v*` tags

Add **`.github/workflows/release-notes.yml`** at the repo root, triggered on `push: tags: ['v*']`:

- collect commits since the previous `v*` tag
  (`git describe --tags --abbrev=0 --match 'v*' "${TAG}^"`, then `git log "$RANGE" … > commits.txt`);
- summarize them headless with Claude into grouped notes
  (`claude -p "$PROMPT" --model claude-haiku-4-5 < commits.txt > notes.md`, with headings *What's New /
  Bug Fixes / Breaking Changes / Developer Notes*);
- publish with `gh release create` / `gh release edit --notes-file notes.md` (job needs
  `permissions: contents: write`).

> Fire only on `v*` **release tags** — not the `checkpoint/c*` workshop tags — so cutting a checkpoint
> doesn't trigger a release.
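
The commit-collection and publish steps above could look roughly like this. It is a sketch under stated assumptions: the helper names are hypothetical, the fallback to the tag's full history when no earlier `v*` tag exists is an addition, and the `git log` options (`--no-merges`, one `- subject` line per commit) are placeholders because this page elides the real format. The job needs full history and tags (`fetch-depth: 0`) for `git describe` to find the previous tag.

```bash
# Sketch only: helper names and the log format are assumptions.

# Echo the commit range for TAG: "<previous v* tag>..TAG", or just TAG
# (everything reachable from it) when no earlier v* tag exists.
# --match 'v*' ignores other tags such as checkpoint/c*.
release_range() {
  tag=$1
  if prev=$(git describe --tags --abbrev=0 --match 'v*' "${tag}^" 2>/dev/null); then
    echo "${prev}..${tag}"
  else
    echo "$tag"
  fi
}

# Write one "- subject" line per commit in the range to commits.txt.
collect_commits() {
  git log --no-merges --format='- %s' "$(release_range "$1")" > commits.txt
}

# Summarize commits.txt headlessly, then create the release or update an
# existing one. Expects PROMPT, ANTHROPIC_API_KEY, and GH_TOKEN to be set.
publish_release() {
  tag=$1
  claude -p "$PROMPT" --model claude-haiku-4-5 < commits.txt > notes.md || return 1
  if gh release view "$tag" >/dev/null 2>&1; then
    gh release edit "$tag" --notes-file notes.md
  else
    gh release create "$tag" --notes-file notes.md
  fi
}
```

Running `release_range` and `collect_commits` locally against an existing tag is a cheap way to check the range before spending an API call in CI.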

---

## 7. Wire up secrets and verify the pipeline

```bash
# add the API key as a repo Actions secret (never commit it)
gh secret set ANTHROPIC_API_KEY
```

Confirm Actions is enabled on your fork, then:

- open a **demo PR** and confirm the **AI code review** workflow runs and a comment appears;
- push a **`v*` tag** and confirm **AI release notes** are generated on the Release.

---

## ✅ Checkpoint — you're done when:

- [ ] `dotnet test` is green: `AgentService` unit tests (happy / multi-tool / error / cancellation) **and**
      the edge-case pass (iteration cap / unknown tool / empty result).
- [ ] The `IAgentLlm` / `IAgentConversation` seam lives in Core, the `AnthropicAgentLlm` adapter holds all
      SDK handling in Api, and `AgentService.RunAsync` drives the seam.
- [ ] At least one **TestContainers** test passes with Docker running (Qdrant; SQL Server too if you did
      the stretch with `EnsureCreated()`), and the suite still passes with Docker stopped (auto-skip).
- [ ] A PR triggers the **AI code-review** workflow and a comment appears.
- [ ] A `v*` tag push produces AI-generated **release notes** on the Release.
- [ ] `ANTHROPIC_API_KEY` is an Actions **secret** (not committed) and both workflows use **Haiku**.

When all the boxes are checked, your branch matches `checkpoint/c9-tests-cicd`.

---

## What's next

**C10 (Section 6) — responsible-AI hardening:** the final pass adds prompt-injection defense, token
budgets, and an audit log — and the quality gate can fail the build on critical review findings. The CI
review and tests you built here become the safety net guarding every later change.