# Troubleshooting

> **Summary — what this page covers**
> The "I hit a wall, what now?" page. Every entry is something that has actually broken in a
> workshop — install, auth, hooks not firing, MCP not appearing, RAG failing, CI quirks. Each
> entry has the symptom, the cause in one line, and the fix. Before flagging an issue to a TA,
> try the matching entry here.

## How to use this page

Search the page (`Ctrl/Cmd-F`) for the error message or symptom. Each entry is **Symptom → Cause
→ Fix**. If your issue isn't here, capture the exact error and ask a TA — and we'll add it.

---

## Day 1 — Install & authentication

### `claude: command not found`

- **Cause:** the global npm bin isn't on your `PATH`, or the install hit a permissions wall.
- **Fix:** confirm install (`npm list -g @anthropic-ai/claude-code`); if missing, reinstall
  without `sudo` — use a Node version manager (nvm) or fix npm's prefix
  (`npm config set prefix ~/.npm-global` and add `~/.npm-global/bin` to `PATH`).

### Windows: `claude` works in PowerShell but hooks/subagents misbehave

- **Cause:** Claude Code's full feature set requires **WSL2**.
- **Fix:** install WSL2 (`wsl --install`) and run **every** Claude Code command inside the WSL2
  shell. Don't mix PowerShell and WSL2 sessions.

### Auth keeps prompting / `401 Unauthorized` on Day 1

- **Cause:** logged out, or the wrong auth method.
- **Fix:** for the Pro/Max OAuth path, run `claude` and complete the browser flow once. For API
  auth, set `ANTHROPIC_API_KEY` in your shell profile and start a new terminal.

### "Claude Code requires a paid plan"

- **Cause:** Free tier. Day 1 needs Pro minimum.
- **Fix:** upgrade in Claude.ai → Settings → Plan. See [Prerequisites](prerequisites.md).

---

## Day 1 — Section 2 (Steering)

### My skill never fires automatically

- **Cause:** the skill description doesn't include the trigger words you'd actually say.
- **Fix:** rewrite the `description:` for the **model**, not the human. Include verbs and nouns
  from the user request ("add endpoint", "new route"). Then invoke explicitly once with `/name`
  to confirm it's wired; tighten the description and re-test auto-match.

### Skills aren't being recognized at all

- **Cause:** treating skills as single `.md` files (older docs showed this — it's wrong now).
- **Fix:** skills are **folders**. `.claude/skills/<name>/` must contain a `SKILL.md`. Verify
  with `ls .claude/skills/<name>/SKILL.md`.

### My subagent fails to load with a YAML error

- **Cause:** the file is named `<name>.yaml` instead of `<name>.md` with YAML frontmatter.
- **Fix:** rename to `.claude/agents/<name>.md`; the YAML lives between `---` fences at the top,
  and the body below it becomes the subagent's system prompt.

### My path-scoped rule doesn't load

- **Cause:** either you haven't read a matching file yet, or you're using a user-level rule with
  `paths:` (currently unreliable).
- **Fix:** confirm the rule's `paths:` glob matches a file you actually opened in the session
  (run `/memory` to see what's loaded). If it's a user-level rule (`~/.claude/rules/`), move it
  to **project-level** (`.claude/rules/`). Note: rules load on **read**, not on **create** —
  for a "every new file must…" rule, drop `paths:` so the rule is unscoped.

### My `PreToolUse` hook doesn't block anything

- **Cause:** the session was started before the hook was registered, or the script isn't
  executable, or it isn't returning exit code 2.
- **Fix:** `chmod +x .claude/hooks/<script>.sh`, confirm registration in `.claude/settings.json`,
  then **start a fresh session** (hooks load at session start). Test the script standalone:
  `echo '{"tool_input":{"command":"rm -rf /tmp"}}' | .claude/hooks/<script>.sh; echo $?` —
  should print `2`.

### A `PostToolUse` build/format hook fires but nothing seems to happen

- **Cause:** the hook command is probably succeeding silently; Claude only sees stderr by default.
- **Fix:** ensure your command writes meaningful output (errors **and** success markers) to
  stderr or stdout. For `dotnet build`, pipe through `grep -E '(error|warning|succeeded|FAILED)'`
  so Claude has something to react to.

---

## Day 1 — Section 3 (IDE + MCP)

### VS Code extension installed but no Claude UI

- **Cause:** authentication didn't complete inside the IDE, or the extension didn't reload.
- **Fix:** reload window (`Ctrl/Cmd-Shift-P → Developer: Reload Window`), then run the
  extension's sign-in command from the command palette.

### Rider/IntelliJ: plugin installed but no Claude tool window

- **Cause:** restart needed; the plugin uses a tool window that only appears after a full IDE restart.
- **Fix:** fully restart the IDE (not just "Invalidate Caches" — restart).

### `claude mcp list` doesn't show the GitHub server

- **Cause:** misregistered, or the GitHub PAT scope is wrong.
- **Fix:** check the registration command's exit code; confirm the PAT has at least `repo` (and
  `read:org` if you're targeting org-owned repos). Recreate the PAT if unsure.

### GitHub MCP demo times out / network blocked

- **Cause:** conference Wi-Fi or corporate proxy blocking GitHub API.
- **Fix:** tether to a phone; if even that fails, the recorded backup demo in `_instructor/`
  is the fallback. The lab can be completed against the recorded session.

---

## Day 2 — SDK & Auth

### `Anthropic` package not found / wrong package installed

- **Cause:** picked a community package by name. The **official** one is just **`Anthropic`** (v12.x).
- **Fix:** `dotnet add package Anthropic` — confirm it resolves to v12.x. The old
  `tryAGI.Anthropic` v3.x and `Anthropic.SDK` are separate lineages.

### `401 Unauthorized` from the API on Day 2

- **Cause:** missing/invalid API key, or expecting Pro to grant API access.
- **Fix:** Pro/Max **does not** include API access. Create an API key in
  [console.anthropic.com](https://console.anthropic.com) and set it via user-secrets in the
  `BookTracker.Api` project. See [Prerequisites](prerequisites.md).

### `429 Too Many Requests` / `529 Overloaded`

- **Cause:** rate limit (429) or transient capacity (529).
- **Fix:** the Polly policy from Section 1 handles both — confirm it's registered. For 529,
  retries with backoff resolve almost all cases.

### Socket exhaustion under load / `SocketException`

- **Cause:** `AnthropicClient` registered as Scoped or Transient.
- **Fix:** register as **Singleton** — it holds an `HttpClient` internally (same reason you'd
  use `IHttpClientFactory`).

---

## Day 2 — Streaming & Tools

### SSE endpoint returns the whole response at once (not streamed)

- **Cause:** response isn't being flushed per chunk, or buffering middleware is in the pipeline.
- **Fix:** set `Content-Type: text/event-stream`, write `data: ...\n\n` frames, and flush after
  each write (`await Response.Body.FlushAsync()`). Check that compression/output-cache middleware
  isn't buffering the response.

### Agent loop runs forever

- **Cause:** no max-iteration guard.
- **Fix:** add a cap (e.g. 10) on the loop; log iteration count; surface the tool call sequence
  for debugging.

### Tool isn't being called when it obviously should be

- **Cause:** the tool's **description** is too vague — Claude doesn't know when to use it.
- **Fix:** rewrite the description for the **model** (verbs and nouns from realistic prompts).
  Same discipline as a skill description.

---

## Day 2 — RAG (Section 3)

### Qdrant connection refused

- **Cause:** Docker isn't running, or the container didn't start.
- **Fix:** `docker ps` — confirm `qdrant/qdrant` is up. If not: `docker run -p 6333:6333 qdrant/qdrant`.

### `Vector dimension mismatch` when upserting

- **Cause:** the Qdrant collection's vector size doesn't match the embedding model's dimensions.
- **Fix:** recreate the collection with the right size. OpenAI `text-embedding-3-small` is 1536;
  Ollama `nomic-embed-text` is **768**. Set the size to whichever model you're using.

### RAG answers are vague / off-topic

- **Cause:** chunking, not the LLM. Chunks too big (retrieval is fuzzy) or too small (no context).
- **Fix:** tune chunk size and overlap; re-ingest; re-evaluate against the same query. The
  recurring theme of Section 3.

---

## Day 2 — MCP server (Section 4)

### Claude Code doesn't see my MCP server

- **Cause:** server not registered, wrong transport, or wrong URL.
- **Fix:** check `claude mcp list`. Confirm you registered the correct URL and that the server is
  configured for **Streamable HTTP** (not the older HTTP+SSE transport).

### Tool calls fail with a serialization error

- **Cause:** the C# tool's input/output types don't round-trip cleanly to JSON (e.g. a property
  that can't be deserialized, or a cyclic graph from an EF entity).
- **Fix:** use DTOs for tool inputs/outputs — same rule as your endpoints. Never expose an EF
  entity directly through a tool.

---

## Day 2 — CI/CD (Section 5)

### GitHub Actions can't find `claude` in the workflow

- **Cause:** the runner doesn't have Claude Code installed yet.
- **Fix:** add an `npm install -g @anthropic-ai/claude-code` step before invoking it; set
  `ANTHROPIC_API_KEY` from a repo secret.

### The AI review costs more than expected

- **Cause:** running the default model on every PR.
- **Fix:** use `--model claude-haiku-4-5` for CI reviews. Limit the diff size sent (skip large
  generated files, lock files, etc.).

### Release-notes job runs on every push instead of tag push

- **Cause:** workflow trigger.
- **Fix:** scope the workflow to `on: push: tags: ['v*']` (or use `release: types: [created]`).

---

## Still stuck?

Capture: the exact command you ran, the full error output, and your OS / .NET / Node versions.
Bring it to a TA — and we'll add the entry here so the next attendee doesn't hit the same wall.
