# C7 — RAG Recommendations (`POST /api/recommend`)

> **Summary — what this page covers**
> Give BookTracker a **retrieval-augmented generation** pipeline. You'll add a new
> `BookTracker.VectorStore` project (embeddings, chunking, a vector store), introduce a
> `Book.Description` corpus, ingest books + reviews into vectors, and build a grounded
> **`POST /api/recommend`** endpoint that answers from *your* data instead of the model's memory.
>
> **Time:** ~60 minutes · **Format:** hands-on, solo · **You start from:** `checkpoint/c6-streaming-agent` · **You end at:** `checkpoint/c7-rag`

C7 is **Day 2, Lab 3**. The thesis of this lab: *retrieval quality, not the model, is what makes
answers good.* You build the seam where embeddings, chunking, and vector search plug in — then watch a
grounded answer beat an ungrounded one for the same question.

The as-built solution defaults to an **in-memory vector store**, so the app runs **without Docker**.
Qdrant is the "real" vector DB you'll wire up and can switch to with one config value; embeddings
come from local Ollama on either path. You still need your **Anthropic API key** (from C5/C6) for the
final generation step.

---

## 1. Prerequisites

You already have the .NET 10 SDK and your **Anthropic API key** wired in from C5/C6. To see real
retrieval you also need **Ollama** for embeddings; **Docker** is only needed for the optional Qdrant
path (the in-memory store is the Docker-free default):

| Tool | Why | Check |
| --- | --- | --- |
| **Docker** | runs **Qdrant** (the vector DB) | `docker --version` |
| **Ollama** + `nomic-embed-text` | local, free embeddings (768-dim) | `ollama --version` |

```bash
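# start the Ollama server (free embeddings on :11434); skip if it already runs as an app/service
ollama serve

# in a second terminal: pull the embedding model (needs the server running)
ollama pull nomic-embed-text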
```

> Everything below **builds and starts** with no Docker and no Ollama: the in-memory store removes the
> Docker dependency, and if Ollama isn't reachable, ingestion is skipped (the app logs a warning and
> still starts). With nothing ingested, `/api/recommend` has nothing to retrieve, so run Ollama to see
> real retrieval.

---

## 2. Start from the C6 checkpoint

Each lab starts from the previous checkpoint; the matching tag (`checkpoint/c7-rag`) is the answer key.

```bash
# from the repo root, branch from the C6 checkpoint
git switch -c my-c7 checkpoint/c6-streaming-agent

# everything below runs inside the solution folder
cd src/BookTracker
```

---

## 3. Create the `BookTracker.VectorStore` project

This new project holds the whole retrieval pipeline. It references **Core only** — it reads entities
and seed text through Core's repository ports, so it never touches EF directly. The **Anthropic SDK
stays in Api**; VectorStore is pure retrieval.

```bash
dotnet new classlib -n BookTracker.VectorStore -o BookTracker.VectorStore
dotnet sln BookTracker.sln add BookTracker.VectorStore/BookTracker.VectorStore.csproj
dotnet add BookTracker.VectorStore reference BookTracker.Core/BookTracker.Core.csproj
```

Add the packages (these exact versions are the as-built set):

```bash
dotnet add BookTracker.VectorStore package Qdrant.Client --version 1.18.1
dotnet add BookTracker.VectorStore package Microsoft.Extensions.AI --version 10.7.0
dotnet add BookTracker.VectorStore package OllamaSharp --version 5.4.25
dotnet add BookTracker.VectorStore package Microsoft.Extensions.Options --version 10.0.9
```

Then have **Api** reference VectorStore:

```bash
dotnet add BookTracker.Api reference BookTracker.VectorStore/BookTracker.VectorStore.csproj
```

---

## 4. Embeddings behind one seam

Wrap Microsoft.Extensions.AI's `IEmbeddingGenerator<string, Embedding<float>>` in a thin
`IEmbeddingService` / `EmbeddingService`. The point of the seam: **swapping OpenAI ↔ Ollama comes down
to which generator you register** (plus a matching `VectorSize`); nothing downstream changes. The
default is local Ollama.

```csharp
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaSharp's client implements IEmbeddingGenerator: this is the free local path
IEmbeddingGenerator<string, Embedding<float>> embedder =
    new OllamaApiClient(new Uri("http://localhost:11434"), "nomic-embed-text");

// text: the string to embed; ct: the caller's CancellationToken
var result = await embedder.GenerateAsync([text], cancellationToken: ct);
float[] vector = result[0].Vector.ToArray();   // 768-dim for nomic-embed-text
```

> Bind these settings via `VectorStoreOptions` (`IOptions<>`): `OllamaUrl`, `EmbeddingModel`,
> `VectorSize` (768), `Provider`, `ChunkSize`, `ChunkOverlap`, `TopK`. Defaults run Docker-free.
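
As a rough illustration of those settings, here is a minimal sketch of an options class. Only the property names, the `VectorStore` config section, and the 768 / 500 / 80 / `"InMemory"` values come from this page; the `SectionName` constant and the `TopK` default of 5 are assumptions, and the as-built class may differ.

```csharp
// Hypothetical shape for VectorStoreOptions; only the property names come from this page.
public sealed class VectorStoreOptions
{
    public const string SectionName = "VectorStore";   // assumed helper constant

    public string Provider { get; set; } = "InMemory";                 // or "Qdrant"
    public string OllamaUrl { get; set; } = "http://localhost:11434";
    public string EmbeddingModel { get; set; } = "nomic-embed-text";
    public int VectorSize { get; set; } = 768;                         // must match the model
    public int ChunkSize { get; set; } = 500;                          // characters
    public int ChunkOverlap { get; set; } = 80;                        // characters
    public int TopK { get; set; } = 5;                                 // assumed default
}
```

In `Program.cs` this would typically be bound with
`builder.Services.Configure<VectorStoreOptions>(builder.Configuration.GetSection(VectorStoreOptions.SectionName));`
and consumed through `IOptions<VectorStoreOptions>`.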

---

## 5. Chunk the text — the quality lever

Add `ITextChunker` / `TextChunker`: split text into **size + overlap** windows (the as-built defaults
are ~**500-char** windows with ~**80-char** overlap) that respect **sentence boundaries**, so no chunk
starts or ends mid-sentence. This is where retrieval quality is won or lost; you'll tune it in Part C
of section 10.
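
One way to get size + overlap windows that never cut a sentence is to pack whole sentences greedily and carry whole trailing sentences into the next chunk. The sketch below does that under some assumptions: the `Chunk` signature, the regex-based sentence splitter, and the rule that an oversized sentence becomes its own chunk are illustrative choices, not necessarily what the as-built `TextChunker` does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public interface ITextChunker
{
    IReadOnlyList<string> Chunk(string text, int chunkSize = 500, int overlap = 80);
}

public sealed class TextChunker : ITextChunker
{
    // Naive sentence splitter (assumption): break on whitespace that follows '.', '!' or '?'.
    private static readonly Regex SentenceBreak = new(@"(?<=[.!?])\s+", RegexOptions.Compiled);

    public IReadOnlyList<string> Chunk(string text, int chunkSize = 500, int overlap = 80)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (overlap < 0 || overlap >= chunkSize) throw new ArgumentOutOfRangeException(nameof(overlap));

        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text)) return chunks;

        var window = new List<string>();   // sentences in the chunk being built
        foreach (var sentence in SentenceBreak.Split(text.Trim()).Where(s => s.Length > 0))
        {
            if (window.Count > 0 && Joined(window) + 1 + sentence.Length > chunkSize)
            {
                chunks.Add(string.Join(" ", window));
                window = TakeOverlap(window, overlap);
                // Drop the carried overlap if it would push the next chunk over the limit.
                if (window.Count > 0 && Joined(window) + 1 + sentence.Length > chunkSize)
                    window.Clear();
            }
            // A single sentence longer than chunkSize is kept whole as its own chunk.
            window.Add(sentence);
        }

        if (window.Count > 0) chunks.Add(string.Join(" ", window));
        return chunks;
    }

    // Length of the sentences once joined with single spaces.
    private static int Joined(List<string> parts) =>
        parts.Sum(s => s.Length) + Math.Max(0, parts.Count - 1);

    // Carry whole trailing sentences into the next chunk, up to `overlap` characters.
    private static List<string> TakeOverlap(List<string> window, int overlap)
    {
        var carried = new List<string>();
        for (var i = window.Count - 1; i >= 0; i--)
        {
            var cost = window[i].Length + (carried.Count > 0 ? 1 : 0);
            if (Joined(carried) + cost > overlap) break;
            carried.Insert(0, window[i]);
        }
        return carried;
    }
}
```

Because the overlap is made of whole sentences, it can be shorter than `overlap` characters (or empty when the last sentence alone is longer than the budget). That trade-off keeps every chunk readable on its own.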

---

## 6. Add the vector store (two implementations)

Define `IVectorStore` and back it with two implementations selected by `VectorStore:Provider`:

```csharp
public interface IVectorStore
{
    Task EnsureCollectionAsync(int vectorSize, CancellationToken ct);
    Task UpsertAsync(IEnumerable<VectorRecord> records, CancellationToken ct);
    Task<IReadOnlyList<VectorHit>> SearchAsync(float[] query, int topK, CancellationToken ct);
    Task<long> CountAsync(CancellationToken ct);   // skip ingestion when already populated
}
```

- **`InMemoryVectorStore`** — cosine similarity in memory. This is the **default**
  (`Provider = "InMemory"`), so the app runs with no Docker.
- **`QdrantVectorStore`** — the real DB over **gRPC**:

```csharp
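using Qdrant.Client;
using Qdrant.Client.Grpc;

var qdrant = new QdrantClient("localhost", 6334);   // gRPC, NOT the REST port 6333
if (!await qdrant.CollectionExistsAsync("books"))    // keep repeated startups idempotent
    await qdrant.CreateCollectionAsync("books",
        new VectorParams { Size = 768, Distance = Distance.Cosine });  // Size = model dims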
```

> **Vector size must equal the model's dimensions** or upserts fail: `nomic-embed-text` = **768**,
> OpenAI `text-embedding-3-small` = 1536.
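
To make the in-memory option concrete, here is a minimal sketch of a brute-force cosine-similarity store. The `VectorRecord` and `VectorHit` shapes are hypothetical (this page does not define them), and the class only mirrors the `IVectorStore` method signatures rather than declaring the interface, so the block compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical record shapes; the as-built types may carry different fields.
public sealed record VectorRecord(string Id, float[] Vector, IReadOnlyDictionary<string, string> Payload);
public sealed record VectorHit(string Id, float Score, IReadOnlyDictionary<string, string> Payload);

// Method signatures mirror IVectorStore; add ": IVectorStore" inside the project.
public sealed class InMemoryVectorStore
{
    private readonly ConcurrentDictionary<string, VectorRecord> _records = new();
    private int _vectorSize;

    public Task EnsureCollectionAsync(int vectorSize, CancellationToken ct)
    {
        _vectorSize = vectorSize;   // nothing to create in memory; remember the expected size
        return Task.CompletedTask;
    }

    public Task UpsertAsync(IEnumerable<VectorRecord> records, CancellationToken ct)
    {
        foreach (var record in records)
        {
            if (_vectorSize > 0 && record.Vector.Length != _vectorSize)
                throw new ArgumentException($"Expected {_vectorSize} dimensions, got {record.Vector.Length}.");
            _records[record.Id] = record;   // same id overwrites the earlier record
        }
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<VectorHit>> SearchAsync(float[] query, int topK, CancellationToken ct)
    {
        // Brute force: score every record, then keep the topK highest scores.
        IReadOnlyList<VectorHit> hits = _records.Values
            .Select(r => new VectorHit(r.Id, Cosine(query, r.Vector), r.Payload))
            .OrderByDescending(h => h.Score)
            .Take(topK)
            .ToList();
        return Task.FromResult(hits);
    }

    public Task<long> CountAsync(CancellationToken ct) => Task.FromResult((long)_records.Count);

    // cos(a, b) = (a . b) / (|a| |b|); returns 0 when either vector is all zeros.
    private static float Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Vector lengths differ.");
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0f : (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }
}
```

A linear scan like this is fine for a seed-sized corpus; the Qdrant implementation exists for the case where it is not.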

---

## 7. Add the corpus: `Book.Description`

The corpus you retrieve over is each book's **Title + Genre + Description**, plus each review's
**Body**. `Book.Description` is introduced here: add the entity field, seed a blurb on each seeded
book, and create the migration:

```bash
dotnet ef migrations add AddBookDescription --project BookTracker.Data --startup-project BookTracker.Api
```

> Keep `Description` **entity-only** — do not expose it in `BookDto`. It's source text for retrieval,
> not part of the public API shape.

---

## 8. Ingest the corpus

Add `CorpusIngestionService`: read books (with `Description`) and reviews via the **Core repository
ports** (`IBookRepository`, `IReviewRepository`) — *not* the DbContext — then **chunk → embed →
upsert** each chunk with payload (`bookId`, `title`, the chunk text).

Run it **best-effort at startup**: it's idempotent (skip when `CountAsync` shows the store is already
populated), and if Ollama/the embedding provider is unreachable it **logs a warning and continues** so
the app still starts from a fresh clone.
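
The control flow described above (skip when populated, chunk → embed → upsert, warn instead of crash) can be sketched as follows. Everything here is hypothetical scaffolding: the delegates stand in for `IVectorStore`, `ITextChunker`, and `IEmbeddingService`, and `CorpusDocument` / `IngestChunk` are illustrative shapes for text already read through the repository ports.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes, for illustration only.
public sealed record CorpusDocument(string Key, int BookId, string Title, string Text);
public sealed record IngestChunk(string Id, float[] Vector, int BookId, string Title, string Text);

public sealed class CorpusIngestionSketch
{
    private readonly Func<CancellationToken, Task<long>> _count;
    private readonly Func<string, IReadOnlyList<string>> _chunk;
    private readonly Func<string, CancellationToken, Task<float[]>> _embed;
    private readonly Func<IReadOnlyList<IngestChunk>, CancellationToken, Task> _upsert;
    private readonly Action<string> _warn;

    public CorpusIngestionSketch(
        Func<CancellationToken, Task<long>> count,
        Func<string, IReadOnlyList<string>> chunk,
        Func<string, CancellationToken, Task<float[]>> embed,
        Func<IReadOnlyList<IngestChunk>, CancellationToken, Task> upsert,
        Action<string> warn)
    {
        _count = count; _chunk = chunk; _embed = embed; _upsert = upsert; _warn = warn;
    }

    // Returns the number of chunks written; 0 when skipped or when ingestion fails.
    public async Task<int> IngestAsync(IEnumerable<CorpusDocument> documents, CancellationToken ct)
    {
        try
        {
            if (await _count(ct) > 0) return 0;   // idempotent: store already populated

            var batch = new List<IngestChunk>();
            foreach (var doc in documents)
            {
                var pieces = _chunk(doc.Text);
                for (var i = 0; i < pieces.Count; i++)
                {
                    var vector = await _embed(pieces[i], ct);
                    // Key + index keeps ids unique across a book's description and its reviews.
                    batch.Add(new IngestChunk($"{doc.Key}:{i}", vector, doc.BookId, doc.Title, pieces[i]));
                }
            }

            await _upsert(batch, ct);
            return batch.Count;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Best-effort: an unreachable embedding provider must not stop the app from starting.
            _warn($"Corpus ingestion skipped: {ex.Message}");
            return 0;
        }
    }
}
```

Upserting only after every chunk has been embedded means a mid-run failure leaves the store empty, so the next start retries from scratch instead of seeing a half-populated store and skipping.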

---

## 9. Build the RAG endpoint

In **Api**, add `RagService` (retrieve → augment → generate) and `RecommendEndpoints`:

```text
1. embed the user query                         (IEmbeddingService)
2. top-K cosine search                          (IVectorStore.SearchAsync)
3. build an augmented prompt:
     system = instructions + retrieved chunks   ← cache this block (reuse C5 caching)
     user   = the query
4. client.Messages.Create(...)                  → grounded answer + source books
```

Put the **retrieved context in the cached system block** using `CacheControlEphemeral`, reusing the C5
prompt-caching pattern (same minimum-tokens caveat). The endpoint stays thin and delegates to the
service:

```csharp
app.MapPost("/api/recommend", async Task<Results<Ok<RecommendResult>, ValidationProblem>> (
    RecommendRequest request, IRagService rag, CancellationToken ct) =>
{
    if (string.IsNullOrWhiteSpace(request.Query))
        return TypedResults.ValidationProblem(new Dictionary<string, string[]>
        {
            ["query"] = ["Query is required."],
        });

    return TypedResults.Ok(await rag.RecommendAsync(request.Query, ct));
}).WithTags("Recommend");
```

In `Program.cs`: register the `IEmbeddingGenerator` (Ollama or OpenAI), the `IVectorStore` (by config),
`CorpusIngestionService`, and `RagService` (as `IRagService`, which the endpoint injects); run
ingestion on startup; and map the endpoint.
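
The "augment" step is plain string assembly, so it is easy to sketch without touching the Anthropic SDK. In the sketch below the `RetrievedChunk` shape, the `RagPromptBuilder` name, and the instruction wording are all assumptions for illustration; the as-built `RagService` may format its context differently. The returned string is what would go into the cached system block.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical shape for one retrieved chunk (payload fields plus the search score).
public sealed record RetrievedChunk(int BookId, string Title, string Text, float Score);

public static class RagPromptBuilder
{
    // Illustrative wording, not the as-built prompt.
    private const string Instructions =
        "You recommend books. Answer only from the CONTEXT below. " +
        "If the context does not cover the question, say so. Name the titles you used.";

    // System prompt = instructions + numbered retrieved chunks.
    public static string BuildSystemPrompt(IReadOnlyList<RetrievedChunk> chunks)
    {
        var sb = new StringBuilder(Instructions);
        sb.Append("\n\nCONTEXT:\n");
        for (var i = 0; i < chunks.Count; i++)
            sb.Append($"[{i + 1}] {chunks[i].Title} (bookId {chunks[i].BookId}): {chunks[i].Text}\n");
        return sb.ToString();
    }

    // Distinct source titles to return alongside the generated answer.
    public static IReadOnlyList<string> Sources(IReadOnlyList<RetrievedChunk> chunks) =>
        chunks.Select(c => c.Title).Distinct().ToList();
}
```

Numbering the chunks gives the model something concrete to cite, and deriving `Sources` from the retrieved chunks (rather than parsing the model's answer) keeps the source list tied to what was actually retrieved.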

---

## 10. Build, run, and verify

```bash
dotnet build BookTracker.sln
dotnet run --project BookTracker.Api          # in-memory store; needs Ollama for ingestion
```

Ask for a grounded recommendation:

```bash
curl -X POST http://localhost:5255/api/recommend \
  -H "Content-Type: application/json" \
  -d '{"query":"I liked Clean Code — what should I read next?"}'
```

You should get a recommendation **grounded** in retrieved books, with the **source** titles it drew
from. Compare it against the same question asked *without* retrieval (e.g. your C5/C6 chat endpoint) —
the answers should **visibly differ**.

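**Part C: the quality lever.** Change `ChunkSize` / `ChunkOverlap` in config, restart, and re-ask:
retrieval changes, because the in-memory store starts empty and the corpus is re-chunked and
re-embedded on startup. (On Qdrant the collection can survive an app restart, in which case the
`CountAsync` check skips ingestion; delete the collection before re-testing.) That's the whole point:
quality comes from chunking, not the model.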

**Optional — the real Qdrant stack.** Bring up Qdrant mapping **both** ports (the .NET client is gRPC
on 6334; 6333 is REST/UI), then flip the provider:

```bash
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant   # or: docker compose up -d
```

Set `VectorStore:Provider` to `"Qdrant"` in `appsettings`, restart, and confirm the collection is
created with **size 768** and the corpus ingests (count > 0).

---

## ✅ Checkpoint — you're done when:

- [ ] `dotnet build` is green; `BookTracker.VectorStore` exists and **Api references it**.
- [ ] `Book.Description` exists with seed blurbs and the `AddBookDescription` migration.
- [ ] The corpus ingests at startup (books + reviews chunked, embedded, upserted — count > 0).
- [ ] `POST /api/recommend` returns recommendations **grounded** in retrieved context, with sources.
- [ ] Grounded vs ungrounded answers for the same query **visibly differ**.
- [ ] Changing chunk size/overlap **changes retrieval**.
- [ ] The **in-memory** provider works as the Docker-free default; (optional) Qdrant works with both
      ports mapped and vector size 768.

You're now at `checkpoint/c7-rag`.

---

## What's next

**Lab 4 (C7 → C8):** you'll build your **own MCP server in C#** — exposing BookTracker's Core services
as tools an MCP client (like Claude) can call.