C7 — RAG Recommendations (POST /api/recommend)
Summary (what this page covers): Give BookTracker a retrieval-augmented generation pipeline. You'll add a new BookTracker.VectorStore project (embeddings, chunking, a vector store), introduce a Book.Description corpus, ingest books + reviews into vectors, and build a grounded POST /api/recommend endpoint that answers from your data instead of the model's memory.
Time: ~60 minutes · Format: hands-on, solo · You start from: checkpoint/c6-streaming-agent · You end at: checkpoint/c7-rag
C7 is Day 2, Lab 3. The thesis of this lab: retrieval quality, not the model, is what makes answers good. You build the seam where embeddings, chunking, and vector search plug in — then watch a grounded answer beat an ungrounded one for the same question.
The as-built solution defaults to an in-memory vector store, so the app runs without Docker. The live Qdrant + Ollama path is the "real" stack you'll wire up and can switch to with one config value. You still need your Anthropic API key (from C5/C6) for the final generation step.
1. Prerequisites
You already have the .NET 10 SDK and your Anthropic API key wired in from C5/C6. For the live retrieval path (optional — the in-memory store is the Docker-free default), also install:
# pull the embedding model and start Ollama (free embeddings on :11434)
ollama pull nomic-embed-text
ollama serve
Everything below builds and runs with no Docker and no Ollama thanks to the in-memory store. But if Ollama isn't reachable, ingestion is skipped (the app logs a warning and still starts), so /api/recommend will have nothing to retrieve. Run Ollama to see real retrieval.
2. Start from the C6 checkpoint
Each lab starts from the previous checkpoint; the matching tag (checkpoint/c7-rag) is the answer key.
# from the repo root, branch from the C6 checkpoint
git switch -c my-c7 checkpoint/c6-streaming-agent
# everything below runs inside the solution folder
cd src/BookTracker
3. Create the BookTracker.VectorStore project
This new project holds the whole retrieval pipeline. It references Core only — it reads entities and seed text through Core's repository ports, so it never touches EF directly. The Anthropic SDK stays in Api; VectorStore is pure retrieval.
dotnet new classlib -n BookTracker.VectorStore -o BookTracker.VectorStore
dotnet sln BookTracker.sln add BookTracker.VectorStore/BookTracker.VectorStore.csproj
dotnet add BookTracker.VectorStore reference BookTracker.Core/BookTracker.Core.csproj
Add the packages (these exact versions are the as-built set):
dotnet add BookTracker.VectorStore package Qdrant.Client --version 1.18.1
dotnet add BookTracker.VectorStore package Microsoft.Extensions.AI --version 10.7.0
dotnet add BookTracker.VectorStore package OllamaSharp --version 5.4.25
dotnet add BookTracker.VectorStore package Microsoft.Extensions.Options --version 10.0.9
Then have Api reference VectorStore:
dotnet add BookTracker.Api reference BookTracker.VectorStore/BookTracker.VectorStore.csproj
4. Embeddings behind one seam
Wrap Microsoft.Extensions.AI's IEmbeddingGenerator<string, Embedding<float>> in a thin
IEmbeddingService / EmbeddingService. The point of the seam: swapping OpenAI ↔ Ollama is just
which generator you register — nothing downstream changes. The default is local Ollama.
using Microsoft.Extensions.AI;
using OllamaSharp;

// OllamaSharp implements IEmbeddingGenerator; this is the free local path
IEmbeddingGenerator<string, Embedding<float>> embedder =
    new OllamaApiClient(new Uri("http://localhost:11434"), "nomic-embed-text");
var result = await embedder.GenerateAsync([text], cancellationToken: ct);
float[] vector = result[0].Vector.ToArray(); // 768-dim for nomic-embed-text
Bind these settings via VectorStoreOptions (IOptions<>): OllamaUrl, EmbeddingModel, VectorSize (768), Provider, ChunkSize, ChunkOverlap, TopK. Defaults run Docker-free.
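To make the binding concrete, here is a minimal sketch of what the options class could look like. The property names come from the list above; the SectionName constant and the TopK default are assumptions for illustration (this lab does not state a TopK default), and the other defaults mirror values mentioned elsewhere on this page.

```csharp
// Hypothetical shape of VectorStoreOptions; not necessarily the as-built class.
public sealed class VectorStoreOptions
{
    // Assumed section name, chosen to match the VectorStore:Provider config key.
    public const string SectionName = "VectorStore";

    public string Provider { get; set; } = "InMemory";                 // or "Qdrant"
    public string OllamaUrl { get; set; } = "http://localhost:11434";  // local Ollama
    public string EmbeddingModel { get; set; } = "nomic-embed-text";
    public int VectorSize { get; set; } = 768;    // must equal the model's dimensions
    public int ChunkSize { get; set; } = 500;     // characters per window
    public int ChunkOverlap { get; set; } = 80;   // characters shared between windows
    public int TopK { get; set; } = 5;            // assumed default; not stated in this lab
}
```

In Program.cs, one way to bind it would be builder.Services.Configure<VectorStoreOptions>(builder.Configuration.GetSection(VectorStoreOptions.SectionName)), after which services can take IOptions<VectorStoreOptions> in their constructors.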
5. Chunk the text — the quality lever
Add ITextChunker / TextChunker: split text into size + overlap windows (the as-built defaults
are ~500-char windows with ~80-char overlap), respecting sentence boundaries — don't split
mid-sentence. This is where retrieval quality is won or lost; you'll tune it in Part C.
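The following is a minimal sketch of sentence-aware chunking with overlap, not the as-built TextChunker. The class and method names are hypothetical, and the regex sentence splitter is a deliberate simplification (it will split on abbreviations such as "e.g."). Whole sentences are packed into a window until the next one would exceed the size limit; the next window is then seeded with trailing sentences from the previous one, up to the overlap budget.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ChunkingSketch
{
    // Packs whole sentences into chunks of at most chunkSize characters.
    // A single sentence longer than chunkSize is kept intact as an oversized chunk.
    public static List<string> Chunk(string text, int chunkSize = 500, int overlap = 80)
    {
        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text)) return chunks;

        // Split on whitespace that follows '.', '!' or '?' (simplified sentence detection).
        string[] sentences = Regex.Split(text.Trim(), @"(?<=[.!?])\s+");

        var current = new List<string>();
        int currentLength = 0; // sum of (sentence length + 1 separator) for sentences in current

        foreach (string sentence in sentences)
        {
            // currentLength + sentence.Length is the joined length if we appended this sentence.
            if (current.Count > 0 && currentLength + sentence.Length > chunkSize)
            {
                chunks.Add(string.Join(" ", current));

                // Carry trailing sentences into the next chunk, up to `overlap` characters.
                var carried = new List<string>();
                int carriedLength = 0;
                for (int i = current.Count - 1; i >= 0; i--)
                {
                    if (carriedLength + current[i].Length > overlap) break;
                    carried.Insert(0, current[i]);
                    carriedLength += current[i].Length + 1;
                }
                current = carried;
                currentLength = carriedLength;
            }

            current.Add(sentence);
            currentLength += sentence.Length + 1;
        }

        if (current.Count > 0) chunks.Add(string.Join(" ", current));
        return chunks;
    }
}
```

For example, ChunkingSketch.Chunk("A b. C d. E f.", chunkSize: 9, overlap: 4) yields two chunks, "A b. C d." and "C d. E f.", with the middle sentence repeated as the overlap.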
6. Add the vector store (two implementations)
Define IVectorStore and back it with two implementations selected by VectorStore:Provider:
public interface IVectorStore
{
    Task EnsureCollectionAsync(int vectorSize, CancellationToken ct);
    Task UpsertAsync(IEnumerable<VectorRecord> records, CancellationToken ct);
    Task<IReadOnlyList<VectorHit>> SearchAsync(float[] query, int topK, CancellationToken ct);
    Task<long> CountAsync(CancellationToken ct); // skip ingestion when already populated
}
- InMemoryVectorStore: cosine similarity in memory. This is the default (Provider = "InMemory"), so the app runs with no Docker.
- QdrantVectorStore: the real database, reached over gRPC:
using Qdrant.Client;
using Qdrant.Client.Grpc;

var qdrant = new QdrantClient("localhost", 6334); // gRPC port, NOT the REST port 6333
await qdrant.CreateCollectionAsync("books",
    new VectorParams { Size = 768, Distance = Distance.Cosine }); // Size = model dims
Vector size must equal the model's dimensions or upserts fail: nomic-embed-text = 768, OpenAI text-embedding-3-small = 1536.
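For the in-memory path, the core of the search is cosine similarity plus a top-K sort. The sketch below shows that idea in isolation; it is an assumption-laden stand-in, not the as-built InMemoryVectorStore. The class name, the dictionary-based store, and the tuple result type are all hypothetical (the real store works with VectorRecord and VectorHit, whose shapes this page does not show).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CosineSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vector dimensions must match.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Brute-force top-K: score every stored vector, keep the K best.
    public static List<(string Id, double Score)> TopK(
        IReadOnlyDictionary<string, float[]> store, float[] query, int topK) =>
        store.Select(kv => (Id: kv.Key, Score: Cosine(kv.Value, query)))
             .OrderByDescending(hit => hit.Score)
             .Take(topK)
             .ToList();
}
```

A linear scan like this is fine for a seed corpus of a few dozen chunks; the dimension check is the in-memory counterpart of the vector-size rule above.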
7. Add the corpus: Book.Description
The corpus you retrieve over = book Title + Genre + Description + each review's Body.
Book.Description is introduced here. Add the entity field, seed a blurb on each seeded book, and
create the migration:
dotnet ef migrations add AddBookDescription --project BookTracker.Data --startup-project BookTracker.Api
Keep Description entity-only: do not expose it in BookDto. It's source text for retrieval, not part of the public API shape.
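One possible way to turn the corpus definition above into retrievable text is sketched here. This is an assumption, not the as-built layout: the page says which fields make up the corpus but not how they are combined, so the helper name, the "one document for the book, one per review" split, and the string formats are all hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CorpusTextSketch
{
    // Builds the source documents for one book: its own text first, then one per review body.
    public static List<string> BuildDocuments(
        string title, string genre, string? description, IEnumerable<string> reviewBodies)
    {
        var documents = new List<string>();

        // Book document: Title + Genre, plus the Description blurb when one is seeded.
        string header = $"{title} ({genre})";
        documents.Add(string.IsNullOrWhiteSpace(description)
            ? header
            : $"{header}. {description.Trim()}");

        // Review documents: keep the title in the text so a retrieved chunk names its book.
        documents.AddRange(reviewBodies
            .Where(body => !string.IsNullOrWhiteSpace(body))
            .Select(body => $"Review of {title}: {body.Trim()}"));

        return documents;
    }
}
```

Prefixing each review with the book title is a design choice worth testing: a chunk that says only "great pacing" retrieves poorly on its own.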
8. Ingest the corpus
Add CorpusIngestionService: read books (with Description) and reviews via the Core repository
ports (IBookRepository, IReviewRepository) — not the DbContext — then chunk → embed →
upsert each chunk with payload (bookId, title, the chunk text).
Run it best-effort at startup: it's idempotent (skip when CountAsync shows the store is already
populated), and if Ollama/the embedding provider is unreachable it logs a warning and continues so
the app still starts from a clone.
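The idempotent, best-effort behaviour described above can be sketched as follows. This is a simplified stand-in for CorpusIngestionService: the delegates replace IVectorStore, IEmbeddingService, and the logger so the block stays self-contained, and every name in it is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class IngestionSketch
{
    // Returns how many chunks were upserted (0 when ingestion is skipped).
    public static async Task<int> IngestAsync(
        Func<CancellationToken, Task<long>> countAsync,
        IEnumerable<(int BookId, string Title, string Text)> chunks,
        Func<string, CancellationToken, Task<float[]>> embedAsync,
        Func<int, string, string, float[], CancellationToken, Task> upsertAsync,
        Action<string> logWarning,
        CancellationToken ct = default)
    {
        // Idempotent: a populated store means a previous run already ingested.
        if (await countAsync(ct) > 0) return 0;

        int upserted = 0;
        try
        {
            foreach (var chunk in chunks)
            {
                float[] vector = await embedAsync(chunk.Text, ct);
                await upsertAsync(chunk.BookId, chunk.Title, chunk.Text, vector, ct);
                upserted++;
            }
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Best-effort: an unreachable embedding provider must not stop startup.
            logWarning($"Corpus ingestion stopped early: {ex.Message}");
        }

        return upserted;
    }
}
```

One caveat of this simple count check: if embedding fails partway through, the store is left partially populated and the next start will skip it. Whether the as-built service guards against that is not stated here.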
9. Build the RAG endpoint
In Api, add RagService (retrieve → augment → generate) and RecommendEndpoints:
1. embed the user query (IEmbeddingService)
2. top-K cosine search (IVectorStore.SearchAsync)
3. build an augmented prompt:
system = instructions + retrieved chunks ← cache this block (reuse C5 caching)
user = the query
4. client.Messages.Create(...) → grounded answer + source books
Put the retrieved context in the cached system block using CacheControlEphemeral, reusing the C5
prompt-caching pattern (same minimum-tokens caveat). The endpoint stays thin and delegates to the
service:
app.MapPost("/api/recommend", async Task<Results<Ok<RecommendResult>, ValidationProblem>> (
    RecommendRequest request, IRagService rag, CancellationToken ct) =>
{
    if (string.IsNullOrWhiteSpace(request.Query))
        return TypedResults.ValidationProblem(new Dictionary<string, string[]>
        {
            ["query"] = ["Query is required."],
        });
    return TypedResults.Ok(await rag.RecommendAsync(request.Query, ct));
}).WithTags("Recommend");
In Program.cs: register the IEmbeddingGenerator (Ollama or OpenAI), the IVectorStore (by config),
CorpusIngestionService, and RagService; ingest on startup; and map the endpoint.
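The augment step (step 3 above) is plain string assembly, which the sketch below illustrates. It stops before the Anthropic call on purpose: the SDK call and the CacheControlEphemeral wiring follow your C5 code and are not reproduced here. The class name, the instruction wording, and the context tags are assumptions, and the tuples stand in for whatever VectorHit exposes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class PromptSketch
{
    // Builds the system block: instructions first, then the retrieved chunks as numbered context.
    public static string BuildSystemPrompt(IEnumerable<(string Title, string Text)> hits)
    {
        var sb = new StringBuilder();
        sb.AppendLine("You recommend books using ONLY the context below.");
        sb.AppendLine("If the context does not cover the question, say so.");
        sb.AppendLine("Name the titles you drew from.");
        sb.AppendLine();
        sb.AppendLine("<context>");

        int n = 1;
        foreach (var (title, text) in hits)
            sb.AppendLine($"[{n++}] {title}: {text}");

        sb.Append("</context>");
        return sb.ToString();
    }

    // Distinct source titles, in retrieval order, to return alongside the answer.
    public static List<string> SourceTitles(IEnumerable<(string Title, string Text)> hits) =>
        hits.Select(hit => hit.Title).Distinct().ToList();
}
```

The user message stays as the raw query; only the system block carries the retrieved text, which is the block the C5 caching pattern marks as cacheable.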
10. Build, run, and verify
dotnet build BookTracker.sln
dotnet run --project BookTracker.Api # in-memory store; needs Ollama for ingestion
Ask for a grounded recommendation:
curl -X POST http://localhost:5255/api/recommend \
-H "Content-Type: application/json" \
-d '{"query":"I liked Clean Code — what should I read next?"}'
You should get a recommendation grounded in retrieved books, with the source titles it drew from. Compare it against the same question asked without retrieval (e.g. your C5/C6 chat endpoint) — the answers should visibly differ.
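Part C: the quality lever. Change ChunkSize / ChunkOverlap in config, restart, and re-ask: retrieval changes. With the in-memory store, a restart re-ingests from scratch, so the new settings take effect. A Qdrant collection persists across restarts, so startup ingestion sees it as already populated and skips it unless you clear the collection first. That's the whole point: quality comes from chunking, not the model.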
Optional — the real Qdrant stack. Bring up Qdrant mapping both ports (the .NET client is gRPC on 6334; 6333 is REST/UI), then flip the provider:
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant # or: docker compose up -d
Set VectorStore:Provider to "Qdrant" in appsettings, restart, and confirm the collection is
created with size 768 and the corpus ingests (count > 0).
✅ Checkpoint — you're done when:
- dotnet build is green; BookTracker.VectorStore exists and Api references it.
- Book.Description exists with seed blurbs and the AddBookDescription migration.
- The corpus ingests at startup (books + reviews chunked, embedded, upserted; count > 0).
- POST /api/recommend returns recommendations grounded in retrieved context, with sources.
- Grounded vs ungrounded answers for the same query visibly differ.
- Changing chunk size/overlap changes retrieval.
- The in-memory provider works as a Docker-free fallback; (optional) Qdrant works on both ports with vector size 768.
You're now at checkpoint/c7-rag.
What's next
Lab 4 (C7 → C8): you'll build your own MCP server in C# — exposing BookTracker's Core services as tools an MCP client (like Claude) can call.