# Lab 3 — Build a Book Recommendation RAG Engine

> **Summary — what this page covers**
> Attendees build `/api/recommend`: ingest book data, embed and store it in Qdrant, retrieve by
> similarity, and return grounded recommendations — committed. Provide the free Ollama path
> inline so no one is blocked on an OpenAI key.

**Duration: 40 min · Deliverable: working `/api/recommend` with grounded responses — committed**

## Part A — Vector store + embeddings (≈15 min)

Start Qdrant and create the collection with the **vector size set to your embedding model's
dimensions** (768 for `nomic-embed-text`). Implement `EmbeddingService` — OpenAI or the free Ollama
`nomic-embed-text` path. Then ingest the book data, **chunk** it, embed each chunk, and **upsert**.

```bash
docker run -p 6333:6333 qdrant/qdrant
```

> If the collection's vector size doesn't match the model's output dimensions, upserts fail — this
> is the #1 gotcha. Match them.

## Part B — Retrieval + augmented prompt (≈15 min)

Implement `RagService`: embed the query → **top-K cosine search** in Qdrant → build an **augmented
prompt** injecting the retrieved chunks as context → call Claude. Wire `POST /api/recommend`.

## Part C — Evaluate (≈10 min)

Run the **same query** with and without retrieved context and compare the answers — the grounded one
cites real books, the ungrounded one guesses. Then **tune chunk size/overlap** and re-run: watch the
retrieval quality (and the answer) change. Reinforce the section's thesis: **quality comes from
chunking, not the LLM.**

## Checkpoint

- [ ] Qdrant collection created with matching vector size
- [ ] Book data ingested, chunked, embedded, and stored
- [ ] `/api/recommend` returns recommendations grounded in retrieved context
- [ ] You compared grounded vs ungrounded output
- [ ] Committed to your fork

## Bonus — Hybrid Search & Evaluation

No time pressure. Add **hybrid search** — combine keyword matching with vector similarity so exact
title/author hits rank alongside semantic matches — and a **small eval harness** that scores
retrieval quality (did the right chunks come back?) so you can measure your chunking tweaks instead
of eyeballing them.