Lab 3 — Build a Book Recommendation RAG Engine
Summary: what this page covers
Attendees build /api/recommend: ingest book data, embed and store it in Qdrant, retrieve by similarity, and return grounded recommendations, then commit the result. Provide the free Ollama path inline so no one is blocked on an OpenAI key.
Duration: 40 min · Deliverable: working /api/recommend with grounded responses — committed
Part A — Vector store + embeddings (≈15 min)
Start Qdrant and create the collection with the vector size set to your embedding model's
dimensions (768 for nomic-embed-text). Implement EmbeddingService — OpenAI or the free Ollama
nomic-embed-text path. Then ingest the book data, chunk it, embed each chunk, and upsert.
docker run -p 6333:6333 qdrant/qdrant
If the collection's vector size doesn't match the model's output dimensions, upserts fail — this is the #1 gotcha. Match them.
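The setup above can be sketched with nothing but the standard library. This is a minimal illustration, not the lab's reference solution: the collection name `books`, the helper names, and the Ollama address `http://localhost:11434` (its usual local default) are assumptions, and the request shapes follow Qdrant's REST "create collection" call and Ollama's `/api/embeddings` endpoint as commonly documented.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"   # matches the port published by docker run
OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
COLLECTION = "books"                   # hypothetical collection name
EMBED_DIM = 768                        # output size of nomic-embed-text


def create_collection_request(name: str, dim: int) -> urllib.request.Request:
    """Build the Qdrant REST request that creates a cosine-distance collection."""
    body = json.dumps({"vectors": {"size": dim, "distance": "Cosine"}}).encode()
    return urllib.request.Request(
        f"{QDRANT_URL}/collections/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the Ollama request that embeds one chunk of text."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def check_dimensions(vector: list, expected: int) -> list:
    """Fail early, with a readable message, before Qdrant rejects the upsert."""
    if len(vector) != expected:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, collection expects {expected}"
        )
    return vector


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response (needs running services)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With both services running, `send(create_collection_request(COLLECTION, EMBED_DIM))` creates the collection, and `check_dimensions(send(embed_request(chunk))["embedding"], EMBED_DIM)` guards each vector before it is upserted. If you switch to an OpenAI embedding model, change `EMBED_DIM` to that model's output size and recreate the collection.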
Part B — Retrieval + augmented prompt (≈15 min)
Implement RagService: embed the query → top-K cosine search in Qdrant → build an augmented
prompt injecting the retrieved chunks as context → call Claude. Wire POST /api/recommend.
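To make the retrieval and prompt-building steps concrete, here is a small in-memory sketch. The `top_k` function only stands in for the Qdrant search so the ranking logic is visible; in the lab the search runs inside Qdrant. The function names, the payload shape (`{"vector": ..., "text": ...}`), and the prompt wording are all assumptions, and the call to Claude is deliberately left out.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list, points: list, k: int = 3) -> list:
    """Return the k stored points most similar to the query vector."""
    ranked = sorted(points, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list) -> str:
    """Inject the retrieved chunks as numbered context ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {chunk['text']}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Recommend books using only the context below. "
        "If the context does not cover the request, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks gives the model something to cite, which makes it easier to check in Part C that a recommendation actually came from retrieved text.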
Part C — Evaluate (≈10 min)
Run the same query with and without retrieved context and compare the answers — the grounded one cites real books, the ungrounded one guesses. Then tune chunk size/overlap and re-run: watch the retrieval quality (and the answer) change. Reinforce the section's thesis: quality comes from chunking, not the LLM.
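For the tuning step, a word-based chunker with two knobs is enough to see the effect. This is one possible sketch, assuming chunks are measured in whitespace-separated words; the function name and default values are illustrative, and token- or sentence-based splitting would work equally well.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into chunks of up to `size` words, each sharing `overlap` words
    with the previous chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Re-ingest with a different `size` or `overlap`, run the same query again, and compare which chunks come back. Larger chunks carry more context per hit but dilute the embedding; more overlap reduces the chance that a relevant sentence is split across a boundary.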
Checkpoint
- Qdrant collection created with matching vector size
- Book data ingested, chunked, embedded, and stored
- /api/recommend returns recommendations grounded in retrieved context
- You compared grounded vs ungrounded output
- Committed to your fork
Bonus — Hybrid Search & Evaluation
No time pressure. Add hybrid search — combine keyword matching with vector similarity so exact title/author hits rank alongside semantic matches — and a small eval harness that scores retrieval quality (did the right chunks come back?) so you can measure your chunking tweaks instead of eyeballing them.
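One way to approach both bonus pieces is sketched below. The lab does not prescribe a fusion method; reciprocal rank fusion is used here as one common option because it merges two ranked lists without needing their scores to be comparable. Recall@k is likewise just one possible retrieval metric. The function names, the constant `k=60`, and the idea of identifying chunks by string IDs are all assumptions.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked ID lists; an item scores 1 / (k + rank) per list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the expected chunks that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def evaluate(cases: list, search, k: int = 5) -> float:
    """Average recall@k over (query, relevant_ids) pairs for a search function."""
    if not cases:
        return 0.0
    return sum(recall_at_k(search(q), rel, k) for q, rel in cases) / len(cases)
```

A handful of hand-labeled queries is enough to start: write down which chunk IDs should come back for each query, run `evaluate` before and after a chunking change, and compare the two numbers.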