Introduction
Large Language Models (LLMs) are powerful but inherently limited by their training cutoff and lack of dynamic access to factual knowledge. Retrieval-Augmented Generation (RAG)[^1] is a technique that combines the power of LLMs with external information retrieval systems to generate factually grounded, up-to-date responses. In a RAG pipeline, the model retrieves relevant documents from a vector store (like FAISS, Chroma, or Pinecone) based on a user query and then conditions its response on those documents.
RAG addresses key LLM limitations: knowledge cutoff dates, hallucination on factual queries, and inability to access private or domain-specific data. By augmenting generation with retrieved context, RAG systems can provide accurate, attributable answers while maintaining the fluency of modern LLMs.
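To make the retrieve-then-generate loop concrete, here is a minimal sketch. It assumes the sentence-transformers package; the documents, the model name, and the prompt template are illustrative, and the final LLM call is left as a stub.

```python
# Minimal RAG loop: embed chunks, retrieve by cosine similarity, build a grounded prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "RAG retrieves documents and conditions generation on them.",
    "FAISS is a library for efficient similarity search.",
    "Chunking splits long documents into retrievable pieces.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)  # unit vectors: dot product = cosine

def retrieve(query: str, k: int = 2) -> list[str]:
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                  # cosine similarity to every document
    top = np.argsort(-scores)[:k]              # indices of the k most similar documents
    return [docs[i] for i in top]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does RAG do?"))  # pass the prompt to any LLM to complete the pipeline
```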
Goals for the Week
- Understand the motivation and architecture of RAG: retrieval + generation.
- Learn document preprocessing: chunking strategies, metadata extraction.
- Use embedding models to convert text into semantic vectors.
- Store and retrieve documents from vector databases efficiently.
- Implement end-to-end RAG pipelines with LangChain, LlamaIndex, or HuggingFace.
- Evaluate retrieval quality (precision, recall) and generation quality (faithfulness, relevance).
Learning Guide
Videos & Courses
Recommended Courses:
- LangChain: Chat with Your Data — DeepLearning.AI
- Covers document loading, splitting, embeddings, vector stores, and retrieval
- Note: Some APIs (RetrievalQA, ConversationBufferMemory) are outdated; see updated tutorials below
- Building Applications with Vector Databases — DeepLearning.AI (Optional)
- Hands-on with Pinecone, Weaviate, and other vector DBs
Implementation Guides
LangChain:
- Building a RAG Application — Official tutorial with updated APIs
- Chat with Your Data - GitHub Repo — Course materials and notebooks
HuggingFace:
- RAG with SmolAgents — Simple RAG with open models
- RAG with Transformers — Using HF RAG architecture
Provider APIs:
- OpenAI Retrieval Guide — Using embeddings and fine-tuning
- Anthropic Contextual Retrieval — Advanced retrieval techniques
- Cohere RAG — RAG with Cohere embeddings and reranking
Vector Databases:
- FAISS Documentation — Facebook’s similarity search library
- Chroma — Open-source embedding database
- Pinecone — Managed vector database
- Weaviate — Open-source vector search engine
Key Research Papers
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks[^1]: Original RAG paper combining retrieval with seq2seq models.
- REALM: Retrieval-Augmented Language Model Pre-Training[^2]: Jointly pre-trains a language model with a latent knowledge retriever.
- Improving Language Models by Retrieving from Trillions of Tokens[^3]: RETRO architecture integrating retrieval at scale.
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection[^4]: Adaptive retrieval and self-correction in RAG.
Practice
1. Basic RAG Pipeline
Objective: Build an end-to-end RAG system (a minimal pipeline sketch follows the task list).
Tasks:
- Ingest a document collection (PDFs, text files, web scraping)
- Implement chunking strategies: fixed-size, sentence-based, semantic
- Generate embeddings with sentence-transformers or OpenAI
- Store in FAISS or ChromaDB
- Build query interface: retrieve top-k, format context, generate answer
- Compare: answers with vs. without retrieval
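A minimal sketch of the indexing and query steps, assuming faiss-cpu and sentence-transformers are installed; the chunks and the query are placeholders, and the generation call is left to whichever LLM you use.

```python
# Index chunk embeddings in FAISS and retrieve top-k context for a query.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]      # output of your chunker
vecs = model.encode(chunks, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vecs.shape[1])    # inner product equals cosine on normalized vectors
index.add(vecs)

query = "example question about the corpus"
q = model.encode([query], normalize_embeddings=True).astype("float32")
scores, ids = index.search(q, 3)            # top-3 chunk indices with their similarities

context = "\n\n".join(chunks[i] for i in ids[0])
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# Send `prompt` to your LLM, then compare against asking the same question without context.
```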
2. Chunking Strategy Comparison
Objective: Compare different chunking methods (see the chunking sketch after the task list).
Tasks:
- Implement 3 strategies:
- Fixed-size (500 tokens, 100 overlap)
- Sentence-based (group 3-5 sentences)
- Semantic (split on topic boundaries)
- Measure retrieval accuracy for each
- Analyze: How does chunk size affect precision/recall?
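Two of the three strategies can be sketched without extra dependencies. The sizes here are approximated with whitespace-separated words rather than real tokens, so swap in a proper tokenizer (e.g. tiktoken) for accurate 500-token chunks; semantic splitting additionally needs an embedding model to detect topic boundaries.

```python
# Fixed-size chunking with overlap, and a naive sentence-based chunker.
def fixed_size_chunks(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    words = text.split()
    step = size - overlap                       # how far the window advances per chunk
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def sentence_chunks(text: str, sentences_per_chunk: int = 4) -> list[str]:
    # Naive split on periods; use nltk or spaCy sentence segmentation on real data.
    sents = [s.strip() for s in text.split(".") if s.strip()]
    return [". ".join(sents[i:i + sentences_per_chunk])
            for i in range(0, len(sents), sentences_per_chunk)]
```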
3. Hybrid Retrieval
Objective: Combine semantic and keyword search (a scoring sketch follows the task list).
Tasks:
- Implement BM25 (keyword) retrieval
- Implement dense (embedding) retrieval
- Combine scores: `final_score = α * semantic + (1 - α) * keyword`
- Find optimal α via grid search
- Compare: hybrid vs. dense-only vs. BM25-only
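A scoring sketch for the hybrid combination, assuming the rank_bm25 and sentence-transformers packages; the corpus is illustrative, and min-max normalization is one simple way to put the two score scales on comparable footing before blending.

```python
# Blend dense (embedding) and BM25 (keyword) scores with a weight alpha.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "retrieval augmented generation grounds answers in documents",
    "vector databases store and search embeddings",
    "bm25 ranks documents by term overlap",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(corpus, normalize_embeddings=True)
bm25 = BM25Okapi([doc.split() for doc in corpus])

def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-9)   # rescale scores to [0, 1]

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    dense = doc_vecs @ model.encode([query], normalize_embeddings=True)[0]
    keyword = np.asarray(bm25.get_scores(query.split()))
    return alpha * normalize(dense) + (1 - alpha) * normalize(keyword)

# Grid search: evaluate recall@k on a labeled query set for alpha in 0.0, 0.1, ..., 1.0.
print(hybrid_scores("how does bm25 rank documents?", alpha=0.3))
```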
4. RAG Evaluation
Objective: Measure system quality (metric helpers are sketched after the task list).
Tasks:
- Create test set: 20 questions with ground-truth answers
- Compute retrieval metrics: precision@5, recall@5, MRR
- Evaluate generation: Use LLM-as-judge or RAGAS
- Error analysis: Categorize failure modes
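The retrieval metrics are simple enough to hand-roll; a sketch, assuming each test question comes with a set of relevant document ids and the ranked list of ids your retriever returned (average each metric over the 20 questions).

```python
# Per-query retrieval metrics: precision@k, recall@k, and reciprocal rank (for MRR).
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    return sum(doc in relevant for doc in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    return sum(doc in relevant for doc in retrieved[:k]) / max(len(relevant), 1)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank        # MRR is the mean of this value over all queries
    return 0.0
```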
5. Domain-Specific RAG
Objective: Build RAG for a specific domain (a filtered-retrieval sketch follows the requirements).
Options:
- Academic: Papers from arXiv in your field
- Technical: API documentation, Stack Overflow
- Legal: Case law, regulations
- Medical: Research articles, clinical guidelines
Requirements:
- 100+ documents
- Metadata extraction (authors, dates, categories)
- Filtered retrieval (e.g., “papers after 2020”)
- Citation tracking
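A filtered-retrieval sketch with Chroma, assuming the chromadb package; the documents, metadata fields, and ids are illustrative, and the ids double as citation handles for tracking sources.

```python
# Store documents with metadata in Chroma and retrieve with a metadata filter.
import chromadb

client = chromadb.Client()                      # in-memory instance, fine for prototyping
papers = client.create_collection("papers")

papers.add(
    documents=["RAG combines retrieval with generation.",
               "RETRO retrieves from trillions of tokens."],
    metadatas=[{"year": 2020, "author": "Lewis"},
               {"year": 2022, "author": "Borgeaud"}],
    ids=["lewis2020", "borgeaud2022"],
)

results = papers.query(
    query_texts=["retrieval at scale"],
    n_results=2,
    where={"year": {"$gt": 2020}},              # filtered retrieval: "papers after 2020"
)
print(results["ids"], results["metadatas"])     # cite retrieved sources by id in the answer
```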
References
[^1]: Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. arXiv:2005.11401.
[^2]: Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020). REALM: Retrieval-Augmented Language Model Pre-Training. International Conference on Machine Learning (ICML). arXiv:2002.08909.
[^3]: Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., … & Sifre, L. (2022). Improving Language Models by Retrieving from Trillions of Tokens. International Conference on Machine Learning (ICML). arXiv:2112.04426.
[^4]: Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv preprint arXiv:2310.11511.