Week 7: Retrieval-Augmented Generation (RAG)

Introduction

Large Language Models (LLMs) are powerful but inherently limited by their training cutoff and lack of dynamic access to factual knowledge. Retrieval-Augmented Generation (RAG) [1] is a technique that combines the power of LLMs with external information retrieval systems to generate factually grounded, up-to-date responses. In a RAG pipeline, the model retrieves relevant documents from a vector store (like FAISS, Chroma, or Pinecone) based on a user query and then conditions its response on those documents.

RAG addresses key LLM limitations: knowledge cutoff dates, hallucination on factual queries, and inability to access private or domain-specific data. By augmenting generation with retrieved context, RAG systems can provide accurate, attributable answers while maintaining the fluency of modern LLMs.
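The "augmentation" step is largely prompt assembly: retrieved chunks are formatted into the prompt so the model answers from the supplied context rather than from memory. A minimal sketch (the template wording and `[n]` citation format are illustrative choices, not a standard):

```python
# Sketch of the augmentation step: retrieved chunks become numbered
# context blocks in the prompt, so answers can cite their sources.
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt from retrieved context chunks."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "When was FAISS released?",
    ["FAISS was open-sourced by Facebook AI Research in 2017."],
)
print(prompt)
```

The resulting string is what gets sent to the LLM; attribution falls out for free because the model can reference the chunk numbers.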

Goals for the Week

  • Understand the motivation and architecture of RAG: retrieval + generation.
  • Learn document preprocessing: chunking strategies, metadata extraction.
  • Use embedding models to convert text into semantic vectors.
  • Store and retrieve documents from vector databases efficiently.
  • Implement end-to-end RAG pipelines with LangChain, LlamaIndex, or HuggingFace.
  • Evaluate retrieval quality (precision, recall) and generation quality (faithfulness, relevance).

Learning Guide

Videos & Courses

Recommended Courses:

Implementation Guides

LangChain:

HuggingFace:

Provider APIs:

Vector Databases:

Key Research Papers

  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [1]: Original RAG paper combining retrieval with seq2seq models.
  • REALM: Retrieval-Augmented Language Model Pre-Training [2]: Pre-training a language model jointly with a learned retriever.
  • Improving Language Models by Retrieving from Trillions of Tokens [3]: RETRO architecture integrating retrieval at scale.
  • Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [4]: Adaptive retrieval and self-correction in RAG.

Practice

1. Basic RAG Pipeline

Objective: Build an end-to-end RAG system.

Tasks:

  • Ingest a document collection (PDFs, text files, web scraping)
  • Implement chunking strategies: fixed-size, sentence-based, semantic
  • Generate embeddings with sentence-transformers or OpenAI
  • Store in FAISS or ChromaDB
  • Build query interface: retrieve top-k, format context, generate answer
  • Compare: answers with vs. without retrieval

2. Chunking Strategy Comparison

Objective: Compare different chunking methods.

Tasks:

  • Implement 3 strategies:
    • Fixed-size (500 tokens, 100 overlap)
    • Sentence-based (group 3-5 sentences)
    • Semantic (split on topic boundaries)
  • Measure retrieval accuracy for each
  • Analyze: How does chunk size affect precision/recall?
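The first two strategies are straightforward with the standard library; the "semantic" variant below uses blank-line (paragraph) boundaries as a crude stand-in for a real semantic splitter, which would cut where embedding similarity between adjacent sentences drops:

```python
import re

def fixed_size_chunks(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Fixed-size chunks over whitespace tokens, with overlap between chunks."""
    tokens = text.split()
    step = size - overlap
    return [" ".join(tokens[i:i + size])
            for i in range(0, max(len(tokens) - overlap, 1), step)]

def sentence_chunks(text: str, per_chunk: int = 3) -> list[str]:
    """Group consecutive sentences (split after ., !, ?) into chunks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + per_chunk])
            for i in range(0, len(sentences), per_chunk)]

def semantic_chunks(text: str) -> list[str]:
    """Crude topic-boundary split on blank lines; a real semantic splitter
    would instead cut where embedding similarity between neighbors drops."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

print(fixed_size_chunks("a b c d e f g", size=4, overlap=2))
```

Running the retrieval accuracy measurement over the same corpus chunked three ways isolates the effect of the chunking choice from everything else in the pipeline.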

3. Hybrid Retrieval

Objective: Combine semantic and keyword search.

Tasks:

  • Implement BM25 (keyword) retrieval
  • Implement dense (embedding) retrieval
  • Combine scores: final_score = α * semantic + (1-α) * keyword
  • Find optimal α via grid search
  • Compare: hybrid vs. dense-only vs. BM25-only
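The score-fusion step can be sketched as follows. Min-max normalization is an assumption made here to put BM25 and cosine scores on a comparable scale before the α-weighted sum (reciprocal rank fusion is a common alternative that skips normalization entirely):

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize so keyword and semantic scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_rank(semantic: dict[str, float], keyword: dict[str, float],
                alpha: float = 0.5) -> list[str]:
    """Rank doc ids by: final_score = alpha * semantic + (1 - alpha) * keyword."""
    sem, kw = normalize(semantic), normalize(keyword)
    fused = {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0)
             for d in set(sem) | set(kw)}
    return sorted(fused, key=fused.get, reverse=True)

# Same two docs scored by both retrievers; alpha controls which wins.
sem_scores = {"doc_a": 0.9, "doc_b": 0.2}
bm25_scores = {"doc_a": 1.0, "doc_b": 5.0}
print(hybrid_rank(sem_scores, bm25_scores, alpha=0.9))
```

The grid search over α then just sweeps values in [0, 1] and keeps the one maximizing a retrieval metric (e.g. recall@5) on a labeled evaluation set.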

4. RAG Evaluation

Objective: Measure system quality.

Tasks:

  • Create test set: 20 questions with ground-truth answers
  • Compute retrieval metrics: precision@5, recall@5, MRR
  • Evaluate generation: Use LLM-as-judge or RAGAS
  • Error analysis: Categorize failure modes
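The three retrieval metrics are simple enough to implement directly. Here `retrieved` is a ranked list of document ids per question and `relevant` the ground-truth set:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(all_retrieved: list[list[str]], all_relevant: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant hit, averaged over queries."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)
```

Generation quality is harder to score automatically, hence LLM-as-judge or a framework like RAGAS for faithfulness and answer relevance.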

5. Domain-Specific RAG

Objective: Build RAG for a specific domain.

Options:

  • Academic: Papers from arXiv in your field
  • Technical: API documentation, Stack Overflow
  • Legal: Case law, regulations
  • Medical: Research articles, clinical guidelines

Requirements:

  • 100+ documents
  • Metadata extraction (authors, dates, categories)
  • Filtered retrieval (e.g., “papers after 2020”)
  • Citation tracking

References


  1. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. arXiv:2005.11401

  2. Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020). REALM: Retrieval-Augmented Language Model Pre-Training. International Conference on Machine Learning (ICML). arXiv:2002.08909

  3. Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., … & Sifre, L. (2022). Improving Language Models by Retrieving from Trillions of Tokens. International Conference on Machine Learning (ICML). arXiv:2112.04426

  4. Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv preprint. arXiv:2310.11511