Introduction
Large Language Models (LLMs) are powerful but inherently limited by their training cutoff and lack of dynamic access to factual knowledge. Retrieval-Augmented Generation (RAG)[^1] is a technique that combines the power of LLMs with external information retrieval systems to generate factually grounded, up-to-date responses. In a RAG pipeline, the model retrieves relevant documents from a vector store (like FAISS, Chroma, or Pinecone) based on a user query and then conditions its response on those documents.
RAG addresses key LLM limitations: knowledge cutoff dates, hallucination on factual queries, and inability to access private or domain-specific data. By augmenting generation with retrieved context, RAG systems can provide accurate, attributable answers while maintaining the fluency of modern LLMs.
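To make the retrieve-then-generate loop concrete, here is a minimal sketch. It assumes the sentence-transformers package; the documents, the model name, and the prompt template are illustrative, and the final LLM call is left as a stub.

```python
# Minimal RAG loop: embed chunks, retrieve by cosine similarity, build a grounded prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "RAG retrieves documents and conditions generation on them.",
    "FAISS is a library for efficient similarity search.",
    "Chunking splits long documents into retrievable pieces.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)  # unit vectors: dot product = cosine

def retrieve(query: str, k: int = 2) -> list[str]:
    q_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec                  # cosine similarity to every document
    top = np.argsort(-scores)[:k]              # indices of the k most similar documents
    return [docs[i] for i in top]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does RAG do?"))  # pass the prompt to any LLM to complete the pipeline
```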
Goals for the Week
- Understand the motivation and architecture of RAG: retrieval + generation.
- Learn document preprocessing: chunking strategies, metadata extraction.
- Use embedding models to convert text into semantic vectors.
- Store and retrieve documents from vector databases efficiently.
- Implement end-to-end RAG pipelines with LangChain, LlamaIndex, or HuggingFace.
- Evaluate retrieval quality (precision, recall) and generation quality (faithfulness, relevance).
Learning Guide
Videos & Courses
Recommended Courses:
- LangChain: Chat with Your Data — DeepLearning.AI
- Covers document loading, splitting, embeddings, vector stores, and retrieval
- Note: Some APIs (RetrievalQA, ConversationBufferMemory) are outdated; see updated tutorials below
- Building Applications with Vector Databases — DeepLearning.AI (Optional)
- Hands-on with Pinecone, Weaviate, and other vector DBs
Implementation Guides
LangChain:
- Building a RAG Application — Official tutorial with updated APIs
- Chat with Your Data - GitHub Repo — Course materials and notebooks
HuggingFace:
- RAG with SmolAgents — Simple RAG with open models
- RAG with Transformers — Using HF RAG architecture
Provider APIs:
- OpenAI Retrieval Guide — Using embeddings and fine-tuning
- Anthropic Contextual Retrieval — Advanced retrieval techniques
- Cohere RAG — RAG with Cohere embeddings and reranking
Vector Databases:
- FAISS Documentation — Facebook’s similarity search library
- Chroma — Open-source embedding database
- Pinecone — Managed vector database
- Weaviate — Open-source vector search engine
Key Research Papers
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks[^1]: Original RAG paper combining retrieval with seq2seq models.
- REALM: Retrieval-Augmented Language Model Pre-Training[^2]: Jointly pre-trains a language model with a latent knowledge retriever.
- Improving Language Models by Retrieving from Trillions of Tokens[^3]: RETRO architecture integrating retrieval at scale.
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection[^4]: Adaptive retrieval and self-correction in RAG.
Practice
1. Basic RAG Pipeline
Objective: Build an end-to-end RAG system (a minimal pipeline sketch follows the task list).
Tasks:
- Ingest a document collection (PDFs, text files, web scraping)
- Implement chunking strategies: fixed-size, sentence-based, semantic
- Generate embeddings with sentence-transformers or OpenAI
- Store in FAISS or ChromaDB
- Build query interface: retrieve top-k, format context, generate answer
- Compare: answers with vs. without retrieval
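A minimal sketch of the indexing and query steps, assuming faiss-cpu and sentence-transformers are installed; the chunks and the query are placeholders, and the generation call is left to whichever LLM you use.

```python
# Index chunk embeddings in FAISS and retrieve top-k context for a query.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]      # output of your chunker
vecs = model.encode(chunks, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vecs.shape[1])    # inner product equals cosine on normalized vectors
index.add(vecs)

query = "example question about the corpus"
q = model.encode([query], normalize_embeddings=True).astype("float32")
scores, ids = index.search(q, 3)            # top-3 chunk indices with their similarities

context = "\n\n".join(chunks[i] for i in ids[0])
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
# Send `prompt` to your LLM, then compare against asking the same question without context.
```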
2. Chunking Strategy Comparison
Objective: Compare different chunking methods (see the chunking sketch after the task list).
Tasks:
- Implement 3 strategies:
- Fixed-size (500 tokens, 100 overlap)
- Sentence-based (group 3-5 sentences)
- Semantic (split on topic boundaries)
- Measure retrieval accuracy for each
- Analyze: How does chunk size affect precision/recall?
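Two of the three strategies can be sketched without extra dependencies. The sizes here are approximated with whitespace-separated words rather than real tokens, so swap in a proper tokenizer (e.g. tiktoken) for accurate 500-token chunks; semantic splitting additionally needs an embedding model to detect topic boundaries.

```python
# Fixed-size chunking with overlap, and a naive sentence-based chunker.
def fixed_size_chunks(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    words = text.split()
    step = size - overlap                       # how far the window advances per chunk
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def sentence_chunks(text: str, sentences_per_chunk: int = 4) -> list[str]:
    # Naive split on periods; use nltk or spaCy sentence segmentation on real data.
    sents = [s.strip() for s in text.split(".") if s.strip()]
    return [". ".join(sents[i:i + sentences_per_chunk])
            for i in range(0, len(sents), sentences_per_chunk)]
```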
3. Hybrid Retrieval
Objective: Combine semantic and keyword search (a scoring sketch follows the task list).
Tasks:
- Implement BM25 (keyword) retrieval
- Implement dense (embedding) retrieval
- Combine scores: `final_score = α * semantic + (1 - α) * keyword`
- Find optimal α via grid search
- Compare: hybrid vs. dense-only vs. BM25-only
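A scoring sketch for the hybrid combination, assuming the rank_bm25 and sentence-transformers packages; the corpus is illustrative, and min-max normalization is one simple way to put the two score scales on comparable footing before blending.

```python
# Blend dense (embedding) and BM25 (keyword) scores with a weight alpha.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "retrieval augmented generation grounds answers in documents",
    "vector databases store and search embeddings",
    "bm25 ranks documents by term overlap",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(corpus, normalize_embeddings=True)
bm25 = BM25Okapi([doc.split() for doc in corpus])

def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.min()) / (x.max() - x.min() + 1e-9)   # rescale scores to [0, 1]

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    dense = doc_vecs @ model.encode([query], normalize_embeddings=True)[0]
    keyword = np.asarray(bm25.get_scores(query.split()))
    return alpha * normalize(dense) + (1 - alpha) * normalize(keyword)

# Grid search: evaluate recall@k on a labeled query set for alpha in 0.0, 0.1, ..., 1.0.
print(hybrid_scores("how does bm25 rank documents?", alpha=0.3))
```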
4. RAG Evaluation
Objective: Measure system quality (metric helpers are sketched after the task list).
Tasks:
- Create test set: 20 questions with ground-truth answers
- Compute retrieval metrics: precision@5, recall@5, MRR
- Evaluate generation: Use LLM-as-judge or RAGAS
- Error analysis: Categorize failure modes
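The retrieval metrics are simple enough to hand-roll; a sketch, assuming each test question comes with a set of relevant document ids and the ranked list of ids your retriever returned (average each metric over the 20 questions).

```python
# Per-query retrieval metrics: precision@k, recall@k, and reciprocal rank (for MRR).
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    return sum(doc in relevant for doc in retrieved[:k]) / k

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    return sum(doc in relevant for doc in retrieved[:k]) / max(len(relevant), 1)

def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank        # MRR is the mean of this value over all queries
    return 0.0
```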
5. Domain-Specific RAG
Objective: Build RAG for a specific domain (a filtered-retrieval sketch follows the requirements).
Options:
- Academic: Papers from arXiv in your field
- Technical: API documentation, Stack Overflow
- Legal: Case law, regulations
- Medical: Research articles, clinical guidelines
Requirements:
- 100+ documents
- Metadata extraction (authors, dates, categories)
- Filtered retrieval (e.g., “papers after 2020”)
- Citation tracking
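A filtered-retrieval sketch with Chroma, assuming the chromadb package; the documents, metadata fields, and ids are illustrative, and the ids double as citation handles for tracking sources.

```python
# Store documents with metadata in Chroma and retrieve with a metadata filter.
import chromadb

client = chromadb.Client()                      # in-memory instance, fine for prototyping
papers = client.create_collection("papers")

papers.add(
    documents=["RAG combines retrieval with generation.",
               "RETRO retrieves from trillions of tokens."],
    metadatas=[{"year": 2020, "author": "Lewis"},
               {"year": 2022, "author": "Borgeaud"}],
    ids=["lewis2020", "borgeaud2022"],
)

results = papers.query(
    query_texts=["retrieval at scale"],
    n_results=2,
    where={"year": {"$gt": 2020}},              # filtered retrieval: "papers after 2020"
)
print(results["ids"], results["metadatas"])     # cite retrieved sources by id in the answer
```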
References
[^1]: Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. arXiv:2005.11401.
[^2]: Guu, K., Lee, K., Tung, Z., Pasupat, P., & Chang, M. (2020). REALM: Retrieval-Augmented Language Model Pre-Training. International Conference on Machine Learning (ICML). arXiv:2002.08909.
[^3]: Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., … & Sifre, L. (2022). Improving Language Models by Retrieving from Trillions of Tokens. International Conference on Machine Learning (ICML). arXiv:2112.04426.
[^4]: Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv preprint arXiv:2310.11511.