Instructor: Archit Parnami, PhD
Semester: Spring 2025
Time: Tuesday, 5:30 pm - 8:15 pm
Introduction
This 14-week course introduces students to the practical applications of large language models, covering key concepts such as transformers, LLM architecture, prompting, fine-tuning, retrieval-augmented generation, evaluation, deployment, and building full-stack applications.
Week 1: Introduction to Generative AI and LLMs
- Overview of generative AI and NLP evolution
- LLM capabilities and risks
- Model families and usage (OpenAI, Hugging Face, Anthropic, Cohere)
Week 2: Transformers
- Transformer architecture: encoder, decoder, attention
- Self-attention, positional encoding, multi-head attention
- Pretraining tasks: MLM, CLM
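The scaled dot-product attention at the heart of the transformer can be sketched in plain Python. This is a dependency-free illustration for small lists-of-lists matrices; real implementations use batched tensor operations in a framework like PyTorch:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query against every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Each output row is a weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With a single key, the softmax weight is 1 and the output is exactly that value vector; with identical scores, the output is the average of the values.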
Week 3: Introduction to Large Language Models
- High-level LLM architecture and tokenization
- Text generation, sampling techniques (top-k, nucleus, temperature)
- From pretraining to instruction tuning
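The three sampling controls above compose naturally in one function. A minimal, dependency-free sketch, assuming `logits` is a token-to-logit dict (real decoders work on tensor vocabularies):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token from raw logits with temperature, top-k, and nucleus (top-p) filtering."""
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {t: l / temperature for t, l in logits.items()}
    items = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]  # keep only the k most likely tokens
    # Convert the surviving logits to probabilities.
    m = items[0][1]
    probs = [(t, math.exp(l - m)) for t, l in items]
    z = sum(p for _, p in probs)
    probs = [(t, p / z) for t, p in probs]
    if top_p is not None:
        # Nucleus sampling: keep the smallest prefix whose cumulative mass >= top_p.
        kept, mass = [], 0.0
        for t, p in probs:
            kept.append((t, p))
            mass += p
            if mass >= top_p:
                break
        z = sum(p for _, p in kept)
        probs = [(t, p / z) for t, p in kept]
    # Draw from the renormalized distribution.
    r = rng.random()
    acc = 0.0
    for t, p in probs:
        acc += p
        if r <= acc:
            return t
    return probs[-1][0]
```

Setting `top_k=1` recovers greedy decoding; a very small `top_p` similarly collapses onto the most likely token.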
Week 4: Prompt Engineering
- Zero-, one-, few-shot prompting
- Instruction prompting, role prompting
- Prompt templates and libraries (PromptTools, LangChain)
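A few-shot prompt is ultimately just structured text. A minimal template builder; the `Input:`/`Output:` framing is one common convention for illustration, not any particular library's API:

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new query."""
    parts = [instruction.strip(), ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    # End with an unanswered query so the model completes the pattern.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)
```

Leaving the final `Output:` blank is the key trick: the model continues the established pattern rather than answering free-form.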
Week 5: Retrieval-Augmented Generation (RAG)
- Limitations of LLM memory
- Dense vector retrieval, embeddings
- Building a simple RAG pipeline
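A toy end-to-end version of such a pipeline, using bag-of-words vectors in place of learned embeddings (real pipelines use a dense embedding model and a vector store; the structure, embed-retrieve-augment, is the same):

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(query, docs, k=2):
    # Stuff the retrieved passages into the prompt as grounding context.
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt returned by `rag_prompt` would then be sent to the LLM, which answers from the injected context rather than its parametric memory.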
Week 6: LLM Tool Use and LangChain
- LLM tool APIs (search, calculator, code execution)
- LangChain agents and tools
- Chains and routing logic
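The routing pattern can be illustrated without LangChain. This sketch hard-codes one calculator tool and assumes the model emits calls as plain `tool_name: argument` text, a simplification of real function-calling APIs:

```python
import ast
import operator

def calculator(expression):
    """Safely evaluate a basic arithmetic expression (+, -, *, /) via the AST."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval"))

# Tool registry: the model chooses a name, the runtime dispatches to the function.
TOOLS = {"calculator": calculator}

def dispatch(tool_call):
    """Route a model-emitted call like 'calculator: 2 * 21' to the matching tool."""
    name, _, arg = tool_call.partition(":")
    return TOOLS[name.strip()](arg.strip())
```

Parsing through `ast` rather than `eval` is deliberate: tool arguments come from model output and must never be executed as arbitrary code.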
Week 7: Evaluation of LLMs
- Human evaluation, BLEU, ROUGE, METEOR
- LLM-as-a-judge and preference-based evaluation
- RAGAS, TruLens, Promptfoo
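As one concrete reference metric, ROUGE-1 F1 reduces to unigram overlap between a candidate and a reference; a minimal sketch (libraries add stemming and n-gram variants):

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Such overlap metrics reward surface similarity only, which is exactly the gap that LLM-as-a-judge and preference-based evaluation try to close.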
Week 8: Fine-Tuning LLMs
- Fine-tuning non-generative encoder models (e.g., BERT)
- SFT: instruction fine-tuning of decoder models
- Tools: Hugging Face Trainer, ChatTemplates
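Chat-template formatting, a key preprocessing step for SFT, turns a message list into one training string. The `<|role|>` delimiters below are illustrative only, not Hugging Face's actual template for any particular model:

```python
def apply_chat_template(messages, add_generation_prompt=False):
    """Render a list of {'role', 'content'} messages into a single training string."""
    parts = []
    for msg in messages:
        # Each turn is wrapped in an explicit role marker so the model
        # learns where one speaker ends and the next begins.
        parts.append(f"<|{msg['role']}|>\n{msg['content']}\n")
    if add_generation_prompt:
        # At inference time, end with an open assistant turn for the model to fill.
        parts.append("<|assistant|>\n")
    return "".join(parts)
```

During SFT the loss is typically masked so only the assistant tokens contribute, which is why the role boundaries must be unambiguous in the rendered string.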
Week 9: Multimodal LLMs
- Visual Language Models (e.g., GPT-4V, LLaVA)
- Architectures for combining image and text
- Applications: captioning, VQA, visual chat
Week 10: Agents, Memory, and Planning
- What are LLM agents?
- Memory types (short/long-term)
- Planning, state tracking, ReAct, AutoGPT
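Short-term memory is often just a sliding window over recent turns; a minimal sketch (long-term memory would typically add a vector store queried by similarity):

```python
from collections import deque

class ConversationMemory:
    """Short-term memory: keep only the most recent turns in the prompt window."""

    def __init__(self, max_turns=4):
        # deque with maxlen silently evicts the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, role, content):
        self.turns.append((role, content))

    def as_prompt(self):
        # Render the surviving turns for inclusion in the next prompt.
        return "\n".join(f"{role}: {content}" for role, content in self.turns)
```

The eviction policy is the design decision: a fixed window is cheap but forgetful, which motivates summarization and retrieval-backed long-term memory in real agents.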
Week 11: LLM Applications and APIs
- OpenAI, Anthropic, Hugging Face API access
- Flask, FastAPI integration with LLMs
- Streamlit frontend integration
Week 12: LLM Deployment
- Hosting LLMs: vLLM, Ollama, TGI
- Inference optimization: quantization, batching, streaming
- Cloud deployment via Docker, Hugging Face Spaces
Week 13: Scaling and Cost Efficiency
- Batching, KV caching, rate limiting
- Quantization with bitsandbytes, 4/8-bit inference
- FrugalGPT and model cascades
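Absmax 8-bit quantization, the basic idea behind bitsandbytes-style int8 inference, fits in a few lines. A scalar sketch assuming at least one nonzero weight; real kernels quantize per-block tensors with fused dequantization:

```python
def quantize_8bit(weights):
    """Absmax quantization: map floats to ints in [-127, 127] plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by half the scale per weight.
    return [v * scale for v in q]
```

Storing one byte per weight plus a single scale cuts memory roughly 4x versus float32, at the cost of the small rounding error visible in the round trip.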
Week 14: Final Project – Full LLM Application
- Build and deploy a complete LLM-powered app
- Use prompts, APIs, RAG, LangChain, Streamlit
- Deliverables: code, demo, README, video/slides
Course Deliverables
- Weekly exercises & notebooks
- Final project (GitHub repo + demo)
- Participation in peer feedback and weekly discussions