Instructor: Archit Parnami, PhD
Semester: Spring 2026
Time: Tuesdays, 5:30 pm - 8:15 pm
Introduction
This 14-week course introduces students to the practical application of large language models. It covers foundational ML concepts, the transformer architecture, prompting, tool use, fine-tuning, retrieval-augmented generation, multimodality, agents, evaluation, safety, deployment, scaling, and building full-stack applications.
Lecture 1: Machine Learning Basics
- Supervised vs unsupervised learning, classification vs regression
- Neural networks: MLPs, CNNs, activation functions, loss functions, optimization
- NLP fundamentals: word embeddings, tokenization, RNNs
- Working knowledge of NumPy, Pandas, PyTorch, and TensorFlow
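To ground the PyTorch portion of this lecture, here is a minimal sketch of an MLP classifier and one training step. The layer sizes and synthetic data are arbitrary choices for illustration:
```python
import torch
import torch.nn as nn

# Two-layer MLP for a toy binary classification task.
model = nn.Sequential(
    nn.Linear(20, 64),  # input dimension 20 is arbitrary
    nn.ReLU(),
    nn.Linear(64, 2),   # two output classes
)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 20)          # a batch of 32 synthetic examples
y = torch.randint(0, 2, (32,))   # synthetic labels

loss = loss_fn(model(x), y)      # forward pass + loss
opt.zero_grad()
loss.backward()                  # backpropagation
opt.step()                       # one gradient update
```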
Lecture 2: Attention & Transformers
- Core transformer components: embeddings, multi-head self-attention, positional encoding
- Feed-forward layers and layer normalization
- Architectural variants: BERT, GPT
- Hands-on with the HuggingFace Transformers library
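The centerpiece of this lecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal PyTorch sketch (tensor shapes are illustrative):
```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # query-key similarities
    weights = F.softmax(scores, dim=-1)          # attention distribution
    return weights @ v                           # weighted sum of values

q = k = v = torch.randn(1, 8, 10, 64)  # 8 heads, 10 tokens, head dim 64
print(scaled_dot_product_attention(q, k, v).shape)  # (1, 8, 10, 64)
```
Multi-head attention runs this same operation in parallel over several learned projections of the input.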
Lecture 3: Introduction to Large Language Models
- High-level LLM architecture and tokenization
- Text generation pipeline: inference and decoding strategies
- Sampling techniques: greedy, top-k, nucleus (top-p), temperature scaling
- End-to-end generation with HuggingFace
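A sketch of the decoding strategies listed above using HuggingFace's generate API; gpt2 stands in for any causal LM, and the sampling values are illustrative defaults:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The transformer architecture", return_tensors="pt")

# Greedy decoding: always take the single most likely next token.
greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Temperature + top-k + nucleus (top-p) sampling: sample from a
# truncated, rescaled next-token distribution instead.
sampled = model.generate(
    **inputs, max_new_tokens=20,
    do_sample=True, temperature=0.8, top_k=50, top_p=0.9,
)
print(tok.decode(sampled[0], skip_special_tokens=True))
```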
Lecture 4: Prompt Engineering
- Foundational formats: instruction, few-shot, chain-of-thought prompting
- Prompt structure and constraints affecting output quality
- Programmatic templates and prompt libraries
- Building prompts for summarization, translation, classification, extraction
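As an example of a programmatic template, a few-shot sentiment-classification prompt built in plain Python (the examples and label set are illustrative):
```python
# Illustrative few-shot examples; real prompts would use task-specific shots.
EXAMPLES = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]

def build_prompt(review: str) -> str:
    shots = "\n\n".join(f"Review: {r}\nSentiment: {s}" for r, s in EXAMPLES)
    return (
        "Classify the sentiment of each review as positive or negative.\n\n"
        f"{shots}\n\n"
        f"Review: {review}\nSentiment:"
    )

print(build_prompt("Surprisingly good for a sequel."))
```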
Lecture 5: Tool Use with LLMs
- Function calling: tool schemas, parameter passing, result handling
- Integrating external tools: search, calculators, APIs, code interpreters
- Single-step and multi-step tool calling with OpenAI, Anthropic, and open-source models
- Error handling and retry logic
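A sketch of one round trip of function calling with the OpenAI Python SDK; the model name and the weather tool are placeholders, and an API key is assumed:
```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# JSON-schema description of one tool the model may choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny, 22 C in {city}"  # stub standing in for a real weather API

messages = [{"role": "user", "content": "What's the weather in Charlotte?"}]
resp = client.chat.completions.create(model="gpt-4o-mini",
                                      messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]       # the model's tool request
args = json.loads(call.function.arguments)         # arguments arrive as JSON
messages.append(resp.choices[0].message)           # keep the assistant turn
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": get_weather(**args)})  # hand the result back
final = client.chat.completions.create(model="gpt-4o-mini",
                                       messages=messages, tools=tools)
print(final.choices[0].message.content)
```
A robust version would first check whether `tool_calls` is present and wrap tool execution in try/except with retries, which is the error-handling topic above.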
Lecture 6: Fine-Tuning Large Language Models
- Task-specific fine-tuning for classification (BERT)
- Instruction fine-tuning (SFT) for decoder-only models
- Parameter-efficient methods: LoRA, adapters
- Chat templates and HuggingFace Trainer
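A minimal LoRA setup with the peft library; gpt2 and the hyperparameter values are illustrative, and target_modules must match the attention projection names of whatever base model is chosen:
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

# LoRA freezes the base weights and learns small low-rank update matrices.
config = LoraConfig(
    r=8,                        # rank of the update matrices
    lora_alpha=16,              # scaling factor for the updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction is trainable
```
The wrapped model can then be passed to the HuggingFace Trainer like any other.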
Lecture 7: Retrieval-Augmented Generation (RAG)
- RAG architecture: retrieval + generation for grounded responses
- Document processing: chunking, metadata extraction, embeddings
- Vector databases: FAISS, Chroma, Pinecone
- Building end-to-end RAG pipelines with LangChain or LlamaIndex
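A minimal retrieval step with sentence-transformers and FAISS; the corpus, embedding model, and chunking are placeholders for a real document pipeline:
```python
import faiss
from sentence_transformers import SentenceTransformer

# Illustrative "chunks"; in practice these come from document splitting.
chunks = [
    "LoRA fine-tunes small low-rank adapter matrices.",
    "FAISS performs fast nearest-neighbor search.",
    "Nucleus sampling truncates the next-token distribution.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
vecs = embedder.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(vecs.shape[1])  # inner product = cosine (normalized)
index.add(vecs)

query = embedder.encode(["How does retrieval find documents?"],
                        normalize_embeddings=True)
scores, ids = index.search(query, 2)      # top-2 most similar chunks
context = "\n".join(chunks[i] for i in ids[0])
# `context` is then prepended to the LLM prompt for grounded generation.
```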
Lecture 8: Large Multimodal Models
- Vision-language architectures: CLIP, GPT-4V, LLaVA, Gemini
- Cross-modal embedding alignment and contrastive learning
- Applications: image captioning, VQA, document understanding, multimodal RAG
- Evaluation for relevance, consistency, and hallucination
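A sketch of zero-shot image-text matching with CLIP via HuggingFace, illustrating the shared embedding space behind the applications above; the image path is a placeholder:
```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder: any local image
labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Similarities in the shared embedding space, softmaxed over the labels.
probs = out.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```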
Lecture 9: Agents & Planning with LLMs
- Agent building blocks: planning, acting, observing, memory
- Architectures: ReAct, planner-executor, CoT + tool use
- Frameworks: LangChain, LlamaIndex, AutoGen, LangGraph, SmolAgents
- Evaluating agent reliability and safety
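A skeleton of the ReAct loop (think, act, observe) in plain Python; `llm` is a hypothetical callable wrapping any chat model, and both tools are stubs:
```python
# Demo tools only; never eval untrusted input in a real system.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),
    "search": lambda q: f"(stub search results for: {q})",
}

def react_agent(llm, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model continues the transcript with a Thought, then either
        # an Action ("Action: <tool> <input>") or a Final Answer.
        step = llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            name, _, arg = step.split("Action:")[-1].strip().partition(" ")
            tool = TOOLS.get(name, lambda a: "unknown tool")
            transcript += f"Observation: {tool(arg)}\n"  # feed result back
    return "No answer within the step budget."
```
Frameworks such as LangGraph formalize this loop as an explicit state machine with memory.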
Lecture 10: Evaluation of LLMs
- Evaluation dimensions: correctness, coherence, faithfulness, toxicity, bias
- Task-specific metrics: BLEU, ROUGE, METEOR, EM/F1
- Model-based evaluation, human preference ranking, A/B testing
- Tools: HELM, TruLens, OpenAI Evals, RAGAS
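The extractive-QA metrics above are easy to state precisely in code; a minimal sketch of exact match and token-level F1 (normalization here is just lowercasing):
```python
from collections import Counter

def exact_match(pred: str, gold: str) -> int:
    return int(pred.strip().lower() == gold.strip().lower())

def token_f1(pred: str, gold: str) -> float:
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())  # shared tokens
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))                     # 1
print(round(token_f1("the city of Paris", "Paris"), 2))  # 0.4: partial credit
```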
Lecture 11: Safety, Bias and Ethics in LLMs
- Types of bias: gender, racial, cultural, linguistic, geographic
- Safety risks: prompt injection, jailbreaks, hallucination, toxic generation
- Alignment techniques: RLHF, Constitutional AI, red teaming
- Ethical deployment considerations
Lecture 12: LLM APIs & Deployment
- Integrating LLM APIs with LangChain (OpenAI, Anthropic, Cohere, HuggingFace)
- Web frameworks: Streamlit, FastAPI, LangServe
- Self-hosting models: Ollama, vLLM, Text Generation Inference
- Rate limiting, monitoring, and observability with LangSmith
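A minimal deployment sketch: a FastAPI endpoint wrapping a hosted LLM. The model name is an assumption and an API key is required; production code would add rate limiting and tracing (e.g., LangSmith) around this handler:
```python
from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI()  # assumes OPENAI_API_KEY is set

class Query(BaseModel):
    prompt: str

@app.post("/generate")
def generate(q: Query):
    # Thin wrapper: forward the prompt, return the completion.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user", "content": q.prompt}],
    )
    return {"completion": resp.choices[0].message.content}

# Run locally with: uvicorn app:app --reload
```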
Lecture 13: Scaling & Cost-Efficiency
- Scaling laws: compute-performance relationships
- Inference optimization: batching, KV caching, continuous batching
- Quantization techniques: GPTQ, AWQ, QLoRA
- Cost reduction strategies: FrugalGPT, model cascades
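The cascade idea cuts cost by escalating only when needed; a FrugalGPT-style sketch where `cheap_llm`, `strong_llm`, and `confidence` are hypothetical callables:
```python
def cascade(prompt, cheap_llm, strong_llm, confidence, threshold=0.8):
    """Try the cheap model first; escalate only on low confidence."""
    answer = cheap_llm(prompt)                   # low-cost first attempt
    if confidence(prompt, answer) >= threshold:  # scorer trusts the answer
        return answer                            # skip the expensive model
    return strong_llm(prompt)                    # escalate only when needed
```
Most queries never reach the expensive model, so average cost per query drops while hard queries keep their quality.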
Lecture 14: Final Project – Full-Stack LLM Application
- Build and deploy a complete LLM-powered application
- Apply techniques: prompting, APIs, RAG, tools, evaluation
- Deliverables: code, demo, technical documentation
Course Deliverables
- Weekly Project Milestones
- Production-Ready LLM Application
- Peer Evaluation