Week 6: Fine-Tuning Large Language Models

Introduction

Fine-tuning adapts pre-trained language models to specific tasks or behaviors. This week focuses on two core strategies: (1) supervised fine-tuning for classification and structured tasks using encoder models like BERT, and (2) instruction fine-tuning of decoder-only LLMs (e.g., LLaMA) to follow human instructions or exhibit dialogue-friendly behavior. We'll cover how fine-tuning changes model weights to align with task objectives, when to use parameter-efficient techniques like LoRA [1] or adapters [2], and how to interact with pre-fine-tuned instruct models using chat templates.

Goals for the Week

  • Understand the difference between task-specific fine-tuning and instruction tuning.
  • Fine-tune a classification model (e.g., BERT) using Hugging Face Transformers and Datasets.
  • Experiment with chat-oriented LLMs using Chat Templates without performing full fine-tuning.
  • Learn the challenges and infrastructure requirements of full supervised fine-tuning (SFT) [3].
  • Explore parameter-efficient methods like LoRA [1] and adapters [2].
  • Evaluate the effects of fine-tuning on output quality and behavior.

Learning Guide

Videos

Fine-tuning LLMs: Short Course (1 hr 25 mins)

Readings

Task-Specific Supervised Fine-Tuning (classification, sequence labeling, etc.)
Fine-tune encoder or encoder-decoder models (e.g., BERT, T5) on labeled data, optimizing a task-specific loss (e.g., cross-entropy for classification).
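
As a minimal sketch of that loss (the model, texts, and labels here are only illustrations), Hugging Face models compute the classification cross-entropy directly when labels are passed to the forward call:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # bert-base-uncased with a freshly initialized 2-class classification head
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

    # Passing labels makes the forward pass return the classification cross-entropy
    outputs = model(**batch, labels=labels)
    print(outputs.loss)  # scalar loss that fine-tuning backpropagates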

Instruction Fine-Tuning (SFT for chat/instructions)
Supervised next-token training of decoder-only LLMs on instruction→response or chat data to improve instruction following and dialogue behavior.
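
A minimal sketch of how a chat template renders an instruction→response pair into the single token sequence that SFT trains on with next-token cross-entropy (the zephyr tokenizer is the one used later in this week's practice; the example messages are made up):

    from transformers import AutoTokenizer

    # Any chat model's tokenizer ships a template; zephyr is used as the example here
    tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

    example = [
        {"role": "user", "content": "Summarize photosynthesis in one sentence."},
        {"role": "assistant", "content": "Plants convert light, water, and CO2 into sugar and oxygen."},
    ]

    # The template serializes roles and messages into one training string; SFT then
    # applies next-token cross-entropy to it (often masking the prompt tokens)
    text = tokenizer.apply_chat_template(example, tokenize=False)
    print(text)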

Parameter-Efficient Fine-Tuning (PEFT)
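
PEFT methods freeze the base model and train only a small number of added parameters. A minimal sketch using the peft library's LoRA [1] on a BERT classifier (the rank and other hyperparameters are illustrative, not tuned):

    from peft import LoraConfig, TaskType, get_peft_model
    from transformers import AutoModelForSequenceClassification

    base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # Low-rank adapters are injected into the attention projections; only these
    # (plus the classification head) are trained, the base weights stay frozen
    config = LoraConfig(
        task_type=TaskType.SEQ_CLS,
        r=8,                 # rank of the low-rank update
        lora_alpha=16,       # scaling factor
        lora_dropout=0.1,
        target_modules=["query", "value"],  # BERT attention projection names
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of all parameters

Adapters [2], prefix-tuning [4], P-tuning v2 [5], and QLoRA [6] follow the same pattern: small trainable additions on top of frozen base weights.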

RLHF Pipeline (optional/advanced)

  • Instruction tuning with human feedback (RLHF) [3]: SFT on human demonstrations → reward modeling → PPO (a reward-scoring sketch follows)
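
To make the reward-modeling stage concrete, a sketch that scores one answer with a publicly released preference-trained reward model (the checkpoint name is just one example of such a scorer):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # A reward model trained on human preference data; it outputs a scalar score
    name = "OpenAssistant/reward-model-deberta-v3-large-v2"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)

    question = "Explain overfitting to a beginner."
    answer = "Overfitting is when a model memorizes training data and fails on new data."

    inputs = tokenizer(question, answer, return_tensors="pt")
    with torch.no_grad():
        score = model(**inputs).logits[0].item()  # higher = preferred by the reward model
    print(score)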

Programming Practice

  • Task-Specific Supervised Fine-Tuning: Fine-tune a bert-base-uncased model on the IMDb or AG News classification dataset using Hugging Face Transformers and Datasets (a condensed sketch follows this list).
  • Instruction Fine-Tuning: Load a pre-fine-tuned instruction-following model like HuggingFaceH4/zephyr-7b-beta and interact with it using chat templates and role prompting (second sketch below).
  • Multimodal Chat Templates (Optional): Build a short prompt that mixes text and an image using AutoProcessor.apply_chat_template and run it with an image-text-to-text pipeline (third sketch below).
  • Supervised Fine-Tuning (Optional): Write a custom conversation dataset and simulate fine-tuning with TRL's SFTTrainer or peft in dry-run mode (small batch size, limited steps; final sketch below).
  • Visualize the effect of fine-tuning by comparing the model's predictions before and after adaptation.
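
For the first practice item, a condensed sketch of fine-tuning bert-base-uncased on IMDb with the Trainer API (the subset size and hyperparameters are illustrative choices to keep the run small, not tuned values):

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              DataCollatorWithPadding, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # Small shuffled subset so the sketch finishes quickly; use the full split for real runs
    dataset = load_dataset("imdb", split="train").shuffle(seed=42).select(range(2000))
    dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                          batched=True)

    args = TrainingArguments(output_dir="bert-imdb", per_device_train_batch_size=16,
                             num_train_epochs=1, logging_steps=50)
    Trainer(model=model, args=args, train_dataset=dataset,
            data_collator=DataCollatorWithPadding(tokenizer)).train()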
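For the instruction-tuned model item, a sketch that stays close to the zephyr-7b-beta model card (this needs a GPU with enough memory; the system and user messages are example prompts, and any smaller instruct model can be swapped in):

    import torch
    from transformers import pipeline

    pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta",
                    torch_dtype=torch.bfloat16, device_map="auto")

    # Role prompting: the system message steers persona, the user message asks the task
    messages = [
        {"role": "system", "content": "You are a concise tutor for NLP students."},
        {"role": "user", "content": "In two sentences, what does fine-tuning change in a model?"},
    ]

    # The chat template serializes roles into the exact format the model was tuned on
    prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False,
                                                add_generation_prompt=True)
    out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.95)
    print(out[0]["generated_text"])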
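For the optional multimodal item, a sketch assuming a recent transformers release with the image-text-to-text pipeline (the LLaVA checkpoint and image URL are just examples):

    from transformers import AutoProcessor, pipeline

    model_id = "llava-hf/llava-interleave-qwen-0.5b-hf"

    # Chat-template messages can mix modalities within one turn
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }]

    # Inspect the text the processor's template produces (the image becomes a placeholder)
    processor = AutoProcessor.from_pretrained(model_id)
    print(processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

    # The pipeline applies the same template internally when given chat messages
    pipe = pipeline("image-text-to-text", model=model_id)
    out = pipe(text=messages, max_new_tokens=40)
    print(out[0]["generated_text"])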
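For the optional SFT item, a dry-run sketch with TRL's SFTTrainer (this assumes a trl version that ships SFTConfig; the tiny instruct model and two-example dataset exist only to wire things up):

    from datasets import Dataset
    from trl import SFTConfig, SFTTrainer

    # A toy conversation dataset in the "messages" format SFTTrainer understands
    train = Dataset.from_list([
        {"messages": [
            {"role": "user", "content": "What is fine-tuning?"},
            {"role": "assistant", "content": "Adapting a pre-trained model to a task."},
        ]},
        {"messages": [
            {"role": "user", "content": "Name one PEFT method."},
            {"role": "assistant", "content": "LoRA."},
        ]},
    ])

    # Dry run: tiny model, batch size 1, a handful of steps
    args = SFTConfig(output_dir="sft-dry-run", per_device_train_batch_size=1, max_steps=5)
    trainer = SFTTrainer(model="HuggingFaceTB/SmolLM2-135M-Instruct",
                         train_dataset=train, args=args)
    trainer.train()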

References


  1. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685

  2. Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., … & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. In International Conference on Machine Learning (ICML). arXiv:1902.00751

  3. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training Language Models to Follow Instructions with Human Feedback. Advances in Neural Information Processing Systems, 35. arXiv:2203.02155

  4. Li, X. L., & Liang, P. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv:2101.00190

  5. Liu, X., Ji, K., Fu, Y., Tam, W. L., Du, Z., Yang, Z., & Tang, J. (2021). P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-Tuning Universally Across Scales and Tasks. arXiv:2110.07602

  6. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314