Introduction
Fine-tuning adapts pre-trained language models to specific tasks or behaviors. This week focuses on two core strategies: (1) supervised fine-tuning for classification and structured tasks using encoder models like BERT, and (2) instruction fine-tuning of decoder-only LLMs (e.g., LLaMA) to follow human instructions or exhibit dialogue-friendly behavior. We’ll cover how fine-tuning changes model weights to align with task objectives, when to use parameter‑efficient techniques like LoRA [1] or adapters [2], and how to interact with pre‑fine‑tuned instruct models using chat templates.
Goals for the Week
- Understand the difference between task-specific fine-tuning and instruction tuning.
- Fine-tune a classification model (e.g., BERT) using Hugging Face Transformers and Datasets.
- Experiment with chat-oriented LLMs using Chat Templates without performing full fine-tuning.
- Learn the challenges and infrastructure requirements of full supervised fine-tuning (SFT) [3].
- Explore parameter-efficient methods like LoRA [1] and adapters [2].
- Evaluate the effects of fine-tuning on output quality and behavior.
Learning Guide
Videos
Fine-tuning LLMs: Short Course (1 hr 25 mins)
- Reference material: ksm26/Finetuning-Large-Language-Models
Readings
Task-Specific Supervised Fine-Tuning (classification, seq labeling, etc.)
Fine-tune encoder(-decoder) models (e.g., BERT/T5) on labeled tasks to optimize task-specific losses (e.g., classification cross-entropy).
- Fine-Tuning for Text Classification – Hugging Face
- Example: fine-tuning a BERT model for sequence classification. With a labeled downstream dataset, set up a training loop to adapt the base model to your task (a minimal sketch follows this list).
- Fine-Tuning for Classification – Cohere
- Similar objective using the Cohere API.
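To make the classification workflow above concrete, here is a minimal sketch of fine-tuning `bert-base-uncased` for sentiment classification with Transformers and Datasets. The IMDb dataset, the small subset sizes, the hyperparameters, and the output directory are illustrative assumptions for a quick run, not requirements from the readings.

```python
# Minimal sketch: fine-tune bert-base-uncased for binary sentiment classification.
# Dataset choice, subset sizes, hyperparameters, and output_dir are illustrative.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")                       # labeled downstream dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)               # classification head starts randomly initialized

args = TrainingArguments(
    output_dir="bert-imdb",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small subset for a quick run
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
    tokenizer=tokenizer,                             # enables dynamic padding via the default collator
)
trainer.train()
```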
Instruction Fine-Tuning (SFT for chat/instructions)
Supervised next-token training of decoder-only LLMs on instruction→response or chat data to improve instruction following and dialogue behavior.
- Decoder-only generative (base) models are instruction-fine-tuned to behave conversationally. This process, supervised fine-tuning (SFT), is not easy to carry out on one's own: it requires curated instruction data and substantial compute. Ready-to-use instruct models are available on the Hugging Face Hub for us to experiment with; using chat templates we can assign roles (system, user, assistant) and prompt the model toward the desired behavior, as shown in the sketch below.
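A minimal sketch of this interaction, assuming the `HuggingFaceH4/zephyr-7b-beta` instruct model mentioned later in the practice section and a machine that can load a 7B model; the messages and generation settings are illustrative.

```python
# Minimal sketch: prompting a ready-made instruct model via its chat template.
# Model choice, messages, and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")  # requires enough GPU/CPU memory for a 7B model

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain instruction fine-tuning in two sentences."},
]

# apply_chat_template inserts the special tokens the model was fine-tuned with
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```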
Parameter-Efficient Fine-Tuning (PEFT)
- Techniques include LoRA [1], adapters [2], prefix-tuning [4], and P-Tuning v2 [5]; a minimal LoRA sketch follows this list.
- QLoRA [6] for efficient fine-tuning of quantized large language models
- LoRA: Low-Rank Adaptation
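As a rough illustration of the LoRA idea, here is a minimal sketch using the `peft` library. The base model (`gpt2`), the rank, and the target modules are illustrative assumptions; target module names depend on the architecture you adapt.

```python
# Minimal sketch: wrapping a base model with LoRA adapters via the peft library.
# Base model, rank, and target_modules are illustrative and architecture-dependent.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("gpt2")   # small model just for demonstration

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2 attention projection; differs per architecture
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```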
RLHF Pipeline (optional/advanced)
- Instruction tuning with human feedback (RLHF) [3]: SFT on human demonstrations → reward modeling → PPO (a minimal sketch of the reward-modeling objective follows).
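The reward-modeling stage in the middle of this pipeline reduces to a pairwise preference loss over chosen vs. rejected responses. The scores below are placeholders; this is a conceptual sketch of the objective, not the TRL training loop.

```python
# Conceptual sketch of the pairwise reward-model objective used in RLHF
# (Bradley-Terry-style loss as in Ouyang et al., 2022). Scores are placeholders.
import torch
import torch.nn.functional as F

# r_chosen / r_rejected: scalar rewards the reward model assigns to the
# human-preferred and the rejected response for the same prompt.
r_chosen = torch.tensor([1.3, 0.2, 0.9])
r_rejected = torch.tensor([0.4, 0.5, -0.1])

# Maximize the margin between preferred and rejected responses.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
print(loss)
```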
Programming Practice
- Task-Specific Supervised Fine-Tuning: Fine-tune a `bert-base-uncased` model on the IMDb or AG News classification dataset using Hugging Face Transformers.
- Instruction Fine-Tuning: Load a pre‑fine‑tuned instruction‑following model like `HuggingFaceH4/zephyr-7b-beta` and interact with it using chat templates and role prompting.
- Multimodal Chat Templates (Optional): Build a short prompt that mixes text plus an image using `AutoProcessor.apply_chat_template` and run it with an `image-text-to-text` pipeline.
- Supervised Fine-Tuning (Optional): Write a custom conversation dataset and simulate fine‑tuning using the `SFTTrainer` (TRL) or `peft` in dry‑run mode (small batch size, limited steps); see the dry‑run sketch after this list.
- Visualize the effect of fine-tuning by comparing predictions before and after adaptation.
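A minimal dry-run sketch for the optional SFT exercise, assuming a recent version of TRL. The toy in-memory dataset, the tiny `sshleifer/tiny-gpt2` model, and the step count are illustrative assumptions chosen to keep the run cheap.

```python
# Minimal dry-run sketch of supervised fine-tuning with TRL's SFTTrainer.
# Toy dataset, tiny model, and step count are illustrative; assumes a recent trl version.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# A toy instruction→response dataset flattened into plain text.
train_data = Dataset.from_list([
    {"text": "### Instruction:\nName a primary color.\n### Response:\nBlue."},
    {"text": "### Instruction:\nGive a synonym for happy.\n### Response:\nJoyful."},
])

config = SFTConfig(
    output_dir="sft-dry-run",
    max_steps=5,                     # dry run: only a handful of optimizer steps
    per_device_train_batch_size=1,
    logging_steps=1,
)

trainer = SFTTrainer(
    model="sshleifer/tiny-gpt2",     # tiny model so the dry run fits anywhere
    args=config,
    train_dataset=train_data,
)
trainer.train()
```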
References
1. Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.
2. Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., … & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. In International Conference on Machine Learning (ICML). arXiv:1902.00751.
3. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35. arXiv:2203.02155.
4. Li, X. L., & Liang, P. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv:2101.00190.
5. Liu, X., Ji, K., Fu, Y., Tam, W. L., Du, Z., Yang, Z., & Tang, J. (2021). P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. arXiv:2110.07602.
6. Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.