Introduction
Prompt engineering is the art and science of crafting inputs that elicit the desired output from a large language model. Since most LLMs (e.g., GPT-4, Claude, Cohere, open-source chat models) are not fine-tuned for every use case, prompts serve as the primary interface for shaping model behavior. This week, we’ll cover how to structure prompts, from basic instructions to few-shot examples and chain-of-thought (CoT) reasoning [1], plus practical templates, iteration strategies, and evaluation. Prompt engineering enables task adaptation without retraining the model and is the fastest lever for improving LLM application quality.
Goals for the Week
- Understand foundational prompt formats: instruction [2], few-shot [3], and CoT prompting [1].
- Analyze how prompt phrasing, constraints, and structure affect output quality and consistency.
- Use prompt templates programmatically (string templates, chat message stacks, prompt libraries); a minimal message-stack sketch follows this list.
- Build robust prompts for common tasks: summarization, translation, classification, extraction, reasoning.
- Evaluate and iterate prompts with checklists and lightweight metrics.
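For the template goal above, here is a minimal sketch of a chat message stack, assuming the OpenAI-style role/content message format; the build_messages helper and the example reviews are illustrative, not from a specific library:

def build_messages(system_instructions, examples, user_input):
    # Assemble a chat message stack: system prompt, few-shot demonstrations, then the new input.
    messages = [{"role": "system", "content": system_instructions}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages

# Example: a two-shot sentiment classifier assembled as a message stack.
messages = build_messages(
    "Classify the sentiment of each review as positive, neutral, or negative.",
    [("Great battery life!", "positive"), ("Arrived broken.", "negative")],
    "The packaging was fine, nothing special.",
)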
Learning Guide
Readings
- Prompt engineering guide by DAIR.AI: Exhaustive list of prompt techniques with examples
Foundational Research:
- Chain-of-Thought Prompting [1]: Enables complex reasoning in large language models through intermediate reasoning steps
- Few-Shot Learning [3]: In-context learning with demonstrations for task adaptation without parameter updates
- Instruction Following [2]: Training and prompting language models to follow natural language instructions
- Prompt Design [4]: Systematic approaches to iterative prompt development and optimization
- Prompting examples from API providers
Best Practices
Use these practical patterns to improve reliability and control:
- Be clear and explicit
  - Define the task, audience, constraints, and desired tone.
  - Provide the input within delimiters (e.g., triple backticks ```…```).
- Structure the output
  - Request specific formats: JSON, tables, bullet lists, HTML.
  - Specify keys/fields, ordering, and validation rules.
- Set constraints
  - Word/sentence/character limits; required sections; enumerated steps.
  - Control temperature and randomness in API settings.
- Encourage reasoning
  - Ask for step-by-step analysis or “explain before answering”.
  - For math/logic, require explicit intermediate steps or self-checks [1].
- Use few-shot examples
  - Show 1–3 examples with inputs and ideal outputs [3]; a short few-shot template is sketched after this list.
  - Keep examples representative and concise.
- Iterate systematically
  - Identify issues (length, focus, format, correctness) and refine [4].
  - Add missing constraints; change audience; enforce schema.
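As a starting point for the few-shot pattern above, here is a minimal few-shot classification template in the same style as the templates in the next section; the demonstration reviews are made up for illustration:

prompt = f"""
Classify the sentiment of the review as one of: positive, neutral, negative.

Review: ```Great battery life, arrived a day early.```
Sentiment: positive

Review: ```The screen cracked after one week.```
Sentiment: negative

Review: ```{review}```
Sentiment:"""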
Prompt Templates (Copy/Paste)
Summarization (length + focus)
prompt = f"""
You are an assistant that writes concise summaries.
Summarize the text delimited by triple backticks in at most 3 sentences,
focusing on shipping and delivery issues.
Text: ```{text}```
"""
Classification (labels + justification)
prompt = f"""
Classify the sentiment of the review as one of: positive, neutral, negative.
Return JSON with fields: label, confidence, rationale (1–2 sentences).
Review: ```{review}```
"""
Information Extraction (strict JSON)
prompt = f"""
Extract the following fields from the text, return STRICT JSON only:
{{
"product": string,
"brand": string,
"issue": string|null,
"shipping": {{
"mentioned": boolean,
"details": string|null
}}
}}
If unknown, use null.
Text: ```{text}```
"""
Chain-of-Thought (visible reasoning)
prompt = f"""
Solve the problem step by step. Show your reasoning, then provide the final answer
on a new line prefixed with "Answer:".
Problem: ```{problem}```
"""
Chain-of-Thought (concise, hidden reasoning)
prompt = f"""
Think through the problem privately. Then provide only the final answer
on a single line prefixed with "Answer:" without revealing intermediate steps.
Problem: ```{problem}```
"""
Data Validation (self-check)
prompt = f"""
Return JSON per the schema below. After producing the JSON, verify that:
- All required keys are present; no extra keys.
- Values match the expected types.
If the validation fails, correct the JSON and output only the corrected JSON.
Schema: {{
"title": string,
"items": array<string>,
"count": integer
}}
Input: ```{input_text}```
"""
Debugging and Iteration
Common issues and targeted fixes:
- Output too long → Add word/sentence limits; require bullet points.
- Wrong focus → Specify audience and what to emphasize/omit.
- Unstructured output → Enforce JSON/table format with explicit keys.
- Hallucinated facts → Require citations or “only use provided context”.
- Schema drift → Add self-checks and explicit validation instructions.
Iteration loop:
Draft prompt → Run → Inspect output → Identify gaps → Add constraints/examples → Re-run
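As a small, hypothetical illustration of one pass through this loop (the v1/v2 wording below is an example, not prescribed phrasing):

# v1: outputs came back as long paragraphs with extra commentary (too long, unstructured).
prompt_v1 = "Summarize the review: ```{review}```"

# v2: add a length limit, a focus, and an explicit format to close those gaps.
prompt_v2 = (
    "Summarize the review delimited by triple backticks in at most 2 bullet points, "
    "focusing only on product defects. Return a Markdown bullet list and nothing else.\n"
    "Review: ```{review}```"
)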
Evaluation & Guardrails
- Determinism: Use temperature=0 for reproducibility in evaluation.
- Schema checks: Parse/validate JSON; reject invalid outputs.
- Self-consistency: Sample multiple solutions and pick the majority answer (optional) [5]; a minimal sketch follows this list.
- Safety: Instruct the model to refuse harmful content; constrain it to the provided context [6].
- Metrics: Track task accuracy, format compliance rate, length adherence.
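A minimal self-consistency sketch, assuming the legacy openai (<1.0) client used in the Quick Start Code below and the “Answer:” convention from the CoT template; the sample count and answer parsing are illustrative:

from collections import Counter
import openai  # legacy openai<1.0 interface, as in the Quick Start Code below

def self_consistent_answer(prompt, n_samples=5):
    # Sample several chain-of-thought completions and keep the majority final answer.
    answers = []
    for _ in range(n_samples):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,  # sampling diversity; temperature=0 would return identical samples
        )
        text = response.choices[0].message.content
        if "Answer:" in text:
            # Keep only the text after the final "Answer:" line, per the CoT template above.
            answers.append(text.rsplit("Answer:", 1)[1].strip())
    return Counter(answers).most_common(1)[0][0] if answers else None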
Programming Practice
Core Tasks
- Create and test prompts for summarization, sentiment classification, extraction, and QA using HuggingFace/OpenAI/Cohere APIs.
- Implement zero-shot vs. few-shot vs. CoT variants; compare accuracy and format compliance.
- Add JSON schema validation to extraction prompts; record valid vs. invalid rates (a jsonschema-based sketch follows this list).
- Iterate prompts to fix one concrete issue (e.g., too verbose, missing fields) and document before/after outputs.
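A minimal sketch for the schema-validation task above, assuming the third-party jsonschema package and mirroring the extraction template's fields:

import json
from jsonschema import ValidationError, validate

# Mirrors the extraction template; type lists allow null where the prompt says string|null.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "product": {"type": "string"},
        "brand": {"type": "string"},
        "issue": {"type": ["string", "null"]},
        "shipping": {
            "type": "object",
            "properties": {
                "mentioned": {"type": "boolean"},
                "details": {"type": ["string", "null"]},
            },
            "required": ["mentioned", "details"],
        },
    },
    "required": ["product", "brand", "issue", "shipping"],
    "additionalProperties": False,
}

def is_schema_valid(output_text):
    # Count an output as valid only if it parses as JSON and matches the schema.
    try:
        validate(instance=json.loads(output_text), schema=EXTRACTION_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False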
Quick Start Code (OpenAI Example)
import openai  # legacy openai<1.0 SDK interface; newer SDKs use openai.OpenAI().chat.completions
import json

def test_prompt(prompt, test_cases):
    """Run a prompt template (with {placeholders}) against test cases and record outputs."""
    results = []
    for case in test_cases:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt.format(**case)}],
            temperature=0  # deterministic outputs for evaluation
        )
        results.append({
            "input": case,
            "output": response.choices[0].message.content,
            "valid_json": is_valid_json(response.choices[0].message.content)
        })
    return results

def is_valid_json(text):
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
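A hypothetical usage example tying test_prompt to the classification template above; note that test_prompt expects a plain template string with {placeholders}, not an already-formatted f-string:

classification_template = """
Classify the sentiment of the review as one of: positive, neutral, negative.
Return JSON with fields: label, confidence, rationale (1–2 sentences).
Review: ```{review}```
"""

test_cases = [
    {"review": "Shipping took three weeks and the box was crushed."},
    {"review": "Exactly as described, works great."},
]

results = test_prompt(classification_template, test_cases)
print(sum(r["valid_json"] for r in results), "of", len(results), "outputs returned valid JSON")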
Assessment Rubric (Flexible - Adapt to Your Needs)
- Task performance: Does the output meet the objective? (0–2)
- Format compliance: Does it match requested structure? (0–2)
- Constraint adherence: Length/keys/tone satisfied? (0–2)
- Clarity: Is the output readable and useful? (0–2)
- Iteration quality: Are refinements targeted and effective? (0–2)
Max: 10 points. Document examples and decisions.
References
1. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., … & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35. arXiv:2201.11903.
2. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35. arXiv:2203.02155.
3. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33. arXiv:2005.14165.
4. Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv:2107.13586.
5. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., … & Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv:2203.11171. (ICLR 2023)
6. Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., … & Kaplan, J. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862.