Zero-shot prompting gives the model instructions without examples: "Classify the following customer message as positive, negative, or neutral." Few-shot prompting adds examples that demonstrate the desired behaviour: "Here are three examples of correctly classified messages, then classify this one." The conventional wisdom is that few-shot is better. In practice, it depends — and using the wrong technique for your use case adds prompt length and cost without improving quality.
When Zero-Shot Is Sufficient
Zero-shot works well when the task is well-described in natural language, the desired behaviour is within the model's training distribution, and the output format is simple or standard. For tasks like "Summarise this document in three bullet points," "Extract the invoice date from this text," or "Is this email spam? Answer yes or no," clear zero-shot instructions often produce results equivalent to or better than few-shot prompting, with lower token cost and simpler maintenance.
- Tasks with clear natural language descriptions that the model understands without demonstration
- Common task types: summarisation, basic extraction, simple classification, translation
- When example quality is uncertain — bad examples hurt more than no examples
- When the output format is standard and the model is familiar with it
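A zero-shot prompt in this style is nothing more than an instruction plus the input. A minimal sketch in Python (the `zero_shot_prompt` helper and the exact instruction wording are illustrative, not any library's API):

```python
def zero_shot_prompt(instruction: str, text: str) -> str:
    """Build a zero-shot prompt: a task instruction plus the input, no examples."""
    return f"{instruction}\n\nText:\n{text}\n\nAnswer:"

prompt = zero_shot_prompt(
    "Classify the following customer message as positive, negative, or neutral. "
    "Answer with exactly one word.",
    "The delivery was late and the box was damaged.",
)
```

Because the whole prompt is the instruction, quality hinges on stating the label set and output format explicitly, as the instruction string above does.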
When Few-Shot Examples Are Worth the Cost
Few-shot examples add value when the task has domain-specific patterns the model is unlikely to infer from instructions alone, when the output format or style is non-standard and hard to describe precisely, or when zero-shot produces inconsistent results that examples can anchor.
- Domain-specific classification: your categories have specific meanings that differ from generic language (e.g., "escalate" means something specific in your support context)
- Specific output format: a structured format with unusual conventions that is easier to demonstrate than describe
- Style and tone: when examples demonstrate voice and register more precisely than instructions can
- Ambiguous boundary cases: examples can show how to handle the specific edge cases your task produces
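A few-shot prompt prepends labelled demonstrations before the query. A sketch of a prompt builder, assuming a simple `Message:`/`Label:` convention (the field names and example data are placeholders; adapt them to your task):

```python
def few_shot_prompt(instruction: str,
                    examples: list[tuple[str, str]],
                    query: str) -> str:
    """Build a few-shot prompt: instruction, then (input, label) demonstrations,
    then the query with the label left blank for the model to complete."""
    shots = "\n\n".join(f"Message: {text}\nLabel: {label}"
                        for text, label in examples)
    return f"{instruction}\n\n{shots}\n\nMessage: {query}\nLabel:"

examples = [
    ("Thanks, the refund arrived quickly!", "positive"),
    ("Still waiting on a reply after two weeks.", "negative"),
    ("What are your opening hours?", "neutral"),
]
prompt = few_shot_prompt(
    "Classify each customer message as positive, negative, or neutral.",
    examples,
    "The app keeps crashing when I open my account page.",
)
```

Ending the prompt at a bare `Label:` nudges the model to complete the pattern the demonstrations establish, which is the anchoring effect few-shot relies on.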
Designing Effective Few-Shot Examples
- Use real examples from your actual data, not constructed ones — real examples represent the distribution and edge cases that matter
- Cover the boundary cases, not just the easy ones — examples that are clearly in one category do not help the model handle the hard ones
- Balance categories — if you have 3 categories, include examples of all 3 in roughly equal proportion
- Keep examples as similar in length and format to real inputs as possible
- Test the sensitivity of your output to example order — some models are sensitive to which example comes last
- Measure the quality improvement over zero-shot before committing to the added token cost
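Two of the checks above are easy to script: balancing categories and probing order sensitivity. A sketch under assumed data structures (the `pool` dict and the choice of three orderings are illustrative):

```python
import random
from collections import Counter
from itertools import islice, permutations

def balanced_sample(pool: dict[str, list[str]], per_label: int,
                    seed: int = 0) -> list[tuple[str, str]]:
    """Draw the same number of examples from each label so no category dominates."""
    rng = random.Random(seed)
    sample = []
    for label, texts in pool.items():
        for text in rng.sample(texts, per_label):
            sample.append((text, label))
    return sample

def order_variants(examples: list[tuple[str, str]], k: int = 3):
    """Return up to k orderings of the examples; evaluate each ordering on
    a held-out set and compare accuracy to measure order sensitivity."""
    return list(islice(permutations(examples), k))

pool = {
    "positive": ["Great service!", "Love the new update."],
    "negative": ["This broke on day one.", "Support never replied."],
    "neutral":  ["Where is my invoice?", "Do you ship to Canada?"],
}
shots = balanced_sample(pool, per_label=1)
counts = Counter(label for _, label in shots)
```

Running the same evaluation set against each ordering from `order_variants` gives a direct read on how much example order moves your results before you commit to one prompt.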
Chain-of-Thought: When Reasoning Matters
Chain-of-thought prompting asks the model to show its reasoning before providing a final answer. It significantly improves performance on tasks requiring multi-step reasoning: mathematical calculation, logical inference, and classification that must weigh multiple factors. For simple classification or extraction tasks, chain-of-thought adds output length and cost without improving accuracy — use it specifically for reasoning-intensive tasks where the intermediate steps are needed to reach the correct conclusion.