---
title: "Chain-of-Thought Prompting: Making AI Think Step by Step"
description: "Learn how asking AI to reason through problems step by step improves accuracy on complex tasks. Research-backed techniques with practical examples."
date: January 20, 2026
author: Robert Soares
category: prompt-engineering
---

When you ask someone a tricky question, you expect them to think before answering. Show their work. Walk through the logic.

AI works the same way. Tell it to reason step by step, and it reasons better.

This technique is called chain-of-thought prompting. It's one of the most significant discoveries in prompt engineering, and it's remarkably simple: instead of asking for a direct answer, you ask the model to think through the problem first.

## The Core Idea

Chain-of-thought prompting guides AI models to break complex problems into intermediate steps before reaching a final answer. Instead of jumping to a conclusion, the model works through the logic piece by piece.

[Google's research team documented this in 2022](https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/), showing that the technique mimics how humans approach difficult problems. We don't solve hard math problems in one leap. We break them down, solve each step, and build toward the answer. The same approach helps AI models think more carefully.

**Without chain-of-thought:**

> Q: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many tennis balls does Roger have now?"
>
> A: "11"

The model gives an answer, but you can't see how it got there. And sometimes it gets it wrong without the reasoning step.

**With chain-of-thought:**

> Q: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 balls. How many tennis balls does Roger have now?"
>
> A: "Roger started with 5 balls. He bought 2 cans with 3 balls each, so that's 2 x 3 = 6 new balls. 5 + 6 = 11 tennis balls total."

Same answer, but the reasoning is visible. And when the model shows its work, it makes fewer mistakes.

## What the Research Shows

The original Google research found [striking improvements on complex tasks](https://arxiv.org/abs/2201.11903). On the GSM8K benchmark of math word problems, chain-of-thought prompting with PaLM 540B achieved 58% accuracy, surpassing even fine-tuned GPT-3 models that achieved 55%. Later work on self-consistency, where the model generates multiple reasoning paths and takes the majority answer, pushed this to 74% accuracy: a further 16-point gain, achieved through prompting alone.

For some tasks, the gains were even more dramatic. On sports understanding questions, PaLM 540B with chain-of-thought reached 95% accuracy, actually exceeding unaided human performance at 84%.

The key finding: you don't need to fine-tune models or build large training datasets. The improvement comes purely from how you prompt.

## Two Ways to Use It

There are two main approaches to chain-of-thought prompting.

### Zero-Shot Chain-of-Thought

The simplest version: add "let's think step by step" or similar phrasing to your prompt.

> "A bat and ball cost $1.10 together. The bat costs $1 more than the ball. How much does the ball cost? Let's think through this step by step."

This single phrase prompts the model to show its reasoning. [Research on zero-shot chain-of-thought](https://www.promptingguide.ai/techniques/cot) shows it can substantially improve accuracy on reasoning tasks without any examples.

Other effective phrases:

- "Think through this carefully before answering."
- "Walk me through your reasoning."
- "Break this problem into steps."
- "Show your work."
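In code, zero-shot chain-of-thought is nothing more than appending the cue to your question before the API call. Here's a minimal sketch using the OpenAI Python SDK as one concrete client; the model name is an assumption, and any chat-completion API works the same way:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def ask_with_reasoning(question: str, cue: str = "Let's think step by step.") -> str:
    """Append a zero-shot chain-of-thought cue and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute whatever you use
        messages=[{"role": "user", "content": f"{question}\n\n{cue}"}],
    )
    return response.choices[0].message.content

print(ask_with_reasoning(
    "A bat and ball cost $1.10 together. The bat costs $1 more "
    "than the ball. How much does the ball cost?"
))
```

Keeping the cue as a parameter makes it easy to test which of the phrases above works best for your task.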
- "Walk me through your reasoning." - "Break this problem into steps." - "Show your work." ### Few-Shot Chain-of-Thought More powerful: provide examples that demonstrate the reasoning process you want. > "Question: Sarah has 8 apples. She gives 3 to Tom and buys 5 more. How many apples does she have? > Reasoning: Sarah started with 8 apples. She gave away 3, so 8 - 3 = 5 apples. Then she bought 5 more, so 5 + 5 = 10 apples. > Answer: 10 apples. > > Question: A store sells notebooks for $4 each. If Jake buys 3 notebooks and pays with a $20 bill, how much change does he get? > Reasoning: Three notebooks at $4 each costs 3 x $4 = $12. Jake pays with $20, so his change is $20 - $12 = $8. > Answer: $8. > > Question: A train travels 60 miles per hour. If it needs to travel 180 miles, how long will the trip take?" The examples teach the model what kind of reasoning you expect. This typically outperforms zero-shot approaches on complex tasks. ## When Chain-of-Thought Helps Most This technique isn't equally useful for everything. It shines in specific situations. **Multi-step reasoning problems.** Anything where you need to connect multiple pieces of information: math word problems, logic puzzles, planning tasks. **Tasks where the answer isn't obvious.** If a question requires inference or deduction rather than simple recall, step-by-step reasoning helps. **Analysis requiring judgment.** Weighing pros and cons, comparing options, evaluating arguments. **Debugging and troubleshooting.** Working through what could be wrong and why. [The Prompt Engineering Guide notes](https://www.promptingguide.ai/techniques/cot) that chain-of-thought prompting shows the biggest gains on arithmetic, commonsense reasoning, and symbolic reasoning tasks. ## When It Doesn't Help Chain-of-thought isn't always the right choice. **Simple, direct questions.** "What's the capital of France?" doesn't need step-by-step reasoning. It just needs retrieval. **Creative tasks.** Writing, brainstorming, generating ideas. These benefit from flow, not methodical reasoning. **Already-good performance.** If the model handles a task well without chain-of-thought, adding it may just waste tokens. [Recent research from Wharton](https://gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/) found that for reasoning-focused models like GPT-4o and o3-mini, chain-of-thought prompting produced only minimal improvements (2-3%) while increasing response time by 20-80%. The models already reason internally. The takeaway: chain-of-thought helps most when the model would otherwise skip important reasoning steps. If the model is already thinking carefully, prompting it to do so adds overhead without benefit. ## Making It Work Better A few techniques improve chain-of-thought effectiveness. ### Be Specific About What Steps To Include Instead of generic "think step by step," specify what kind of reasoning you want. > "Analyze this marketing proposal. First, identify the main goal. Then evaluate whether the proposed tactics align with that goal. Next, assess the budget implications. Finally, give your recommendation with reasoning." The explicit steps guide the model's reasoning process. ### Use Self-Consistency Generate multiple reasoning paths and take the most common answer. If you ask the model to solve a problem three times with chain-of-thought, and it gets "42" twice and "38" once, go with 42. 
## When Chain-of-Thought Helps Most

This technique isn't equally useful for everything. It shines in specific situations.

**Multi-step reasoning problems.** Anything where you need to connect multiple pieces of information: math word problems, logic puzzles, planning tasks.

**Tasks where the answer isn't obvious.** If a question requires inference or deduction rather than simple recall, step-by-step reasoning helps.

**Analysis requiring judgment.** Weighing pros and cons, comparing options, evaluating arguments.

**Debugging and troubleshooting.** Working through what could be wrong and why.

[The Prompt Engineering Guide notes](https://www.promptingguide.ai/techniques/cot) that chain-of-thought prompting shows the biggest gains on arithmetic, commonsense reasoning, and symbolic reasoning tasks.

## When It Doesn't Help

Chain-of-thought isn't always the right choice.

**Simple, direct questions.** "What's the capital of France?" doesn't need step-by-step reasoning. It just needs retrieval.

**Creative tasks.** Writing, brainstorming, generating ideas. These benefit from flow, not methodical reasoning.

**Already-good performance.** If the model handles a task well without chain-of-thought, adding it may just waste tokens.

[Recent research from Wharton](https://gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/) found that for recent models such as GPT-4o and o3-mini, chain-of-thought prompting produced only minimal improvements (2-3%) while increasing response time by 20-80%. These models already do much of this reasoning on their own.

The takeaway: chain-of-thought helps most when the model would otherwise skip important reasoning steps. If the model is already thinking carefully, prompting it to do so adds overhead without benefit.

## Making It Work Better

A few techniques improve chain-of-thought effectiveness.

### Be Specific About What Steps To Include

Instead of a generic "think step by step," specify what kind of reasoning you want.

> "Analyze this marketing proposal. First, identify the main goal. Then evaluate whether the proposed tactics align with that goal. Next, assess the budget implications. Finally, give your recommendation with reasoning."

The explicit steps guide the model's reasoning process.

### Use Self-Consistency

Generate multiple reasoning paths and take the most common answer. If you ask the model to solve a problem three times with chain-of-thought, and it gets "42" twice and "38" once, go with 42.

This technique, from [follow-up research by Google](https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/), improved GSM8K accuracy from 58% to 74%. Most AI interfaces don't make this easy to do manually, but if accuracy matters, it's worth the extra queries.
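Through an API, self-consistency is a short loop: sample several reasoning paths at a nonzero temperature, extract each final answer, and take the majority vote. A minimal sketch, again assuming the OpenAI Python SDK and an assumed model name; the "Answer: <value>" extraction convention is my own:

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

PROMPT_SUFFIX = (
    "\n\nThink step by step, then give your final answer "
    "on its own last line in the form 'Answer: <value>'."
)

def self_consistent_answer(question: str, samples: int = 5) -> str:
    """Sample several chain-of-thought paths and return the majority answer."""
    answers = []
    for _ in range(samples):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name; substitute your own
            messages=[{"role": "user", "content": question + PROMPT_SUFFIX}],
            temperature=1.0,  # diversity across samples is what makes the vote useful
        )
        text = response.choices[0].message.content
        # The last line carries the final answer; everything above it
        # is that sample's reasoning path.
        last_line = text.strip().splitlines()[-1]
        answers.append(last_line.removeprefix("Answer:").strip())
    return Counter(answers).most_common(1)[0][0]
```

More samples sharpen the vote but cost proportionally more tokens and latency, so tune `samples` to how much the answer matters.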
### Show High-Quality Examples

For few-shot chain-of-thought, example quality matters enormously. Your examples should:

- Cover different types of problems within your domain
- Show clear, correct reasoning
- Match the complexity of problems you'll actually ask
- Demonstrate the level of detail you want

Bad examples teach bad reasoning. Good examples teach good reasoning.

### Verify the Steps, Not Just the Answer

One advantage of chain-of-thought is transparency. You can see the reasoning. Use this.

When the model shows its work, check the steps. If the logic is flawed, you'll catch errors even when the final answer looks plausible.

## Practical Examples

### Financial Analysis

**Without chain-of-thought:**

> "Should we invest in expanding our sales team?"

**With chain-of-thought:**

> "We're considering expanding our sales team by 3 people. Current team: 5 reps. Average revenue per rep: $400K. Cost per rep (salary + overhead): $120K. Current close rate: 22%. Available leads we can't currently work: 200/month.
>
> Analyze whether we should expand. Think through the revenue potential, costs, break-even timeline, and risks. Show your reasoning step by step."

The second version gives the model data to reason about and requests explicit step-by-step analysis.

### Technical Troubleshooting

**Without chain-of-thought:**

> "My website is slow. Why?"

**With chain-of-thought:**

> "My e-commerce website has become slow over the past month. Average page load: 6 seconds (was 2 seconds). Traffic hasn't increased significantly. We added a new product recommendation widget and upgraded our CMS last month.
>
> Think through the possible causes systematically. Consider each change we made, the infrastructure, and common performance issues. Explain your reasoning before suggesting what to check first."

The prompt requests systematic reasoning rather than a quick guess.

### Strategic Decision-Making

**Without chain-of-thought:**

> "Should we enter the European market?"

**With chain-of-thought:**

> "We're a B2B SaaS company ($5M ARR, 200 customers, all in US). We're considering European expansion.
>
> Analyze this decision step by step:
>
> 1. What are the potential benefits?
> 2. What are the major challenges and costs?
> 3. What would we need to have in place first?
> 4. What are the alternatives to full expansion?
> 5. Based on this analysis, what's your recommendation?
>
> Reason through each point before giving your final recommendation."

The structured steps ensure the model considers multiple angles rather than jumping to a conclusion.

### Code Review

**Without chain-of-thought:**

> "Review this function for bugs."

**With chain-of-thought:**

> "Review this Python function that calculates shipping costs. Walk through the logic step by step:
>
> 1. What does each section of the code do?
> 2. Are there edge cases that could cause problems?
> 3. Does the logic match what the function name suggests?
> 4. Are there potential performance issues?
>
> [code block]
>
> Show your analysis for each point, then summarize any issues found."

The explicit steps guide a thorough review rather than a surface-level glance.

## Common Mistakes

### Overusing It

Not every prompt needs chain-of-thought. Adding it to simple tasks just wastes tokens and time. "Let's think step by step about what the capital of France is" is silly.

Save the technique for tasks that actually require reasoning.

### Under-Specifying the Steps

"Think step by step" is a starting point, but for complex tasks, you'll get better results by specifying what steps you want.

Generic instruction: "Think step by step."

Better: "First identify the key variables. Then calculate each intermediate value. Finally, combine them for the answer."

### Ignoring the Reasoning

If you ask for step-by-step reasoning, read it. Check it. The whole point is to see the model's thinking so you can catch errors.

When you skip past the reasoning to grab the final answer, you lose the technique's main benefit.

### Using It With the Wrong Model Type

[2025 research](https://gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/) shows that reasoning models (like o1 and o3-mini) gain little from explicit chain-of-thought prompting because they already reason internally. You're essentially asking them to do what they already do.

For these models, straightforward prompts often work better. Chain-of-thought adds value mainly for standard models that would otherwise skip reasoning steps.

## Beyond Basic Chain-of-Thought

Several variations extend the basic technique.

**Tree of Thoughts:** Instead of one reasoning path, explore multiple branches. Useful when there are different approaches to a problem and you want to explore several.

**Self-Consistency:** Generate multiple chain-of-thought answers and take the majority. Improves accuracy when a single pass might go wrong.

**Chain-of-Table:** For data analysis, manipulate tables step by step rather than reasoning about them in text. [Research shows](https://www.promptingguide.ai/techniques/cot) this improves structured data handling.

These are more advanced applications. Master basic chain-of-thought first before moving to variations.

## Quick Reference

**What it is:** Prompting AI to show reasoning steps before giving a final answer.

**When to use it:**

- Multi-step problems
- Complex reasoning tasks
- Analysis requiring judgment
- Troubleshooting and debugging

**When to skip it:**

- Simple, direct questions
- Creative tasks
- Tasks where the model already performs well
- Reasoning-focused models (o1, o3-mini, etc.)

**How to use it:**

- Zero-shot: Add "let's think step by step" or similar
- Few-shot: Provide examples showing the reasoning process

**How to make it better:**

- Specify what steps to include
- Use multiple passes and take the majority answer
- Verify the reasoning, not just the final answer
- Match examples to your actual task complexity

The core insight: AI models reason better when you ask them to reason explicitly. Sometimes that's the difference between a wrong answer and a right one.