Chain of Thought

Let’s think step by step.

The idea behind chain-of-thought is simple: instead of asking the model to generate an answer directly, we start the answer with a prefix like “Let’s think step by step.”

Step-by-step answers

Let’s start with the question-answerer and add an answer-prefix parameter to the prompt so that we can see the effect of different prefixes:

chain_of_thought.py
from fvalues import F

from ice.recipe import recipe


def make_chain_of_thought_prompt(question: str, answer_prefix: str = "") -> str:
    # The answer is opened with a quote but intentionally left unclosed;
    # the model's completion continues right after the prefix.
    return F(
        f"""Answer the following question:

Question: "{question}"
Answer: "{answer_prefix}
"""
    ).strip()


async def chain_of_thought(
    question: str = "What would happen if the average temperature in Northern California went up by 5 degrees Fahrenheit?",
    answer_prefix: str = "Let's think step by step.",
) -> str:
    prompt = make_chain_of_thought_prompt(question, answer_prefix)
    # Stop at the closing quote so the completion contains only the answer.
    answer = await recipe.agent().complete(prompt=prompt, stop='"')
    return answer


recipe.main(chain_of_thought)

Let’s first run the recipe without an answer prefix:
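Because answer_prefix defaults to “Let’s think step by step.” in the code above, running without a prefix means overriding it with an empty string. Assuming recipe.main exposes the function’s parameters as command-line flags (an assumption about the CLI, not something shown here), the invocation looks roughly like:

python chain_of_thought.py --answer-prefix ""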

We get an answer:

If we provide “Let’s think step by step.” as an answer prefix…
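Using the same assumed flag interface (this is also the default value in the code above):

python chain_of_thought.py --answer-prefix "Let's think step by step."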

…we get a much more elaborate answer:

Step-by-step reasoning for concise answers

In the previous example, chain-of-thought was used to elicit a more elaborate answer. Often, however, chain-of-thought is used where all we want is to improve the correctness of the final answer, without changing the answer format itself.

We can achieve this by eliciting the reasoning and the final answer separately, which also lets us compare the final answer directly with what the model produces without chain-of-thought:
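Here is a minimal sketch of such a recipe: first elicit the step-by-step reasoning, then hand the question and that reasoning back to the model and ask for a short final answer. The filename, helper names, and prompt wording are illustrative choices; only the calls already used above (F, recipe.agent().complete, recipe.main) are relied on.

answer_by_reasoning.py

from fvalues import F

from ice.recipe import recipe


def make_reasoning_prompt(question: str) -> str:
    # Same trick as before: open the answer with a quote and the
    # step-by-step prefix, and let the model continue the reasoning.
    return F(
        f"""Answer the following question:

Question: "{question}"
Answer: "Let's think step by step.
"""
    ).strip()


def make_final_answer_prompt(question: str, reasoning: str) -> str:
    # Give the model the question and the elicited reasoning, and ask
    # for a short final answer in the original answer format.
    return F(
        f"""Answer the following question, using the reasoning provided:

Question: "{question}"
Reasoning: "{reasoning}"
Short answer: "
"""
    ).strip()


async def answer_by_reasoning(
    question: str = "What would happen if the average temperature in Northern California went up by 5 degrees Fahrenheit?",
) -> str:
    reasoning_prompt = make_reasoning_prompt(question)
    reasoning = await recipe.agent().complete(prompt=reasoning_prompt, stop='"')
    answer_prompt = make_final_answer_prompt(question, reasoning)
    answer = await recipe.agent().complete(prompt=answer_prompt, stop='"')
    return answer


recipe.main(answer_by_reasoning)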

If we now run this script:
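Assuming the hypothetical filename from the sketch above, that would be roughly:

python answer_by_reasoning.py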

We get a summary of the long reasoning chain:


Exercise

Let’s apply this to the math problem we saw in the chapter on checking reasoning steps:

Beth bakes 4x 2 dozen batches of cookies in a week. If these cookies are shared amongst 16 people equally, how many cookies does each person consume?

The answer:

Inspecting the reasoning, we see that something went wrong in step two:
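(For reference, the correct computation is 4 batches × 24 cookies = 96 cookies in total, and 96 / 16 = 6 cookies per person.)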

Exercise: Combine generating reasoning chains with verifiers to generate more reliable reasoning.

Get feedback on exercise solutions

If you want feedback on your exercise solutions, submit them through this form. We—the team at Ought—are happy to give our quick take on whether you missed any interesting ideas.

References

  1. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

  2. Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, and Denny Zhou. Self-Consistency Improves Chain of Thought Reasoning in Language Models. March 21, 2022.

  3. Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large Language Models Are Zero-Shot Reasoners. May 24, 2022.
