Show Your Work: Scratchpads for Intermediate Computation with Language Models

  • Large pre-trained language models perform remarkably well on tasks that can be done “in one pass”, such as generating realistic text (Brown et al., 2020) or synthesizing computer programs (Chen et al., 2021; Austin et al., 2021). However, they struggle with tasks that require unbounded multi-step computation, such as adding integers (Brown et al., 2020) or executing programs (Austin et al., 2021).
  • Surprisingly, we find that these same models are able to perform complex multi-step computations—even in the few-shot regime—when asked to perform the operation “step by step”, showing the results of intermediate computations. In particular, we train Transformers to perform multi-step computations by asking them to emit intermediate computation steps into a “scratchpad”. On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations. https://arxiv.org/abs/2112.00114
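As an illustration of the idea, the intermediate steps of long addition can be serialized into a text trace that the model emits before the final answer, whether via training or few-shot prompting. A minimal sketch in Python; the trace format below is invented for illustration, and the paper's exact scratchpad format differs:

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Emit a scratchpad-style trace for long addition: process the
    digits right to left, writing each partial sum and the carry,
    then the final answer. Illustrative format only."""
    da, db = str(a)[::-1], str(b)[::-1]  # reversed digit strings
    carry = 0
    out_digits = []
    lines = [f"{a} + {b}"]
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        carry_in = carry
        total = x + y + carry_in
        carry, digit = divmod(total, 10)
        out_digits.append(str(digit))
        lines.append(f"{x} + {y} + {carry_in} = {total} , write {digit} , carry {carry}")
    if carry:
        out_digits.append(str(carry))
    result = "".join(reversed(out_digits))
    lines.append(f"answer: {result}")
    return "\n".join(lines)

# A few-shot prompt is then just worked traces followed by a new problem:
prompt = "\n\n".join([addition_scratchpad(29, 57),
                      addition_scratchpad(128, 367)]) + "\n\n981 + 19"
```

The point of the trace is that each line is a bounded, local computation, so the model never has to do the whole addition "in one pass".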

shodaiiiiii: I hear that “Let’s work this out in a step by step way to be sure we have the right answer.” scores higher than “Let’s think step by step.” https://arxiv.org/pdf/2211.01910.pdf
More interesting than this result is the Automatic Prompt Engineer (APE) algorithm proposed by that paper.
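At its core, APE has an LLM propose candidate instructions and then keeps the candidate that scores best on a small evaluation set. A minimal sketch of the selection step; the `llm` callable here is a stand-in for a real model API, not part of the paper's code:

```python
def ape_select(candidates, eval_set, llm):
    """Return the candidate instruction with the highest accuracy
    on eval_set, a list of (question, expected_answer) pairs.
    `llm` is any callable mapping a prompt string to an answer string."""
    def score(instruction):
        correct = 0
        for question, answer in eval_set:
            reply = llm(f"{instruction}\n\nQ: {question}\nA:")
            if reply.strip() == answer:
                correct += 1
        return correct / len(eval_set)
    return max(candidates, key=score)
```

The full algorithm also uses an LLM to generate the candidate pool (and can resample around good candidates); the sketch covers only the scoring/selection loop.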


This page is auto-translated from /nishio/Step-by-Step(LLM) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.