Summary

The paper proposes a new approach to chain-of-thought (CoT) reasoning. The idea is that LMs could reason more effectively using intermediate computation that is not expressed in natural language (as it is in explicit CoT). The main advantage is speed: throughput is much higher because the reasoning happens “vertically” across the hidden states of different layers, instead of “horizontally” by generating intermediate tokens one at a time as in explicit CoT.

Dataset / Task

Methodology

At a high level, the idea is knowledge distillation:

  1. Mind-Reading the Teacher: A student model is trained to “read” the teacher’s “thought process”, i.e., the continuous hidden states produced while the teacher generates its intermediate reasoning steps. Rather than replicating those steps, the student uses a subset of the teacher’s hidden states to produce the answer.
  2. Thought Emulation: We then employ knowledge distillation (Hinton et al., 2015; Kim & Rush, 2016) to train an emulator that predicts the teacher’s hidden states from the input “vertically”, across layers, eliminating the need for “horizontal” explicit reasoning steps.
  3. Couple and Optimize: Finally, the emulator, which predicts the teacher’s thought process, is combined with the mind-reading student, which produces the final answer from the emulated hidden states. This combined system is optimized end-to-end, allowing the student to develop its own reasoning that may differ from the teacher’s approach.
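The three stages above can be sketched with a toy linear model. This is purely illustrative and not the paper’s actual architecture: here the teacher’s “thought” is assumed to be a hidden vector `z = A @ x` and the answer `y = b @ z`; the matrices `A`, `b`, `Ws` (student), and `We` (emulator) are all hypothetical stand-ins for the real networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher: input -> hidden "thoughts" -> answer.
A = rng.normal(size=(8, 4))   # teacher's input-to-hidden map (unknown to student)
b = rng.normal(size=(1, 8))   # teacher's hidden-to-answer map (unknown to student)

X = rng.normal(size=(256, 4))
Z = X @ A.T                   # teacher hidden states ("thoughts")
Y = Z @ b.T                   # teacher's final answers

lr = 0.01

# Stage 1 -- mind-reading student: learn the answer from teacher hidden states.
Ws = np.zeros((1, 8))
for _ in range(500):
    Ws -= lr * 2 * (Z @ Ws.T - Y).T @ Z / len(X)

# Stage 2 -- thought emulator: predict teacher hidden states from the input,
# replacing horizontal token generation with a vertical map.
We = np.zeros((8, 4))
for _ in range(500):
    We -= lr * 2 * (X @ We.T - Z).T @ X / len(X)

# Stage 3 -- couple and optimize end-to-end: emulated thoughts feed the
# student, and both are tuned jointly on the answer loss alone, so the
# emulated "reasoning" is free to drift from the teacher's.
for _ in range(500):
    Zhat = X @ We.T           # emulated thoughts
    err = Zhat @ Ws.T - Y
    Ws -= lr * 2 * err.T @ Zhat / len(X)
    We -= lr * 2 * (err @ Ws).T @ X / len(X)

final_mse = float(np.mean((X @ We.T @ Ws.T - Y) ** 2))
print(final_mse)
```

In this toy setting the coupled system recovers the teacher’s input-to-answer function with near-zero error; in the paper the same three-stage recipe is applied to transformer hidden states rather than linear maps.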

Experiment

Shortcomings & Limitations