
I wrote up my understanding of how LLMs generate text responses to text prompts (at a somewhat practical yet high level), focusing on example numerical vectors and how they are transformed at each step.

How can I understand the "reasoning" portion of DeepSeek-R1 in a similar way?

From my cursory glance, it seems that R1 generates several alternate "think" steps, which are basically just more text input to the LLM, and then uses the standard LLM attention mechanisms to figure out the best set of reasoning steps? So I'm trying to better understand: is this actually "reasoning", or just more statistical machinery?

It would be helpful to know how the reasoning is performed, so I can then piece together in my mind how the vectors get transformed. There's no need to write an involved answer with example numerical vectors (I can do that myself), but knowing at a higher level what is going on, and hinting at how the reasoning works at a practical level, would be of great help.

1 Answer


First of all, unlike standard dense LLMs, DeepSeek-R1 employs a Mixture-of-Experts (MoE) architecture comprising 671 billion parameters, of which only about 37 billion are activated for any given token. This selective activation routes each input to the most relevant subset of expert networks, improving computational efficiency and enabling specialization across different reasoning tasks.
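To see mechanically what "selective activation" means, here is a minimal sketch of top-k expert routing in PyTorch. It is illustrative only: the `moe_layer` function, the tiny expert and gate sizes, and the plain linear router are assumptions for demonstration, not DeepSeek's actual implementation.

```python
import torch
import torch.nn.functional as F

def moe_layer(x, experts, gate, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (num_tokens, d_model) token representations
    experts: list of small feed-forward networks (the "experts")
    gate:    linear layer mapping d_model -> num_experts (the router)
    k:       number of experts activated per token
    """
    scores = gate(x)                         # (num_tokens, num_experts)
    weights, idx = scores.topk(k, dim=-1)    # keep only the top-k experts
    weights = F.softmax(weights, dim=-1)     # normalize their gate weights

    out = torch.zeros_like(x)
    for slot in range(k):
        for e in range(len(experts)):
            mask = idx[:, slot] == e         # tokens routed to expert e
            if mask.any():
                out[mask] += weights[mask, slot, None] * experts[e](x[mask])
    return out

# Toy usage: 8 experts, 2 active per token. Only the selected experts'
# parameters are used for a given token, which is the same idea (at a
# vastly smaller scale) as "37B of 671B parameters active" in R1.
d_model, n_experts = 16, 8
experts = [torch.nn.Sequential(torch.nn.Linear(d_model, 32),
                               torch.nn.ReLU(),
                               torch.nn.Linear(32, d_model))
           for _ in range(n_experts)]
gate = torch.nn.Linear(d_model, n_experts)
tokens = torch.randn(5, d_model)
print(moe_layer(tokens, experts, gate).shape)  # torch.Size([5, 16])
```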

DeepSeek-R1's reasoning involves two interleaved processes. First, it samples multiple candidate reasoning paths ("thinking" steps) as structured text, similar to chain-of-thought (CoT) prompting. Unlike standard LLM text generation, DeepSeek-R1 is constrained to emit its CoT following a special template with marker tokens like <think> and <answer>. Second, it uses attention across the original prompt, the generated CoT, and the conclusion to weigh the relevance and coherence of each reasoning step: for each sampled path, attention heads score how relevant each step is to the conclusion, via attention weights between the contextualized embeddings of the conclusion and those of the intermediate steps. Paths with higher scores are selected, concatenated with the original prompt, and fed back into the model to generate the final answer.
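To make the sample-then-select flow above concrete, here is a toy control-flow sketch. The model-specific parts are stubbed: `sample_completion` and `score_path` are hypothetical placeholders (a real implementation would call an actual LLM and compute the attention-based coherence score described above), so only the overall structure is meant to carry information.

```python
import re
import random

def sample_completion(prompt, temperature=0.8):
    """Stub: in reality, one sampled <think>...<answer>... generation
    from the model at the given temperature."""
    thought = random.choice(["2+2=4", "2 plus 2 makes 4", "2+2=5, wait, 4"])
    return f"<think>{thought}</think><answer>4</answer>"

def score_path(prompt, completion):
    """Stub: the answer above describes scoring via attention weights
    between the conclusion and intermediate steps; a random scalar
    stands in for that score here."""
    return random.random()

def best_of_n(prompt, n=4):
    """Sample n candidate reasoning paths, keep the highest-scoring one,
    and extract the text inside its <answer> block."""
    candidates = [sample_completion(prompt) for _ in range(n)]
    best = max(candidates, key=lambda c: score_path(prompt, c))
    m = re.search(r"<answer>(.*?)</answer>", best, re.S)
    return best, (m.group(1) if m else None)

path, answer = best_of_n("What is 2 + 2?")
print(path)
print("final answer:", answer)
```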

The 'reasoning' process is therefore emergent from the model's training on reasoning-heavy datasets such as math problems and logical puzzles. It is not traditional symbolic logic: the reasoning rules are not hard-coded anywhere, but approximated via the token probabilities predicted by an MoE LLM fine-tuned with reinforcement learning (RL).
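For a sense of how that RL fine-tuning shapes the behaviour without hard-coded logic rules: the DeepSeek-R1 report describes simple rule-based rewards, chiefly a format reward (did the output follow the <think>/<answer> template?) and an accuracy reward (is the final answer correct?). Below is a toy sketch of such rewards; the function names are illustrative, and the exact-string-match check stands in for the real answer verifiers.

```python
import re

def format_reward(completion):
    """1.0 if the output follows the <think>...</think><answer>...</answer>
    template, else 0.0 (a simplified rule-based format reward)."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.S) else 0.0

def accuracy_reward(completion, ground_truth):
    """1.0 if the extracted answer matches the reference exactly; real
    implementations use proper verifiers (e.g. math checkers) instead."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.S)
    return 1.0 if m and m.group(1).strip() == ground_truth else 0.0

completion = "<think>2 + 2 = 4</think><answer>4</answer>"
print(format_reward(completion) + accuracy_reward(completion, "4"))  # 2.0
```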

cinch