I wrote up my understanding of how LLMs generate text responses to text prompts (at a somewhat practical yet high level), focusing on example numerical vectors and how they are transformed at each step.
How can I understand the "reasoning" portion of DeepSeek-R1 in a similar way?
From a cursory glance, it seems that R1 generates several alternative "think" steps, which are basically just more text fed back into the LLM as input, and then uses the standard LLM attention mechanisms to figure out the best set of reasoning steps? So what I'm trying to better understand is: is this actually "reasoning", or just more statistical machinery?
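To make my current mental model concrete, here is a rough sketch of how I picture it: the "think" tokens are produced by the same autoregressive loop as the answer, and simply become more context for later tokens. This is only my assumption, not a description of R1's actual implementation; `next_token`, `generate`, and the canned output below are placeholders I made up so the sketch runs without a real model.

```python
# Canned "model output" so the sketch runs without a real model.
# A real model would sample these tokens from its output distribution.
CANNED = ["<think>", "maybe", "Rayleigh", "scattering", "</think>",
          "Because", "of", "Rayleigh", "scattering", ".", "<eos>"]

def next_token(context: list[str], step: int) -> str:
    """Stand-in for a real forward pass: attention over `context`,
    then sampling the next token. Here it just replays canned output."""
    return CANNED[min(step, len(CANNED) - 1)]

def generate(prompt_tokens: list[str]) -> list[str]:
    context = list(prompt_tokens)
    step = 0
    while True:
        tok = next_token(context, step)  # same mechanism for "think" and answer tokens
        context.append(tok)              # reasoning tokens just become more input
        step += 1
        if tok == "<eos>":
            return context

print(" ".join(generate(["Why", "is", "the", "sky", "blue", "?"])))
```

Is that roughly right, or does R1 do something structurally different when it "thinks"?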
It would be helpful to know how the reasoning is performed, so I can then piece together in my mind how the vectors get transformed. You don't need to write an involved answer with example numerical vectors (I can do that myself), but a higher-level picture of what is going on, with hints at how the reasoning works at a practical level, would be of great help.