Reinforcement Learning and LLM Post-Training: How Does DeepSeek R1 Gain Reasoning Capabilities?
Published:
This is a very detailed document on RL and LLM reasoning. I forked this page from Wei-Min Lu.
Link to the blog: Reinforcement Learning and LLM Post-Training: How Does DeepSeek R1 Gain Reasoning Capabilities?