DeepSeek-R1 is based on DeepSeek-V3, a mixture-of-experts (MoE) model recently open-sourced by DeepSeek. This base model is fine-tuned using Group Relative Policy Optimization (GRPO), a reasoning-oriented variant of RL. The research team also performed knowledge distillation from DeepSeek-R1 into open-source Qwen and Llama models and released several variants of each.
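GRPO differs from PPO-style RL mainly in how it estimates advantages. It does not train a separate value model. Instead, it samples a group of completions for each prompt and scores each completion relative to the rest of its group. The snippet below is a minimal sketch of that group-relative advantage step. It assumes one scalar, outcome-level reward per completion and the mean/std normalization described in the GRPO literature. It is not DeepSeek's training code, and it leaves out the clipped policy-ratio objective and the KL penalty. The function name `group_relative_advantages` and the example rewards are hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Score each sampled completion against the other completions for the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps)

    The group mean acts as the baseline, taking the place of a learned value model.
    This sketch uses the population standard deviation (an assumption; some
    implementations use the sample std). eps avoids division by zero when every
    completion in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward per group")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group: 4 completions sampled for one prompt and scored by a
# rule-based reward (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers get a positive advantage and incorrect ones get a negative advantage. A group where every completion earns the same reward produces all-zero advantages, so that prompt contributes no learning signal in that step.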