RL is key
↳ but RL alone struggles to make the model helpful.
↳ 4-stage pipeline (good start + reasoning RL + SFT + safety RL) = o1-level performance.
↳ Distilling R1 outputs into smaller models = o1-mini-level performance.
Model: huggingface.co/deepseek-ai
Paper: github.com/deepseek-ai/...
core
✦ supports the open-source Layout Parsing model from @OpenDataLab_AI
✦ scrapes papers from @openreviewnet
blog
✦ displays papers by the date they were added to @huggingface Daily Papers. Up to the 3 most recent days are kept active; older days are archived.
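The "3 latest days, then archive" rule can be sketched roughly as below. This is a minimal illustration, not the project's code: it assumes a "day" means a distinct date on which papers were added, and that archiving just means moving older date groups out of the active view. `split_active_archived` and the sample titles and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date


def split_active_archived(papers, keep_days=3):
    """Group (title, date_added) pairs by date; keep the newest `keep_days` dates active.

    Hypothetical helper: any date older than the newest `keep_days` distinct dates
    is returned in the archived mapping instead.
    """
    by_date = defaultdict(list)
    for title, added in papers:
        by_date[added].append(title)

    dates = sorted(by_date, reverse=True)  # newest date first
    active = {d: by_date[d] for d in dates[:keep_days]}
    archived = {d: by_date[d] for d in dates[keep_days:]}
    return active, archived


if __name__ == "__main__":
    # Hypothetical sample data: four distinct dates, so the oldest one gets archived.
    papers = [
        ("Paper A", date(2025, 1, 20)),
        ("Paper B", date(2025, 1, 20)),
        ("Paper C", date(2025, 1, 21)),
        ("Paper D", date(2025, 1, 22)),
        ("Paper E", date(2025, 1, 23)),
    ]
    active, archived = split_active_archived(papers)
    print("active:", sorted(active))
    print("archived:", sorted(archived))
```

Grouping by distinct dates, rather than counting back from today, means a day with no new papers does not use up one of the 3 active slots. That is a design assumption, not something the post specifies.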
link 👇