deep-diver.bsky.social
@deep-diver.bsky.social
A simple summary of DeepSeek-R1

RL is key
↳ but RL alone struggles to produce a helpful model.
↳ 4-stage pipeline (cold start + reasoning RL + SFT + safety RL) = o1-level performance.
↳ Distilling R1 outputs into smaller models = o1-mini level.

Model: huggingface.co/deepseek-ai
Paper: github.com/deepseek-ai/...
January 21, 2025 at 1:03 PM
Updates on ai-paper-reviewer!

core
✦ now supports the open-source layout parsing model from @OpenDataLab_AI

✦ scrapes papers from @openreviewnet

blog
✦ displays papers by the date they were added to @huggingface Daily Papers. The 3 most recent days are kept; older days are archived.

link 👇
January 17, 2025 at 8:12 AM