Andrew Vaziri
@4threv.com
AI, Robotics & Society
There will still be niches where open-source models won't have the specialized data to compete, but you can't copyright math or logic. The fundamental capability to reason will continue to be actively developed. (8/8)
January 28, 2025 at 7:52 AM
In conclusion, DeepSeek's advancements are newsworthy, but the market didn't seem to know how to interpret their impact. The sky isn't falling just because open-source models are competitive. (7/8)
January 28, 2025 at 7:48 AM
I'm not minimizing DeepSeek's achievement, but this isn't a totally unprecedented result. Distilling larger models into smaller, more practical ones was a widely expected development. (6/8)
January 28, 2025 at 7:47 AM
2. The authors claim it was very important to train R-1 on the outputs of larger models, a process called distillation. This is literally the first thing you'd try when shrinking a model while preserving performance. (5/8)
January 28, 2025 at 7:46 AM
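A minimal sketch of the distillation idea from (5/8), assuming the classic soft-label setup (Hinton et al., 2015) rather than DeepSeek's actual pipeline, which fine-tunes smaller models on text generated by the larger one; the model sizes and data below are toy placeholders:

```python
# Soft-label knowledge distillation: a small "student" net learns to
# match the softened output distribution of a frozen "teacher" net.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
teacher.eval()  # the teacher is frozen; only the student is trained

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

for step in range(100):
    x = torch.randn(64, 32)  # stand-in for real training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```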
Models that don’t specialize in reasoning, but instead focus on writing style or broad knowledge, will still be needed. These models are more resource-intensive to train than R-1. (4/8)
January 28, 2025 at 7:46 AM
1. R-1 is competing in a specialized class of AI models focused on chain-of-thought reasoning. Some of the reinforcement-learning tricks used to make it train efficiently only work in domains with a definitive correct answer, like math or formal logic. (3/8)
January 28, 2025 at 7:45 AM
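To make (3/8) concrete, here is a minimal sketch of the kind of rule-based, verifiable reward that reinforcement learning can rely on in domains with a definitive correct answer; the function names and "Answer:" format are illustrative assumptions, not taken from the R-1 paper:

```python
# A program can grade each completion exactly, so the RL reward is
# unambiguous. There is no equivalent check for, say, elegant prose,
# which is why this trick is limited to math, logic, and the like.
import re
from typing import Optional

def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer from a completion expected to end with a
    line like 'Answer: 42'."""
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", completion)
    return match.group(1) if match else None

def reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 for a verifiably correct answer, else 0.0."""
    return 1.0 if extract_answer(completion) == ground_truth else 0.0

# Grading two candidate chains of reasoning for "17 + 25":
print(reward("17 + 25 = 42, so Answer: 42", "42"))           # 1.0
print(reward("I think it's about forty. Answer: 40", "42"))  # 0.0
```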
R-1 is newsworthy, don't get me wrong, but here are a few things to keep in mind when assessing how much it should change your worldview: (2/8)
January 28, 2025 at 7:45 AM