Ajitesh Shukla
@ajitesh1.bsky.social
Mathematical Research (Geometric Topology, Differential Geometry), Large Language Models, Natural Language Processing, Quantum Computing, Cryptography, LORD KRISHNA IS GOD OF MATH
Reposted by Ajitesh Shukla
2) They find that GRPO (Group Relative Policy Optimization) is biased:
- The length normalization prefers shorter correct answers and longer incorrect answers. -> length bias
- The std normalization prefers questions that are too easy or too hard over questions of average difficulty. -> difficulty bias

They introduce Dr. GRPO to remove the above biases.
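A minimal Python sketch of the two normalizations described above, reduced to the advantage and the per-token scaling only (no clipping, KL term, or policy ratio). The reward groups, response lengths, and the `const_len` parameter are illustrative assumptions, not values or code from the paper; `const_len` is a hypothetical stand-in for whatever fixed normalizer a Dr. GRPO-style implementation uses in place of the response length.

```python
import statistics

EPS = 1e-6  # avoids division by zero when every reward in a group is identical


def grpo_advantages(rewards):
    """GRPO-style group-relative advantage: (r - mean) / std over one question's group."""
    mean = statistics.fmean(rewards)
    # Population std is an assumption here; implementations differ on this detail.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + EPS) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO-style advantage: centre on the group mean, no std division."""
    mean = statistics.fmean(rewards)
    return [r - mean for r in rewards]


def per_token_weight(advantage, length, length_normalize, const_len=1.0):
    """Scale applied to each token's policy-gradient term for one response.

    length_normalize=True mimics GRPO's 1/|o_i| factor (the response's own length).
    Otherwise a shared constant is used; const_len is a hypothetical parameter.
    """
    if length_normalize:
        return advantage / length
    return advantage / const_len


# Binary rewards for 8 sampled answers to one question (1 = correct).
hard = [1, 0, 0, 0, 0, 0, 0, 0]     # 1 of 8 correct: low reward std
average = [1, 1, 1, 1, 0, 0, 0, 0]  # 4 of 8 correct: maximal reward std

# Difficulty bias: GRPO multiplies the centred rewards by 1/std, which is
# larger for near-uniform groups (questions that are too hard or too easy).
print(round(grpo_advantages(hard)[0] / dr_grpo_advantages(hard)[0], 2))        # 3.02
print(round(grpo_advantages(average)[0] / dr_grpo_advantages(average)[0], 2))  # 2.0

# Length bias: with 1/|o_i|, a short correct answer gets a bigger per-token push
# than a long one, and a long wrong answer gets a smaller per-token penalty.
print(per_token_weight(1.0, 100, True), per_token_weight(1.0, 1000, True))    # 0.01 0.001
print(per_token_weight(-1.0, 100, True), per_token_weight(-1.0, 1000, True))  # -0.01 -0.001
```

Dropping the std division makes every question's centred rewards count on the same scale, and replacing |o_i| with a shared constant gives short and long responses the same per-token weight.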
March 22, 2025 at 2:20 AM
Reposted by Ajitesh Shukla
Paper: Compute Optimal Scaling of Skills: Knowledge vs Reasoning (arxiv.org/abs/2503.10061)
Scaling laws are a critical component of the LLM development pipeline, most famously as a way to forecast training decisions such as 'compute-optimally' trading-off parameter count and dataset size, a...
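The idea of "compute-optimally" trading off parameter count against dataset size can be made concrete with a back-of-the-envelope sketch. It assumes the common approximation that training costs about 6 * N * D FLOPs for N parameters and D tokens, plus a fixed tokens-per-parameter ratio (the default of 20 is the Chinchilla rule of thumb). Neither assumption is taken from the linked paper, whose title suggests the optimal trade-off may differ between knowledge and reasoning skills.

```python
import math


def compute_optimal_split(compute_flops, tokens_per_param=20.0):
    """Split a training-FLOP budget C into parameter count N and token count D.

    Assumes C ~= 6 * N * D and a fixed ratio r = D / N, so C = 6 * r * N**2
    and N = sqrt(C / (6 * r)). Both are generic assumptions, not this paper's fit.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


n, d = compute_optimal_split(1e21)
print(f"N ~ {n:.3g} params, D ~ {d:.3g} tokens")
```

Under these assumptions both N and D grow as the square root of compute; a skill-dependent scaling law would change the exponents or the ratio rather than this bookkeeping.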
March 21, 2025 at 8:11 PM
Reposted by Ajitesh Shukla
I think it's a limit (a slightly generalised pullback), the universal thing that completes this diagram.
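The diagram the post refers to is not in the captured text, so it cannot be reconstructed here. As a hedged illustration of the plain (ungeneralised) notion only, this sketch computes the pullback of two functions f: A → C and g: B → C between finite sets, which is the limit of that cospan, together with the unique mediating map that witnesses its universal property. The sets, functions, and helper names are made up for the example.

```python
from itertools import product


def pullback(A, B, f, g):
    """Pullback of the cospan A -f-> C <-g- B in finite sets.

    The apex is the set of pairs (a, b) with f(a) == g(b); the legs are the
    projections pair[0] and pair[1], so both paths to C agree by construction.
    """
    return [(a, b) for a, b in product(A, B) if f(a) == g(b)]


def mediating_map(Q, q1, q2, f, g):
    """Unique map Q -> pullback for a commuting cone (Q, q1: Q -> A, q2: Q -> B)."""
    u = {}
    for x in Q:
        if f(q1(x)) != g(q2(x)):
            raise ValueError(f"cone does not commute at {x!r}")
        # Forced: the projections of u(x) must equal q1(x) and q2(x).
        u[x] = (q1(x), q2(x))
    return u


def f(a):
    return a % 3  # f: A -> C = {0, 1, 2}


g = {"x": 0, "y": 2}.__getitem__  # g: B -> C

P = pullback(range(6), ["x", "y"], f, g)
print(P)  # [(0, 'x'), (2, 'y'), (3, 'x'), (5, 'y')]
```

A general limit replaces the single cospan with an arbitrary diagram, but the recipe is the same: take the tuples that are compatible with every arrow in the diagram.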
March 20, 2025 at 5:13 PM