Lightnews — Scholar-powered news

Ajitesh Shukla

@ajitesh1.bsky.social


Mathematical Research (Geometric Topology, Differential Geometry), Large Language Models, Natural Language Processing, Quantum Computing, Cryptography, LORD KRISHNA IS GOD OF MATH


Reposted by Ajitesh Shukla

Sung Kim

@sungkim.bsky.social

2) They find that GRPO is biased.
- Length normalization favors shorter correct answers and longer incorrect answers. -> length bias
- Std normalization favors questions that are too easy or too hard over questions of average difficulty. -> difficulty bias

They introduce Dr. GRPO to remove the above biases.
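A rough Python sketch of the two biases, assuming the usual GRPO setup where a group's rewards are mean-centered and divided by their standard deviation, and each response's token losses are divided by its length. The function names are hypothetical, and dropping both normalizers is only an assumed reading of how Dr. GRPO removes the biases; the post does not give the formula.

```python
import statistics


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages WITH std normalization (assumed GRPO form)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


def centered_advantages(rewards):
    """Mean-centering only, no std division (assumed Dr. GRPO-style variant)."""
    mean = statistics.fmean(rewards)
    return [r - mean for r in rewards]


def per_token_weight(advantage, length, length_normalize):
    """Weight each token of one response receives in the loss.

    With length normalization the advantage is spread over `length` tokens;
    without it every token carries the full advantage.
    """
    return advantage / length if length_normalize else advantage


# Difficulty bias: 8 samples per question, reward 1 = correct, 0 = incorrect.
hard_question = [1, 0, 0, 0, 0, 0, 0, 0]     # rarely solved, low reward std
average_question = [1, 1, 1, 1, 0, 0, 0, 0]  # solved half the time, highest std

# Dividing by a smaller std scales up the hard question's advantages.
hard_scaled = grpo_advantages(hard_question)[0]
average_scaled = grpo_advantages(average_question)[0]

# Length bias: same advantage, different response lengths.
short_correct = per_token_weight(+1.0, length=10, length_normalize=True)
long_correct = per_token_weight(+1.0, length=100, length_normalize=True)
short_wrong = per_token_weight(-1.0, length=10, length_normalize=True)
long_wrong = per_token_weight(-1.0, length=100, length_normalize=True)
```

Here `hard_scaled` comes out larger than `average_scaled`, `short_correct` is rewarded more per token than `long_correct`, and `long_wrong` is penalized less per token than `short_wrong`. With `length_normalize=False` and `centered_advantages`, both effects disappear in this toy setting.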