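- Length normalization (dividing each response's loss by its own length) favors shorter correct answers and penalizes longer incorrect answers less per token. -> length bias
- Std normalization (dividing the advantage by the group's reward standard deviation) upweights questions that are too easy or too hard relative to average ones, because their reward std is small. -> difficulty bias
They introduce Dr. GRPO, which drops both normalization terms to remove the above biases.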
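
The two biases can be made concrete with a small numeric sketch. This is an illustration, not the authors' implementation: the function names, the toy rewards and lengths, and the `eps` term are assumptions, and the constant normalizer that would replace the per-response length is omitted.

```python
from statistics import fmean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: center by the group mean, then divide by the group std."""
    mean, std = fmean(rewards), pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def dr_grpo_advantages(rewards):
    """Advantage without std normalization: center by the group mean only."""
    mean = fmean(rewards)
    return [r - mean for r in rewards]

def grpo_token_weights(rewards, lengths, eps=1e-6):
    """Weight on each token of response i when its loss is averaged over its own length."""
    return [a / n for a, n in zip(grpo_advantages(rewards, eps), lengths)]

# Length bias: two correct and two incorrect responses, each short or long (toy values).
rewards = [1, 1, 0, 0]
lengths = [10, 100, 10, 100]
length_weights = grpo_token_weights(rewards, lengths)
# The short correct answer gets a larger positive per-token weight than the long one,
# and the long incorrect answer gets a smaller per-token penalty than the short one.
print(length_weights)

# Difficulty bias: std normalization rescales each question's advantages by 1 / std.
easy = [1, 1, 1, 1, 1, 1, 1, 0]      # 7 of 8 correct, small reward std
medium = [1, 1, 1, 1, 0, 0, 0, 0]    # 4 of 8 correct, largest reward std
easy_scale = grpo_advantages(easy)[0] / dr_grpo_advantages(easy)[0]
medium_scale = grpo_advantages(medium)[0] / dr_grpo_advantages(medium)[0]
print(easy_scale, medium_scale)      # the easy question is scaled up more
```

Without the division by length, every token of a response shares the same weight regardless of how long the response is, and without the division by std, every question's advantages stay on the same scale.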