π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published May 19 • 108
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 165
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published May 13 • 223
Teaching Thinking Models to Reason with Tools: A Full-Pipeline Recipe for Tool-Integrated Reasoning Paper • 2605.06326 • Published May 7 • 26
Diversity-Incentivized Exploration for Versatile Reasoning Paper • 2509.26209 • Published Sep 30, 2025 • 17
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Paper • 2509.20712 • Published Sep 25, 2025 • 20