OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation Paper • 2606.17628 • Published 2 days ago • 21
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published Feb 12 • 95
The Smol Training Playbook 📚: The secrets to building world-class LLMs • Running on CPU Upgrade • Featured • 3.21k