NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? Paper • 2606.24530 • Published 4 days ago • 55
ProcMEM: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents Paper • 2602.01869 • Published Feb 2 • 1
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 30 days ago • 95
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 247
MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published May 14 • 121
SkillX: Automatically Constructing Skill Knowledge Bases for Agents Paper • 2604.04804 • Published Apr 6 • 35
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352
PRBench: End-to-end Paper Reproduction in Physics Research Paper • 2603.27646 • Published Mar 29 • 29
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems Paper • 2602.08847 • Published Feb 9 • 29
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery Paper • 2602.08990 • Published Feb 9 • 79
Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection Paper • 2601.06498 • Published Jan 10 • 1
Spec-o3 Collection A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection • 5 items • Updated Jan 13 • 1