Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 5 days ago • 20
OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published 28 days ago • 31
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 312
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published Mar 11 • 9
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization Paper • 2602.22675 • Published Feb 26 • 23
\$OneMillion-Bench: How Far are Language Agents from Human Experts? Paper • 2603.07980 • Published Mar 9 • 27
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published Mar 11 • 9
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization Paper • 2602.22675 • Published Feb 26 • 23
VideoWorld 2: Learning Transferable Knowledge from Real-world Videos Paper • 2602.10102 • Published Feb 10 • 14
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 20
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 20
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration Paper • 2602.04575 • Published Feb 4 • 17
CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation Paper • 2602.01660 • Published Feb 2 • 8