TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents Paper • 2602.19633 • Published Feb 23 • 9
Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring Paper • 2605.30834 • Published 22 days ago • 9
Mistral Small 4 Collection A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated Mar 16 • 75
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 170
Mistral Medium 3.5 Collection Our first flaship models handling instruction-following, reasoning, and coding in a single set of opened-weights. • 2 items • Updated Apr 29 • 19
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published May 9 • 81
Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published Apr 14 • 25
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 909
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 8 days ago • 158
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 23 items • Updated 8 days ago • 328
Olmo 3.1 Collection The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 52
Olmo 3 Post-training Collection All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated Dec 23, 2025 • 55
view article Article A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond karina-zadorozhny • Jan 19 • 28
UniGame: Turning a Unified Multimodal Model Into Its Own Adversary Paper • 2511.19413 • Published Nov 24, 2025 • 21
Reliable and Responsible Foundation Models: A Comprehensive Survey Paper • 2602.08145 • Published Feb 4 • 8
Thinking Makes LLM Agents Introverted: How Mandatory Thinking Can Backfire in User-Engaged Agents Paper • 2602.07796 • Published Feb 8 • 7
Towards Reducible Uncertainty Modeling for Reliable Large Language Model Agents Paper • 2602.05073 • Published Feb 4 • 11
view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment NormalUhr • Feb 11, 2025 • 126