Taking Alpamayo to New Heights with Driving Foundation Models and Closed-Loop Training

Community Article Published June 1, 2026


NVIDIA today announced a significant expansion of the Alpamayo open platform for developing reasoning-based autonomous vehicles (AVs). Introduced at CES 2026 and expanded at GTC 2026, the Alpamayo open platform provides researchers and developers with a flexible, high-performance, and scalable suite of models, datasets, simulation, and training tools for developing and evaluating modern reasoning-based AV stacks in realistic closed-loop settings.

Since its launch, the Alpamayo open platform has seen rapid adoption across both industry and academia, with Alpamayo reasoning models collectively surpassing 400,000 downloads to date. The platform also received a Computex 2026 Best Choice Award in the Vehicle Technology & Smart Cockpit category for pioneering open, reasoning-based AV development.

Based in part on feedback from the community, we are introducing several new additions to the Alpamayo open platform across models and training infrastructure. Collectively, these additions enable developers to build AV stacks with more powerful reasoning models and train them in settings closer to deployment. Further, we are releasing benchmarks to measure progress in the field and accelerate research into reasoning and closed-loop driving.

NVIDIA Alpamayo 2 Super: Growing Reasoning VLAs into Multi-Task Driving Foundation Models

Alpamayo 2 Super is a significant update to NVIDIA’s suite of open reasoning vision-language-action (VLA) models, designed to be a multi-task driving foundation model for the AV community. It is built on the 32-billion-parameter Cosmos 3 Super Reasoner VLM backbone (roughly 3x the parameter count of prior Alpamayo models), is RL post-trained, and introduces support for surround-view camera inputs, reasoning autolabeling, 2D grounding, and meta-action outputs.

✨ Key Highlights of this New Release

  • 3x parameter scale: Alpamayo 2 Super scales to 32B parameters (compared to previous 10B-parameter generations), improving reasoning, 3D spatial understanding and trajectory prediction in long‑tail scenarios.

  • Full-surround perception: Expands from front-focused cameras to 360-degree situational awareness across front, side and rear views, giving the model complete context for safer lane changes, merges and intersection crossing.

  • Meta-Actions: Adds meta-action outputs (macro actions such as yield, lane change, and stop) so the model predicts high-level driving decisions for downstream planning in addition to trajectories and chain-of-causation (CoC) traces.

  • Reasoning auto-labeling and 2D grounding: Introduces reasoning auto‑labeling with 2D grounding so the foundation model can provide high-quality reasoning labels, accelerating data annotation cycles.

  • State-of-the-art performance in multiple aspects including reasoning quality, trajectory accuracy, alignment, and more.

  • Easy-to-use scripts and notebooks that enable application across a wide range of use cases, from autolabeling new data to fine-tuning with it.

🤖 Popular Use Cases of Alpamayo 2 Super

  • AV model distillation – Leverage the pretrained weights as an offline teacher to develop onboard-ready models (e.g., via output or feature supervision during training).

  • Data labeling and curation – Identify interesting scenarios and label them with plausible future trajectories and reasoning traces.

  • Model customization – Post-train the model with your own data and labels to specialize Alpamayo for your needs.

  • Visual question answering – Ask specific questions about scenes and use the outputs for data curation or autolabeling.

  • AV evaluation – Generate trajectories and reasoning traces to evaluate the outputs of smaller, edge-deployed models, or assess alternative outcomes by providing different navigation commands to the Alpamayo 2 Super model.
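The distillation use case above mentions output supervision. One common form of it, shown here as a minimal sketch and not as the method used in any Alpamayo recipe, is a KL-divergence loss between temperature-softened teacher and student distributions. The three-way logits (standing in for yield, lane change, and stop scores), the function names, and the temperature value are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over three meta-actions: yield, lane change, stop.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(f"distillation loss: {distillation_loss(teacher, student):.4f}")
```

The loss is zero when the student matches the teacher exactly and grows as the two distributions diverge, which is what makes it usable as a training signal for a smaller onboard model.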

Closed-Loop Training Through NVIDIA AlpaGym

AlpaGym is an open source, high‑throughput, closed‑loop reinforcement learning (RL) framework. Where open‑loop training evaluates models against recorded data and generates a single round of actions, AlpaGym runs models through continuous decision and observation cycles in AlpaSim, with every braking, steering and navigation choice affecting the environment. As a result, AlpaGym exposes compounding errors and edge‑case failures that static datasets miss and allows models to learn from experience.

Built on AlpaSim simulation microservices, AlpaGym enables efficient, scalable, closed-loop RL to push the frontier of driving performance.
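To make the open-loop versus closed-loop distinction concrete, here is a toy sketch that does not use AlpaGym or AlpaSim at all. It assumes a one-dimensional lateral-offset world and a hypothetical, slightly flawed policy. Scored against a recorded log, the policy's per-step error looks small and constant; rolled out so that each action changes the next observation, the same flaw compounds.

```python
def policy(lateral_offset_m):
    """Hypothetical imperfect policy: lateral correction in metres per step."""
    # A small constant bias plus a gain with the wrong sign (pushes away from centre).
    return 0.02 + 0.05 * lateral_offset_m


def open_loop_errors(logged_offsets, logged_actions):
    """Per-step action error on recorded states; actions never change the next state."""
    return [abs(policy(s) - a) for s, a in zip(logged_offsets, logged_actions)]


def closed_loop_offsets(steps, start=0.0):
    """Roll the policy out so every action changes the state it observes next."""
    offsets = [start]
    for _ in range(steps):
        offsets.append(offsets[-1] + policy(offsets[-1]))
    return offsets


steps = 50
logged_offsets = [0.0] * steps  # expert log: car stays centred in the lane
logged_actions = [0.0] * steps  # expert applies no correction
ol = open_loop_errors(logged_offsets, logged_actions)
cl = closed_loop_offsets(steps)
print(f"max open-loop action error: {max(ol):.3f}")
print(f"closed-loop offset after {steps} steps: {cl[-1]:.3f} m")
```

The open-loop score never exceeds the per-step bias because the log always resets the state to the expert's, while the closed-loop rollout drifts by metres over the same horizon.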


✨ Key Highlights of this New Release

  • Fully-fledged RL training infrastructure: AlpaGym launches with all the tooling and components necessary to get started with closed-loop reinforcement learning immediately. Specifically, GRPO with default reward functions is supported at release.

  • Intelligent reactive agents: We are releasing a learning-based traffic model (based on CAT-K) that enables simulated agents to react to the ego-vehicle’s actions in closed-loop.

  • Scalable from 1 GPU to multi-node clusters: AlpaGym was designed with scale in mind, supporting local development with a single GPU all the way up to large-scale multi-node runs on clusters.

  • Modularity and Extensibility: Users can extend AlpaGym with their own infrastructural components (e.g., policy models, rewards, RL algorithms) and run on their own data and compute infrastructure. Further, the amount of resources allocated to different components (i.e., numbers of rollout and training workers) can all be easily configured.
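The highlights above note that GRPO with default reward functions is supported at release. As a rough sketch of the core idea in GRPO (group-relative advantages), the snippet below scores several rollouts of the same scenario and normalizes each reward against the group's mean and standard deviation. It is not AlpaGym code: the reward function, its weights, and all names are hypothetical stand-ins.

```python
import statistics


def driving_reward(progress_m, collided, off_road):
    """Hypothetical reward (not AlpaGym's default): progress minus incident penalties."""
    penalty = 10.0 * collided + 5.0 * off_road
    return progress_m / 100.0 - penalty


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (reward - group mean) / group standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts of the same scenario from the same policy.
rollouts = [
    {"progress_m": 180.0, "collided": False, "off_road": False},
    {"progress_m": 200.0, "collided": False, "off_road": False},
    {"progress_m": 90.0, "collided": True, "off_road": False},
    {"progress_m": 150.0, "collided": False, "off_road": True},
]
rewards = [driving_reward(**r) for r in rollouts]
advantages = group_relative_advantages(rewards)
for reward, adv in zip(rewards, advantages):
    print(f"reward={reward:7.2f}  advantage={adv:+.3f}")
```

Because advantages are computed relative to sibling rollouts, no separate value network is needed; rollouts that outperform their group are reinforced and the rest are discouraged.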

🤖 Popular Use Cases of AlpaGym

  • AV model training – Train your end-to-end AV model in closed-loop on a variety of scenarios from the Physical AI AV and Physical AI AV NuRec datasets (or any other dataset, or even generative model, supported by AlpaSim).

  • Closing the Training <-> Deployment Gap – Most AV training recipes are open-loop in nature (e.g., supervised fine-tuning to match ground truth data). Real-world deployment, however, is closed-loop, because the ego-vehicle’s actions influence the surrounding world. Closed-loop training helps address this gap by exposing AV models to the consequences of their actions at training time.

AV Benchmarks: Measuring Progress in the Field

Autonomous driving research has made significant progress in recent years, but it remains hard to evaluate policies in a realistic, reproducible way. To address this, we are releasing two Hugging Face challenges that benchmark closed-loop driving behavior and reasoning capability.

  1. The AlpaSim End-to-End Closed-Loop Challenge evaluates submitted driving policies in closed-loop, measuring how long models can drive without at-fault incidents (e.g., collisions or road excursions) in a variety of reconstructed real-world scenarios.
  2. The Physical AI AV Reasoning Challenge invites the research community to build models that can reason about long-tail scenarios in natural language.
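The article does not give the closed-loop challenge's exact scoring formula. As one plausible reading of "how long models can drive without at-fault incidents", the sketch below aggregates distance driven per at-fault incident across rollouts. The `RolloutResult` type, its fields, and the choice of kilometres are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RolloutResult:
    """Hypothetical per-scenario summary of one closed-loop rollout."""
    distance_km: float
    at_fault_incidents: int


def km_per_incident(results):
    """Total distance divided by total at-fault incidents (inf if there were none)."""
    total_km = sum(r.distance_km for r in results)
    total_incidents = sum(r.at_fault_incidents for r in results)
    if total_incidents == 0:
        return float("inf")
    return total_km / total_incidents


results = [
    RolloutResult(distance_km=1.5, at_fault_incidents=0),
    RolloutResult(distance_km=0.5, at_fault_incidents=1),
    RolloutResult(distance_km=2.0, at_fault_incidents=1),
]
print(f"km per at-fault incident: {km_per_incident(results):.2f}")
```

Pooling distance and incidents over all scenarios, instead of averaging per-scenario ratios, keeps short scenarios with a single incident from dominating the score.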

We hope that these challenges will spur new innovations in the field and ultimately accelerate the development of Level 4 AV systems.

NVIDIA Alpamayo Recipes: A New Home for AV Development

Alpamayo Recipes is a new centralized hub of end-to-end Alpamayo workflows designed to help developers quickly build, adapt, and deploy Alpamayo-based applications. This repo brings together battle-tested workflows from across the Alpamayo ecosystem, including post-training recipes (supervised fine-tuning and reinforcement learning) and quantization recipes. Whether you are experimenting locally or building a full production stack, this repository is intended to be the primary starting point for developers to learn, customize, and extend Alpamayo for their own use cases.

🤖 Popular Recipes

  • Post-Training Scripts – Fine-tune Alpamayo models to perform well on your own tasks of interest, with your own data, labels, and losses.

  • Quantization – Optimize Alpamayo models’ memory usage by reducing their numerical precision.
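To give a sense of why reduced precision matters at this scale, the back-of-the-envelope sketch below estimates weight storage for a 32B-parameter model at several precisions. It covers weights only (no activations, KV cache, or quantization metadata), and the set of precisions is illustrative; the article does not say which formats the quantization recipes support.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8/fp8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 32e9  # 32B parameters, as stated for Alpamayo 2 Super
for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>10}: {weight_memory_gb(NUM_PARAMS, nbytes):6.1f} GB")
```

At 16-bit precision the weights alone take about 64 GB, which halves with each step down to 8-bit and 4-bit formats.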

Conclusion

Reasoning models in AVs will unlock new capabilities and levels of safety for the next generation of autonomous systems. We hope that this update, and future Alpamayo releases, will accelerate the whole industry with new resources and tools.

Resources

For more information about this release, see the resources below.

Alpamayo Models:

  • Alpamayo 2 Super Model Weights → coming summer 2026
  • Alpamayo 2 Super Inference Code → coming summer 2026 (with post-training scripts to follow shortly after)

AlpaGym:

AV Challenges:

Alpamayo Recipes:

Research on Reasoning VLA Models, End-To-End AV Simulation and Training, and Physical AI Safety:
