LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 24 days ago • 143
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 56
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published Nov 26, 2025 • 128
view article Article Welcome the NVIDIA Llama Nemotron Nano VLM to HF Mirror Hub nvidia • Jun 27, 2025 • 31
view article Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks nvidia • Aug 11, 2025 • 76
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs Paper • 2506.05328 • Published Jun 5, 2025 • 21
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21, 2025 • 69
QLIP Collection QLIP is a family of image tokenizers with SOTA reconstruction quality and zero-shot image understanding. • 3 items • Updated 7 days ago • 12
Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models Paper • 2501.14818 • Published Jan 20, 2025 • 9
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 88