cutechicken

cutechicken99

AI & ML interests

None yet

Recent Activity

reacted to SeaWolf-AI's post with 👀 about 6 hours ago

🚀 Adding a GPU without building one

AI is usually framed as "how smart is the model / how many GPUs did you buy." The real bottleneck is elsewhere — how efficiently you use the GPUs you already have.

Training happens once; inference runs the entire time users use your product. So a service's economics come down to cost per token. Inference acceleration uses software to get several times more throughput out of the same GPU; the effect is like plugging in one more "virtual GPU."
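To make the cost-per-token argument concrete, here is a minimal Python sketch. The $5/GPU-hour price and the 100 and 400 tok/s throughputs are hypothetical placeholders, not figures from the post; the only point is that on a fixed GPU, cost per token falls in proportion to throughput.

```python
# Hypothetical numbers throughout: price and throughputs are placeholders,
# not measurements from the post.


def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens for one GPU running at a steady rate."""
    tokens_per_hour = tokens_per_second * 3600.0
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def virtual_gpus(baseline_tps: float, accelerated_tps: float) -> float:
    """How many baseline GPUs' worth of throughput one accelerated GPU delivers."""
    return accelerated_tps / baseline_tps


if __name__ == "__main__":
    price = 5.0                 # hypothetical $/GPU-hour
    base, fast = 100.0, 400.0   # hypothetical tok/s before and after acceleration
    print(f"baseline:    ${cost_per_million_tokens(price, base):.2f} per 1M tokens")  # $13.89
    print(f"accelerated: ${cost_per_million_tokens(price, fast):.2f} per 1M tokens")  # $3.47
    print(f"equivalent to {virtual_gpus(base, fast):.1f} baseline GPUs")             # 4.0
```

The hardware bill stays the same; a 4× throughput gain simply spreads it over four times as many tokens.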

VIDRAFT's VKAE, measured on B200 (same harness, no quality loss):

Qwen3.5-35B-A3B (MoE): 25.7 → 601 tok/s (23.4×)
Darwin-36B-Opus (in-house MoE): 25.0 → 280.8 tok/s (11.2×)
10,000+ tok/s peak aggregate throughput under concurrency

The key: it's reproducible. The model and the serving stack ship together as one container.
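The speedup ratios above are easy to check, and the difference between single-stream tok/s and "peak aggregate under concurrency" is worth making concrete. In this minimal Python sketch, `speedup` is fed the post's own numbers, while the `Request` log and its timings are hypothetical and say nothing about how VKAE itself works.

```python
from __future__ import annotations

from dataclasses import dataclass


def speedup(baseline_tps: float, accelerated_tps: float) -> float:
    """Ratio of accelerated to baseline single-stream throughput."""
    return accelerated_tps / baseline_tps


@dataclass
class Request:
    start_s: float  # wall-clock time the request started
    end_s: float    # wall-clock time the last token arrived
    tokens: int     # number of generated tokens


def per_stream_tps(r: Request) -> float:
    """Throughput one user sees on a single request."""
    return r.tokens / (r.end_s - r.start_s)


def aggregate_tps(log: list[Request]) -> float:
    """Total tokens across overlapping requests divided by the wall-clock window."""
    window = max(r.end_s for r in log) - min(r.start_s for r in log)
    return sum(r.tokens for r in log) / window


# Figures reported in the post (B200, single stream)
print(f"Qwen3.5-35B-A3B: {speedup(25.7, 601):.1f}x")    # 23.4x
print(f"Darwin-36B-Opus: {speedup(25.0, 280.8):.1f}x")  # 11.2x

# Hypothetical log: four overlapping requests, each streaming 500 tokens in 2 s
log = [
    Request(0.0, 2.0, 500),
    Request(0.5, 2.5, 500),
    Request(1.0, 3.0, 500),
    Request(1.0, 3.0, 500),
]
print(f"per stream: {per_stream_tps(log[0]):.0f} tok/s")  # 250
print(f"aggregate:  {aggregate_tps(log):.0f} tok/s")      # 667
```

Aggregate throughput grows with concurrency even when each individual stream is no faster, which is why the single-stream and aggregate figures are reported separately.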

docker pull vidraft/qwen35-vkae:601
Don't take our word for it: run it yourself. The mechanism will be described in a forthcoming paper.

🏆 Leaderboard & demo 👉 https://huggingface.co/spaces/VIDraft/vkae
Articles 👉 https://huggingface.co/blog/FINAL-Bench/vkae-leaderboard

liked a model about 7 hours ago

FINAL-Bench/Darwin-36B-Opus-VKAE

Text Generation • Updated about 7 hours ago • 19

liked a model about 8 hours ago

FINAL-Bench/Qwen3.5-35B-A3B-VKAE

Text Generation • Updated about 8 hours ago • 22

liked a model 1 day ago

FINAL-Bench/metacog-adapter-JGOS-31B-Citizen

Updated 3 days ago • 36 • 17

liked a Space 2 days ago

VKAE Leaderboard

🚀

VIDRAFT kernel-level inference acceleration engine

liked 10 models 4 days ago

upvoted a collection 4 days ago

Metacognition Adapters

Collection

Per-model metacognition adapters from the VIDRAFT Darwin/Chimera platform plus AETHER metacognition-emergence technology. • 11 items • Updated 3 days ago • 18

liked a dataset 4 days ago

ginigen-ai/Metacognition-Bench

Updated about 8 hours ago • 140 • 24

liked a Space 4 days ago

Metacognition Leaderboard

🧠

Explore LLM metacognition rankings and submit models

reacted to ginigen-ai's post with ❤️🔥 4 days ago

🍳 The RoboCasa Kitchen Leaderboard
What does it take for a robot to handle kitchen chores the way a person does? It has to see (Vision), understand instructions (Language), and actually act (Action) — and VLA (Vision-Language-Action) models are emerging as the answer. They're the bridge between large multimodal models and real-world embodied control.

RoboCasa Kitchen is a leading robot-learning benchmark in which a single-arm robot (Franka Panda) performs 24 atomic manipulation tasks (picking up cups and bowls, opening drawers and doors, turning faucets, pressing buttons, and more) inside a photorealistic simulated kitchen. Because the layout and object placement are randomized every episode, it tests genuine generalization rather than memorized motions. The score, success rate (SR), is the average fraction of the 24 tasks completed as instructed, measured over multiple seeds so that results don't come down to luck.
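One plausible reading of how such a score is aggregated is sketched below in Python: each task's success rate is computed from its episodes, tasks are averaged with equal weight, and the benchmark score is reported as mean and standard deviation across seeds. This is an assumption, not the official RoboCasa evaluation code; the three task names and all episode outcomes are hypothetical (the real benchmark has 24 tasks).

```python
from __future__ import annotations

from statistics import mean, stdev


def task_success_rate(outcomes: list[bool]) -> float:
    """Fraction of evaluation episodes in which the task was completed."""
    return sum(outcomes) / len(outcomes)


def benchmark_sr(results: dict[str, list[bool]]) -> float:
    """Unweighted mean of per-task success rates: every task counts equally."""
    return mean(task_success_rate(o) for o in results.values())


def sr_over_seeds(per_seed: list[dict[str, list[bool]]]) -> tuple[float, float]:
    """Mean and sample standard deviation of the benchmark SR (needs >= 2 seeds)."""
    scores = [benchmark_sr(r) for r in per_seed]
    return mean(scores), stdev(scores)


# Hypothetical outcomes for two seeds and three made-up tasks
seed0 = {
    "pick_cup": [True, True, False, True],         # 0.75
    "open_drawer": [True, False, False, False],    # 0.25
    "turn_on_faucet": [True, True, True, True],    # 1.00
}
seed1 = {
    "pick_cup": [True, False, False, True],        # 0.50
    "open_drawer": [True, True, False, False],     # 0.50
    "turn_on_faucet": [True, True, True, False],   # 0.75
}
m, s = sr_over_seeds([seed0, seed1])
print(f"SR = {m:.3f} ± {s:.3f}")  # SR = 0.625 ± 0.059
```

Averaging per task first keeps a task with many evaluation episodes from dominating the score, and the spread across seeds shows whether a gap between two models is larger than run-to-run noise.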

The catch: this benchmark has no official leaderboard, and protocols (number of demonstrations, evaluation setup) differ from paper to paper, leaving scores scattered. Lining the numbers up naively quickly turns into an apples-to-oranges comparison.

This leaderboard fixes that by collecting published scores with their sources and comparing only what is genuinely comparable. It's split into three tables:

🏆 Kitchen 24-task (matched) — head-to-head under identical conditions (per the RLDX-1 Technical Report). This is the core ranking you can actually trust.
➕ Other protocols — self-reported under different setups (e.g. fewer demos). Not directly comparable, so kept separate.
🤖 GR1-Tabletop — a different, humanoid-based variant suite, separated to avoid confusion.
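The three-table split amounts to a routing rule applied to each sourced score before any ranking. A minimal Python sketch, assuming a hypothetical `Entry` record with `suite` and `protocol` fields; the field names, the `"matched"` tag, the rule that unsourced scores are skipped, and the sample entries are all illustrative, not the leaderboard's actual schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Table(Enum):
    MATCHED = "Kitchen 24-task (matched)"
    OTHER = "Other protocols"
    GR1 = "GR1-Tabletop"


@dataclass(frozen=True)
class Entry:  # hypothetical schema
    model: str
    sr: float        # success rate in percent
    suite: str       # "kitchen24" or "gr1_tabletop"
    protocol: str    # "matched" for the shared reference setup, anything else otherwise
    source_url: str  # link to the paper the number comes from


def assign_table(e: Entry) -> Table:
    """Route an entry to the only table where its score is comparable."""
    if e.suite == "gr1_tabletop":
        return Table.GR1
    if e.suite == "kitchen24" and e.protocol == "matched":
        return Table.MATCHED
    return Table.OTHER


def build_tables(entries: list[Entry]) -> dict[Table, list[Entry]]:
    tables: dict[Table, list[Entry]] = {t: [] for t in Table}
    for e in entries:
        if not e.source_url:
            continue  # assumption: scores without a source are not listed
        tables[assign_table(e)].append(e)
    for rows in tables.values():
        rows.sort(key=lambda e: e.sr, reverse=True)  # rank within a table only
    return tables


# Hypothetical entries, not real leaderboard scores
entries = [
    Entry("model-a", 60.0, "kitchen24", "matched", "https://example.org/a"),
    Entry("model-b", 70.0, "kitchen24", "fewer-demos", "https://example.org/b"),
    Entry("model-c", 55.0, "gr1_tabletop", "matched", "https://example.org/c"),
    Entry("model-d", 65.0, "kitchen24", "matched", "https://example.org/d"),
    Entry("model-e", 90.0, "kitchen24", "matched", ""),
]
for table, rows in build_tables(entries).items():
    print(table.value, [r.model for r in rows])
```

Because ranking happens only inside a table, a score obtained under a different protocol never outranks a matched-protocol score just by being numerically higher.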

Any researcher can submit their own model's score directly, and submissions are reviewed before they appear on the board. Every number links to its source paper, so you can verify it yourself.

👉 ginigen-ai/robocasa-kitchen-leaderboard