---
title: "Z-Image-Turbo"
sdk: gradio
app_file: app.py
python_version: "3.10"
short_description: "Z-Image-Turbo text-to-image generation demo"
---

# Z-Image-Turbo Demo

A private HF Mirror Space demo for [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).

## Runtime Strategy

- **Module-level model load**: The 6B-parameter `ZImagePipeline` is loaded at startup with `torch.bfloat16` and moved to CUDA (`pipe.to("cuda")`), compatible with ZeroGPU's module-level CUDA emulation.
- **Inference**: The generation function is wrapped in `@spaces.GPU(duration=120)`. Each call runs 8–9 flow-matching steps.
- **No compilation**: `torch.compile` and AOTInductor (AoTI) are disabled for ZeroGPU compatibility. PyTorch's scaled dot-product attention (SDPA) is used as the safe attention fallback.
- **No external APIs**: The original DashScope prompt enhancement and safety checker are omitted to keep the Space self-contained.

## Task

Text-to-image generation. Enter a prompt, pick a resolution, and click **Generate**.
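
The resolution is passed as a string such as `"1024x1024"`. Below is a minimal sketch of how such a value might be split into a width and a height. `parse_resolution` is a hypothetical helper used for illustration and is not necessarily how `app.py` handles it.

```python
def parse_resolution(value: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integer width and height.

    Hypothetical helper: the real app may parse its dropdown value differently.
    """
    width_text, sep, height_text = value.lower().partition("x")
    if not sep:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {value!r}")
    return width, height
```

For example, `parse_resolution("1024x1024")` returns `(1024, 1024)`, and a value without an `x` separator raises `ValueError`.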

## Limitations

- **Prompt enhancement**: Not included (requires an external LLM API).
- **Safety checker**: Not included to keep the app lightweight; use responsibly.
- **Duration**: `@spaces.GPU(duration=120)` is a conservative starting estimate. Calibrate with real calls (cold-start ~30–60s, warm ~10–20s on RTX PRO 6000 Blackwell) and adjust to `measured_max × 1.4`.
- **VRAM**: ~18 GB at bf16; fits in ZeroGPU `large` (48 GB).
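
The `measured_max × 1.4` rule for the duration can be sketched as a small calculation. `recommended_duration` and its argument names are assumptions made for this example; only the 1.4 margin comes from the text above.

```python
import math


def recommended_duration(measured_seconds: list[float], margin: float = 1.4) -> int:
    """Return a duration in whole seconds: slowest measured call times the margin.

    Hypothetical helper; pass wall-clock timings collected from real calls,
    including at least one cold start.
    """
    if not measured_seconds:
        raise ValueError("need at least one measured call")
    # Round up so the budget never undercuts the slowest observed call.
    return math.ceil(max(measured_seconds) * margin)
```

With timings of 35 s, 60 s, and 15 s, `recommended_duration([35.0, 60.0, 15.0])` returns `84`, which would then replace `120` in `@spaces.GPU(duration=...)`.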

## How to Test

```python
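import os

from gradio_client import Client

# The Space is private, so an HF token with read access must be set in HF_TOKEN.
# Older gradio_client releases name this argument `hf_token` instead of `token`.
client = Client("fffiloni/space-factory-universal-20260606-204445-77e49c93", token=os.environ["HF_TOKEN"])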
result = client.predict(
    "A serene mountain landscape at sunset",
    "1024x1024",
    42,
    True,
    9,
    api_name="/generate",
)
print(result)
```

## Health

Call the `/health` endpoint (`api_name="/health"` when using `gradio_client`) for a lightweight status ping that does not load weights or run GPU work.
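
A handler of this kind could look roughly like the sketch below. The payload fields and the `_STARTED_AT` variable are assumptions made for illustration; the actual response shape of this Space's health endpoint is not documented here.

```python
import time

MODEL_ID = "Tongyi-MAI/Z-Image-Turbo"
_STARTED_AT = time.time()  # hypothetical: recorded once when the module is imported


def health() -> dict:
    """Return a small status payload without touching the pipeline or the GPU."""
    return {
        "status": "ok",
        "model": MODEL_ID,
        "uptime_seconds": round(time.time() - _STARTED_AT, 1),
    }
```

Leaving such a function without the `@spaces.GPU` decorator is what would keep the ping from requesting a GPU slot.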