---
title: "Z-Image-Turbo"
sdk: gradio
app_file: app.py
python_version: "3.10"
short_description: "Z-Image-Turbo text-to-image generation demo"
---

# Z-Image-Turbo Demo

A private HF Mirror Space demo for [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).

## Runtime Strategy

- **Module-level model load**: The 6B-parameter `ZImagePipeline` is loaded at startup with `torch.bfloat16` and moved to CUDA (`pipe.to("cuda")`), compatible with ZeroGPU's module-level CUDA emulation.
- **Inference**: The `@spaces.GPU(duration=120)` decorator wraps the generation function. Real inference runs 8–9 flow-matching steps.
- **No compilation**: `torch.compile` and AoTI are disabled for ZeroGPU compatibility. SDPA is used as the safe attention fallback.
- **No external APIs**: The original DashScope prompt enhancement and safety checker are omitted to keep the Space self-contained.

## Task

Text-to-image generation. Enter a prompt, pick a resolution, and click **Generate**.

## Limitations

- **Prompt enhancement**: Not included (requires an external LLM API).
- **Safety checker**: Not included to keep the app lightweight; use responsibly.
- **Duration**: `@spaces.GPU(duration=120)` is a conservative starting estimate. Calibrate with real calls (cold-start ~30–60s, warm ~10–20s on RTX PRO 6000 Blackwell) and adjust to `measured_max × 1.4`.
- **VRAM**: ~18 GB at bf16; fits in ZeroGPU `large` (48 GB).

## How to Test

```python
from gradio_client import Client
import os

client = Client(
    "fffiloni/space-factory-universal-20260606-204445-77e49c93",
    token=os.environ["HF_TOKEN"],
)
result = client.predict(
    "A serene mountain landscape at sunset",
    "1024x1024",
    42,
    True,
    9,
    api_name="/generate",
)
print(result)
```

## Health

Check the `/health` endpoint (`api_name="health"`) for a lightweight status ping that does not load weights or run GPU work.
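
The calibration rule under Limitations (`measured_max × 1.4`) can be sketched as a small helper. This is a minimal illustration, not code from the Space: `time_calls` and `recommended_duration` are hypothetical names, and `fn` stands in for whatever wraps a real `client.predict(...)` call.

```python
import math
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Run fn `runs` times and return each wall-clock duration in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def recommended_duration(timings: List[float], headroom: float = 1.4) -> int:
    """Slowest measured call times the headroom factor, rounded up to whole seconds."""
    if not timings:
        raise ValueError("need at least one measurement")
    return math.ceil(max(timings) * headroom)
```

In practice `fn` would wrap the `client.predict(...)` call from How to Test, with the first (cold-start) run included so the slowest case is covered. Client-side timings also include queue and network time, so the resulting value tends to overestimate the GPU time actually needed.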