# Z-Image Turbo - Technical Stack Report

**Version:** 15.0
**Last Updated:** December 2025
**Space URL:** https://huggingface.co/spaces/lulavc/Z-Image-Turbo

---

## Overview

Z-Image Turbo is a high-performance AI image generation and transformation application built on Hugging Face Spaces. It combines the Z-Image-Turbo model from Alibaba's Tongyi-MAI team with multiple performance optimizations for fast, high-quality image synthesis.

---

## Core Model

### Z-Image-Turbo (Tongyi-MAI)

| Specification | Details |
|---------------|---------|
| **Model Name** | `Tongyi-MAI/Z-Image-Turbo` |
| **Architecture** | Scalable Single-Stream Diffusion Transformer (S3-DiT) |
| **Parameters** | 6 Billion |
| **License** | Apache 2.0 |
| **Precision** | BFloat16 |
| **Inference Steps** | 8 (optimized distilled model) |
| **Guidance Scale** | 0.0 (classifier-free guidance disabled; the distilled model does not require it) |

### Key Model Features

- **Sub-second latency** on enterprise GPUs
- **Photorealistic image generation** with exceptional detail
- **Bilingual text rendering** (English & Chinese)
- **Distilled architecture** for fast inference without quality loss
- **Consumer GPU compatible** (<16GB VRAM)
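
In practice, loading the model in the precision listed above looks roughly like the sketch below. The `pipe` variable name and the bare `from_pretrained` call are illustrative assumptions; the Space's actual loading code (e.g. extra flags) may differ.

```python
import torch
from diffusers import DiffusionPipeline

# Load Z-Image-Turbo in BFloat16, matching the precision in the spec table.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # move to the GPU once one is allocated
```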

---

## Hardware Infrastructure

### ZeroGPU (Hugging Face Spaces)

| Specification | Details |
|---------------|---------|
| **GPU** | NVIDIA H200 |
| **VRAM** | 70GB per workload |
| **Compute Capability** | 9.0 |
| **Allocation** | Dynamic (on-demand) |
| **Tensor Packing** | ~28.7GB |

### Benefits

- Free GPU access for demos
- Dynamic allocation reduces idle costs
- H200 enables advanced optimizations (FP8, FlashAttention-2)
- No dedicated GPU management required
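
ZeroGPU attaches a GPU only for the duration of a decorated function call. A minimal sketch of the pattern, using the `@spaces.GPU` decorator from the `spaces` package (the function name and body are illustrative):

```python
import spaces

@spaces.GPU  # requests an H200 slice for the duration of this call
def generate(prompt: str):
    # Runs on the dynamically allocated GPU; the device is released
    # once the function returns.
    return pipe(prompt, num_inference_steps=8, guidance_scale=0.0).images[0]
```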

---

## Performance Optimizations

### 1. FP8 Dynamic Quantization (torchao)

```python
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

# Quantize the transformer's linear layers in place: FP8 weights with
# dynamically quantized FP8 activations.
quantize_(pipe.transformer, float8_dynamic_activation_float8_weight())
```

| Metric | Improvement |
|--------|-------------|
| **Inference Speed** | 30-50% faster |
| **Memory Usage** | ~50% reduction |
| **Quality Impact** | Minimal (imperceptible) |

**How it works:** Quantizes transformer weights and activations to FP8 format dynamically during inference, reducing memory bandwidth requirements and enabling faster matrix operations on the H200's FP8 tensor cores.

---

### 2. FlashAttention-2 via SDPA

```python
# Allow PyTorch's scaled_dot_product_attention to dispatch to the
# FlashAttention and memory-efficient attention backends.
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
```

| Metric | Improvement |
|--------|-------------|
| **Attention Speed** | 2-4x faster |
| **Memory Usage** | O(n) instead of O(n²) |
| **Quality Impact** | None (mathematically equivalent) |

**How it works:** PyTorch's Scaled Dot-Product Attention (SDPA) backend automatically uses FlashAttention-2 on compatible hardware (H200), computing attention without materializing the full attention matrix.
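
To verify that the FlashAttention path is actually taken, PyTorch's backend context manager can restrict dispatch to it; a small sketch, assuming PyTorch ≥ 2.3 (where `torch.nn.attention.sdpa_kernel` is available):

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)

# Restrict dispatch to FlashAttention only; this raises an error if the
# backend cannot handle the inputs, making support easy to check.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```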

---

### 3. cuDNN Auto-Tuning

```python
# Let cuDNN benchmark candidate convolution algorithms for each input
# shape and cache the fastest one for reuse.
torch.backends.cudnn.benchmark = True
```

| Metric | Improvement |
|--------|-------------|
| **Convolution Speed** | 5-15% faster |
| **First Run** | Slightly slower (tuning) |
| **Subsequent Runs** | Optimized kernels cached |

**How it works:** Enables cuDNN's auto-tuner to find the fastest convolution algorithms for the specific input sizes and hardware configuration.
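
Since the tuning cost is paid on the first call for each input shape, a warm-up pass at startup keeps it off the first user request. A hypothetical sketch of that pattern (the prompt, step count, and resolution are arbitrary placeholders):

```python
# Throwaway generation at a common resolution so cuDNN auto-tuning
# (and other lazy initialization) completes before real traffic.
_ = pipe(
    "warm-up",
    num_inference_steps=1,
    guidance_scale=0.0,
    width=1024,
    height=1024,
)
```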

---

### 4. VAE Tiling

```python
# Decode large latents tile by tile instead of in a single pass.
pipe.vae.enable_tiling()
```

| Metric | Improvement |
|--------|-------------|
| **Max Resolution** | Bounded only by available memory |
| **Memory Usage** | Significantly reduced for large images |
| **Quality Impact** | Minimal (potential tile boundaries) |

**How it works:** Processes large images in tiles rather than all at once, enabling generation of high-resolution images (2K+) without running out of VRAM.

---

### 5. VAE Slicing

```python
# Encode/decode one batch element at a time to cut peak memory.
pipe.vae.enable_slicing()
```

| Metric | Improvement |
|--------|-------------|
| **Batch Processing** | More memory efficient |
| **Memory Usage** | Reduced peak usage |
| **Quality Impact** | None |

**How it works:** Processes VAE encoding/decoding in slices along the batch dimension, reducing peak memory usage when processing multiple images.
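
Taken together, the load-time setup amounts to roughly the following. This consolidates the calls shown in the five sections above into one sketch; it is not a verbatim copy of the Space's code:

```python
import torch
from diffusers import DiffusionPipeline
from torchao.quantization import quantize_, float8_dynamic_activation_float8_weight

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

# 1. FP8 dynamic quantization of the transformer (torchao)
quantize_(pipe.transformer, float8_dynamic_activation_float8_weight())

# 2. FlashAttention-2 / memory-efficient attention via SDPA
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)

# 3. cuDNN auto-tuning for convolutions
torch.backends.cudnn.benchmark = True

# 4 & 5. VAE tiling and slicing for large images and batches
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
```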

---

## Software Stack

### Dependencies

| Package | Version | Purpose |
|---------|---------|---------|
| `diffusers` | Latest (git) | Diffusion model pipelines |
| `transformers` | ≥4.44.0 | Text encoders, tokenizers |
| `accelerate` | ≥0.33.0 | Device management, optimization |
| `torchao` | ≥0.5.0 | FP8 quantization |
| `sentencepiece` | Latest | Tokenization |
| `gradio` | Latest | Web UI framework |
| `spaces` | Latest | ZeroGPU integration |
| `torch` | 2.8.0+cu128 | Deep learning framework |
| `PIL/Pillow` | Latest | Image processing |
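
A `requirements.txt` consistent with this table might look like the following; the exact pins are illustrative rather than copied from the Space (`torch` is typically provided by the Spaces runtime image itself):

```
git+https://github.com/huggingface/diffusers
transformers>=4.44.0
accelerate>=0.33.0
torchao>=0.5.0
sentencepiece
gradio
spaces
Pillow
```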

### Runtime Environment

| Component | Details |
|-----------|---------|
| **Python** | 3.10 |
| **CUDA** | 12.8 |
| **Platform** | Hugging Face Spaces |
| **SDK** | Gradio |

---

## Application Features

### Generate Tab (Text-to-Image)

| Feature | Details |
|---------|---------|
| **Pipeline** | `DiffusionPipeline` |
| **Input** | Text prompt |
| **Styles** | 10 presets (None, Photorealistic, Cinematic, Anime, Digital Art, Oil Painting, Watercolor, 3D Render, Fantasy, Sci-Fi) |
| **Aspect Ratios** | 18 options (1024px to 2048px) |
| **Steps** | 4-16 (default: 8) |
| **Seed Control** | Manual or random |
| **Output Format** | PNG |
| **Share** | HuggingFace CDN upload |
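
A text-to-image call matching these settings might look like the sketch below. The `width`/`height` and `generator` arguments follow standard diffusers conventions and are assumptions, not an excerpt from the Space:

```python
import torch

seed = 42  # manual seed; pick a random integer for "random" mode
generator = torch.Generator(device="cuda").manual_seed(seed)

image = pipe(
    "a photorealistic portrait, golden hour lighting",
    num_inference_steps=8,  # default from the table above
    guidance_scale=0.0,     # distilled model, CFG disabled
    width=1024,
    height=1024,
    generator=generator,
).images[0]
image.save("output.png")    # PNG output, as documented
```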

### Transform Tab (Image-to-Image)

| Feature | Details |
|---------|---------|
| **Pipeline** | `ZImageImg2ImgPipeline` |
| **Input** | Image upload + text prompt |
| **Strength** | 0.1-1.0 (transformation intensity) |
| **Styles** | Same 10 presets |
| **Auto-Resize** | Inputs resized to 512-2048px per side, rounded to a multiple of 16 |
| **Steps** | 4-16 (default: 8) |
| **Output Format** | PNG |
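
The auto-resize rule can be expressed as a small helper. This is a hypothetical implementation of the documented constraint (clamp each side to 512-2048px, round down to a multiple of 16), not the Space's actual code:

```python
from PIL import Image

def auto_resize(img: Image.Image, lo: int = 512, hi: int = 2048) -> Image.Image:
    """Clamp each side to [lo, hi] and round down to a multiple of 16."""
    w, h = img.size
    w = max(lo, min(hi, w)) // 16 * 16
    h = max(lo, min(hi, h)) // 16 * 16
    return img.resize((w, h), Image.LANCZOS)
```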

### Supported Resolutions

| Category | Resolutions |
|----------|-------------|
| **Standard** | 1024x1024, 1344x768, 768x1344, 1152x896, 896x1152, 1536x640, 1216x832, 832x1216 |
| **XL** | 1536x1536, 1920x1088, 1088x1920, 1536x1152, 1152x1536 |
| **MAX** | 2048x2048, 2048x1152, 1152x2048, 2048x1536, 1536x2048 |

---

## UI/UX Design

### Theme

- **Color Scheme:** Blue gradient (#e8f4fc to #d4e9f7)
- **Primary Color:** #2563eb (buttons, active elements)
- **Secondary Color:** #3b82f6 (accents)
- **Background:** Light blue gradient
- **Cards:** White with subtle shadows

### Components

- Centered header with lightning bolt icon
- Tabbed interface (Generate / Transform)
- Two-column layout (controls | output)
- Example prompts with one-click loading
- Share button for CDN uploads
- Copy-to-clipboard for image links

---

## Performance Benchmarks

### Generation Speed (1024x1024, 8 steps)

| Configuration | Time |
|---------------|------|
| **Baseline (BF16 only)** | ~5-6 seconds |
| **With All Optimizations** | ~3-4 seconds |
| **Improvement** | ~2 seconds faster (~40%) |

### Memory Usage

| Configuration | VRAM |
|---------------|------|
| **Baseline (BF16)** | ~12GB |
| **With FP8 Quantization** | ~6GB |
| **Reduction** | ~50% |

---

## Architecture Diagram

```
┌─────────────────────────────────────────────────────┐
│                Gradio Web Interface                 │
│  ┌──────────────────┐   ┌────────────────────────┐  │
│  │ 🎨 Generate Tab  │   │ ✨ Transform Tab       │  │
│  │ - Prompt input   │   │ - Image upload         │  │
│  │ - Style selector │   │ - Transformation prompt│  │
│  │ - Aspect ratio   │   │ - Strength slider      │  │
│  │ - Steps/Seed     │   │ - Style/Steps/Seed     │  │
│  └──────────────────┘   └────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                ZeroGPU (@spaces.GPU)                │
│              NVIDIA H200 (70GB VRAM)                │
│  ┌──────────────────┐   ┌────────────────────────┐  │
│  │     pipe_t2i     │   │        pipe_i2i        │  │
│  │  (Text-to-Img)   │   │      (Img-to-Img)      │  │
│  └─────────┬────────┘   └───────────┬────────────┘  │
│            │                        │               │
│            ▼                        ▼               │
│  ┌───────────────────────────────────────────────┐  │
│  │           Z-Image Transformer (6B)            │  │
│  │   - FP8 Quantized (torchao)                   │  │
│  │   - FlashAttention-2 (SDPA backend)           │  │
│  └───────────────────────┬───────────────────────┘  │
│                          ▼                          │
│  ┌───────────────────────────────────────────────┐  │
│  │                 VAE Decoder                   │  │
│  │   - Tiling enabled (large images)             │  │
│  │   - Slicing enabled (memory efficient)        │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                       Output                        │
│  - PNG image (full quality)                         │
│  - Seed value (reproducibility)                     │
│  - Optional: HuggingFace CDN share link             │
└─────────────────────────────────────────────────────┘
```

---

## Known Limitations

### torch.compile Incompatibility

The Z-Image transformer contains code patterns (`device = x[0].device`) that are incompatible with PyTorch's dynamo tracer. This prevents using `torch.compile` for additional speedup.

### FlashAttention-3

`FlashAttention3Processor` is not yet available in diffusers for the Z-Image architecture. The application uses FlashAttention-2 via the SDPA backend instead.

### torchao Version Warning

A deprecation warning appears for `float8_dynamic_activation_float8_weight`. This is cosmetic and does not affect functionality.
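
If a future torchao release turns the warning into an error, the config-style API should be a drop-in replacement. A hedged sketch, assuming the newer `Float8DynamicActivationFloat8WeightConfig` name (verify against the installed torchao version):

```python
from torchao.quantization import quantize_, Float8DynamicActivationFloat8WeightConfig

# Config-object form of the same FP8 dynamic quantization.
quantize_(pipe.transformer, Float8DynamicActivationFloat8WeightConfig())
```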

---

## Future Optimization Opportunities

1. **Ahead-of-Time Compilation (AoTI)** - When Z-Image becomes compatible with torch.compile
2. **INT8 Quantization** - Alternative to FP8 for broader hardware support
3. **Model Sharding** - For even larger batch processing
4. **Speculative Decoding** - Potential speedup for iterative generation
5. **LoRA Support** - Custom style fine-tuning

---

## Credits

- **Model:** Alibaba Tongyi-MAI Team (Z-Image-Turbo)
- **Infrastructure:** Hugging Face (Spaces, ZeroGPU, Diffusers)
- **Optimizations:** PyTorch Team (SDPA, torchao)
- **Application:** Built with Gradio

---

## License

- **Model:** Apache 2.0
- **Application Code:** MIT
- **Dependencies:** Various open-source licenses

---

*This report documents the technical implementation of Z-Image Turbo v15 as deployed on Hugging Face Spaces.*