Tongyi-MAI/Z-Image

An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Image Generation

Z-Image is the foundation model of the ⚡️-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. While Z-Image-Turbo is built for speed, Z-Image is a full-capacity, undistilled transformer designed to be the backbone for creators, researchers, and developers who require the highest level of creative freedom.


🌟 Key Features

  • Undistilled Foundation: As a non-distilled base model, Z-Image preserves the complete training signal. It supports full Classifier-Free Guidance (CFG), providing the precision required for complex prompt engineering and professional workflows.
  • Aesthetic Versatility: Z-Image masters a vast spectrum of visual languages, from hyper-realistic photography and cinematic digital art to intricate anime and stylized illustrations. It is the ideal engine for scenarios requiring rich, multi-dimensional expression.
  • Enhanced Output Diversity: Built for exploration, Z-Image delivers significantly higher variability in composition, facial identity, and lighting across different seeds, ensuring that multi-person scenes remain distinct and dynamic.
  • Built for Development: The ideal starting point for the community. Its non-distilled nature makes it a good base for LoRA training, structural conditioning (ControlNet) and semantic conditioning.
  • Robust Negative Control: Responds with high fidelity to negative prompting, allowing users to reliably suppress artifacts and adjust compositions.
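Full CFG and negative prompting both come down to how two denoiser predictions are combined at each sampling step. The following is a minimal sketch of the standard classifier-free guidance rule, assuming the common convention in which a negative prompt takes the place of the empty-prompt (unconditional) branch. The function name `cfg_combine` and the use of plain Python lists in place of latent tensors are illustrative assumptions, not Z-Image's actual implementation.

```python
from typing import Sequence


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], guidance_scale: float) -> list[float]:
    """Classifier-free guidance: start from the unconditional (or negative-prompt)
    prediction and move along the direction toward the conditional prediction.

    guidance_scale = 1.0 returns the conditional prediction unchanged;
    values above 1.0 push further away from the negative/unconditional branch.
    """
    if len(cond) != len(uncond):
        raise ValueError("conditional and unconditional predictions must have the same shape")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy "predictions" for two latent values (hypothetical numbers).
positive = [1.0, 0.5]   # prediction conditioned on the prompt
negative = [0.0, 0.5]   # prediction conditioned on the negative prompt
print(cfg_combine(positive, negative, guidance_scale=4.0))
```

Because the negative-prompt prediction is the point the result is pushed away from, any feature the negative prompt encodes is actively suppressed rather than merely ignored. A distilled model that does not run CFG has no second branch to plug a negative prompt into.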

🆚 Z-Image vs Z-Image-Turbo

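| Aspect | Z-Image | Z-Image-Turbo |
|---|---|---|
| CFG | ✅ | ❌ |
| Steps | 28~50 | 8 |
| Fine-tunability | ✅ | ❌ |
| Negative Prompting | ✅ | ❌ |
| Diversity | High | Low |
| Visual Quality | High | Very High |
| RL (reinforcement learning) | ❌ | ✅ |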

Similar to Tongyi-MAI/Z-Image

openai/gpt-image-1.5
OpenAI's latest image generation model with better instruction following and adherence to prompts
Image Generation
tencent/hunyuan-image-3
A powerful native multimodal model for image generation (PrunaAI squeezed)
Image Generation
stability-ai/stable-diffusion-3.5-large
A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, thanks to Query-Key Normalization.
Image Generation
google/nano-banana
Google's latest image editing model in Gemini 2.5
Image Generation
prunaai/p-image
A sub-1-second text-to-image model built for production use cases.
Image Generation
recraft-ai/recraft-v3
Recraft V3 (code-named red_panda) is a text-to-image model that can generate long text and images in a wide range of styles. At the time of writing, it was state of the art in image generation according to the Text-to-Image Benchmark by Artificial Analysis.
Image Generation