qwen/qwen-image

An image generation foundation model in the Qwen series that achieves significant advances in complex text rendering.

Text-to-Image

Introduction
We are thrilled to release Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.

News
2025.08.04: We released the Technical Report of Qwen-Image!
2025.08.04: We released the Qwen-Image weights! Check them out on Hugging Face and ModelScope!
2025.08.04: We released Qwen-Image! Check out our blog for more details!

Showcase
One of its standout capabilities is high-fidelity text rendering across diverse images. Whether it’s alphabetic languages like English or logographic scripts like Chinese, Qwen-Image preserves typographic details, layout coherence, and contextual harmony with stunning accuracy. Text isn’t just overlaid—it’s seamlessly integrated into the visual fabric.

Beyond text, Qwen-Image excels at general image generation with support for a wide range of artistic styles. From photorealistic scenes to impressionist paintings, from anime aesthetics to minimalist design, the model adapts fluidly to creative prompts, making it a versatile tool for artists, designers, and storytellers.

When it comes to image editing, Qwen-Image goes far beyond simple adjustments. It enables advanced operations such as style transfer, object insertion or removal, detail enhancement, text editing within images, and even human pose manipulation—all with intuitive input and coherent output. This level of control brings professional-grade editing within reach of everyday users.

But Qwen-Image doesn’t just create or edit—it understands. It supports a suite of image understanding tasks, including object detection, semantic segmentation, depth and edge (Canny) estimation, novel view synthesis, and super-resolution. These capabilities, while technically distinct, can all be seen as specialized forms of intelligent image editing, powered by deep visual comprehension.

Together, these features make Qwen-Image not just a tool for generating pretty pictures, but a comprehensive foundation model for intelligent visual creation and manipulation—where language, layout, and imagery converge.

Similar to qwen/qwen-image

Tongyi-MAI/Z-Image

An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Text-to-Image

openai/gpt-image-1.5

OpenAI's latest image generation model with better instruction following and adherence to prompts

Text-to-Image

tencent/hunyuan-image-3

A powerful native multimodal model for image generation (PrunaAI squeezed)

Text-to-Image

stability-ai/stable-diffusion-3.5-large

A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, thanks to Query-Key Normalization.

Text-to-Image

google/nano-banana

Google's latest image editing model in Gemini 2.5

Text-to-Image

prunaai/p-image

A sub-1-second text-to-image model built for production use cases.

Text-to-Image