tencent/hunyuan-image-3

A powerful native multimodal model for image generation (PrunaAI squeezed)

📖 Introduction

HunyuanImage-3.0 is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image module achieves performance comparable to or surpassing leading closed-source models.

✨ Key Features

  • 🧠 Unified Multimodal Architecture: Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables a more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.
  • 🏆 The Largest Image Generation MoE Model: This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.
  • 🎨 Superior Image Generation Performance: Through rigorous dataset curation and advanced reinforcement learning post-training, we’ve achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.
  • 💭 Intelligent World-Knowledge Reasoning: The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.

📝 Prompt Guide

Manually Writing Prompts.

The Pretrain checkpoint does not automatically rewrite or enhance input prompts; the Instruct checkpoint can rewrite or enhance input prompts with thinking. For optimal results, we currently recommend consulting our official guide on how to write effective prompts.

Reference: HunyuanImage 3.0 Prompt Handbook

System Prompts for Automatically Rewriting the Prompt.

We’ve included two system prompts in the PE folder of this repository that leverage DeepSeek to automatically enhance user inputs:

  • system_prompt_universal: Converts photographic-style and artistic prompts into detailed prompts.
  • system_prompt_text_rendering: Converts UI/Poster/Text Rendering prompts into detailed prompts that suit the model.

Note that these system prompts are in Chinese because DeepSeek works better with Chinese system prompts. If you want to use them with an English-oriented model, you can translate them into English or refer to the comments in the PE files as a guide.
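As an illustration, the rewriting step might look like the sketch below. This is a minimal sketch, assuming DeepSeek's OpenAI-compatible chat endpoint; the system-prompt file path (`PE/system_prompt_universal.txt`) and the model id (`deepseek-chat`) are assumptions, so adapt them to your checkout and account.

```python
# Minimal sketch: use DeepSeek (OpenAI-compatible API) to rewrite a user prompt
# with one of the system prompts shipped in the PE folder. File path and model
# id are assumptions; adjust them to your checkout and account.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # assumption: supply your own key
    base_url="https://api.deepseek.com",    # DeepSeek's OpenAI-compatible endpoint
)

# Load the universal rewriting system prompt (path is an assumption).
with open("PE/system_prompt_universal.txt", encoding="utf-8") as f:
    system_prompt = f.read()

user_prompt = "a cat sitting on a windowsill at sunset"

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)

rewritten_prompt = response.choices[0].message.content
print(rewritten_prompt)  # feed this enhanced prompt to HunyuanImage-3.0
```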

We have also created a Yuanqi workflow that implements the universal system prompt; you can try it directly.

Advanced Tips

  • Content Priority: Focus on describing the main subject and action first, followed by details about the environment and style. A more general description framework is: Main subject and scene + Image quality and style + Composition and perspective + Lighting and atmosphere + Technical parameters. Keywords can be added both before and after this structure.
  • Image resolution: Our model not only supports multiple resolutions but also offers both automatic and specified resolution options. In auto mode, the model predicts the image resolution from the input prompt. In specified mode (like traditional DiT), the model outputs an image whose resolution strictly matches the user's chosen resolution. A sketch combining both tips in a single call follows this list.
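
As a rough illustration of these two tips, the sketch below composes a prompt following the framework above and requests a fixed resolution via the Replicate Python client. The input field names (`prompt`, `resolution`) are assumptions rather than the model's confirmed schema; check the actual input schema before relying on them.

```python
# Minimal sketch: build a prompt following
# "subject/scene + quality/style + composition + lighting + technical parameters"
# and request a fixed resolution. Input field names are assumptions;
# consult the model's actual input schema.
import replicate

prompt = (
    "A red fox leaping over a frozen stream in a snowy birch forest, "  # main subject and scene
    "ultra-detailed photorealistic wildlife photography, "              # image quality and style
    "low-angle shot with shallow depth of field, "                      # composition and perspective
    "soft golden-hour backlight with drifting snow, "                   # lighting and atmosphere
    "shot on a telephoto lens at f/2.8"                                 # technical parameters
)

output = replicate.run(
    "tencent/hunyuan-image-3",
    input={
        "prompt": prompt,
        "resolution": "1024x1024",  # omit or use an auto setting to let the model choose
    },
)
print(output)
```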

Our model can follow complex instructions to generate high‑quality, creative images.

Our model can effectively process very long text inputs, enabling users to precisely control the finer details of generated images. Extended prompts allow for intricate elements to be accurately captured, making it ideal for complex projects requiring precision and creativity.

Similar to tencent/hunyuan-image-3

  • Tongyi-MAI/Z-Image: An efficient image generation foundation model with a single-stream diffusion transformer.
  • openai/gpt-image-1.5: OpenAI's latest image generation model with better instruction following and adherence to prompts.
  • stability-ai/stable-diffusion-3.5-large: A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, thanks to Query-Key Normalization.
  • google/nano-banana: Google's latest image editing model in Gemini 2.5.
  • prunaai/p-image: A sub-1-second text-to-image model built for production use cases.
  • recraft-ai/recraft-v3: Recraft V3 (code-named red_panda) is a text-to-image model that can generate long texts and images in a wide range of styles. As of today, it is SOTA in image generation, according to the Text-to-Image Benchmark by Artificial Analysis.