Discover The Best AI Websites & Models

27,931 AIs and 88 categories in the best AI tools directory.

New AI Models

zai-org/GLM-4.7

A state-of-the-art text generation model with 358B parameters, supporting English and Chinese, optimized for agentic reasoning, coding, and complex tool use.

Text Generation

MiniMaxAI/MiniMax-M2.1

MiniMax M2.1 is a state-of-the-art (SOTA) model designed specifically for real-world development and autonomous agents, focusing on coding, tool use, and long-horizon planning.

Text Generation

moonshotai/Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base.

Any-to-Any

Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

The Qwen3-TTS-Tokenizer-12Hz model encodes input speech into codes and decodes them back into speech.

Text-to-Speech

openbmb/MiniCPM-o-4_5

A Gemini 2.5 Flash-level MLLM for vision, speech, and full-duplex multimodal live streaming on your phone

Any-to-Any

deepseek-ai/DeepSeek-OCR-2

DeepSeek-OCR is a model designed to explore the boundaries of visual-text compression, investigating the role of vision encoders from an LLM-centric viewpoint.

Image-to-Text

zai-org/GLM-OCR

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture.

Image-to-Text

PaddlePaddle/PaddleOCR-VL-1.5

PaddleOCR-VL-1.5 is the next-generation version of PaddleOCR-VL, achieving a new state-of-the-art accuracy of 94.5% on OmniDocBench v1.5.

Image-to-Text

Qwen/Qwen3-ASR-1.7B

The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects.

Automatic Speech Recognition

Qwen/Qwen3-Coder-Next

An open-weight language model designed specifically for coding agents and local development

Text Generation

Tongyi-MAI/Z-Image

An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Text-to-Image

openai/whisper-large-v3

Convert speech in audio to text

Automatic Speech Recognition

openai/sora-2-pro

OpenAI's most advanced video generation model with synced audio

Text-to-Video

openai/gpt-image-1.5

OpenAI's latest image generation model with better instruction following and adherence to prompts

Text-to-Image

tencent/hunyuan-image-3

A powerful native multimodal model for image generation (PrunaAI squeezed)

Text-to-Image

stability-ai/stable-diffusion-3.5-large

A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, thanks to Query-Key Normalization.

Text-to-Image

google/nano-banana

Google's latest image editing model in Gemini 2.5

Text-to-Image

prunaai/p-image

A sub-1-second text-to-image model built for production use cases.

Text-to-Image

recraft-ai/recraft-v3

Recraft V3 (code-named red_panda) is a text-to-image model with the ability to generate long texts and to produce images in a wide range of styles. As of today, it is SOTA in image generation according to the Text-to-Image Benchmark by Artificial Analysis.

Text-to-Image