Explore Top AI Models

All Models

openai/gpt-image-2/text-to-image

Transform natural-language prompts into high-quality, highly detailed visuals. As OpenAI's latest image model, it renders typography with exceptional precision, making it well suited to images that must contain legible text.

text-to-image

new

openai/gpt-image-2/edit

OpenAI's premier instruction-driven editing model reimagines images at the semantic level from natural-language instructions and multiple reference images. It removes the need for complex masking, preserves subject features, and renders multi-language text accurately, supporting production-grade character outfitting, product composition, and UI redesign.

image-to-image

new

openai/gpt-5.5

GPT-5.5 is OpenAI's most powerful model to date, designed for demanding real-world work including autonomous coding, computer use, knowledge work, and early scientific research. It matches GPT-5.4's per-token latency in real-world serving while reasoning significantly better and using far fewer tokens to complete identical tasks. The model achieves state-of-the-art (SOTA) results across multiple benchmarks, including Terminal-Bench 2.0, OSWorld-Verified, GDPval, FrontierMath, and CyberGym.

llm

new

deepseek/deepseek-v4-pro

DeepSeek V4 Pro is an advanced Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters. It uses a hybrid attention system derived from the DeepSeek V4 Flash baseline for efficient long-context processing. Built for complex workloads such as full-codebase analysis, multi-step automation, and large-scale information synthesis, it excels in advanced reasoning and coding. The model supports two reasoning effort levels, high and xhigh (xhigh maps to max reasoning), and performs strongly across math, knowledge, and software engineering benchmarks.

llm

new

deepseek/deepseek-v4-flash

Designed for responsiveness and cost efficiency, DeepSeek V4 Flash is a 284B-parameter (13B activated) Mixture-of-Experts model from DeepSeek. It uses hybrid attention to streamline long-context processing while maintaining strong reasoning and coding output under high-throughput workloads. It supports the high and xhigh (max reasoning) effort levels, so reasoning depth can be scaled to the task. It is well suited to demanding integration scenarios, including agent workflows, coding assistants, and real-time chat systems.

llm

new
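For a rough sense of how sparse the two DeepSeek V4 entries are, the quoted parameter counts can be turned into an activated fraction. This is only a back-of-the-envelope sketch: the figures are copied from the descriptions above, and the `activated_fraction` helper is a hypothetical name introduced for illustration.

```python
# Parameter counts as quoted in the two catalog descriptions above:
# (total parameters, activated parameters).
MODELS = {
    "deepseek/deepseek-v4-pro": (1.6e12, 49e9),
    "deepseek/deepseek-v4-flash": (284e9, 13e9),
}


def activated_fraction(total_params: float, activated_params: float) -> float:
    """Share of the model's parameters that are activated (hypothetical helper)."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return activated_params / total_params


if __name__ == "__main__":
    for name, (total, activated) in MODELS.items():
        share = activated_fraction(total, activated)
        print(f"{name}: {share:.1%} of parameters activated")
```

By these figures, both models activate only a few percent of their total parameters, which is the usual trade-off a Mixture-of-Experts design makes between capacity and inference cost.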

anthropic/claude-opus-4.7

Claude Opus 4.7 represents the next frontier in Anthropic’s model lineup. Engineered specifically for asynchronous agent pipelines, it builds upon the legendary coding and reasoning foundations of Opus 4.6. This model is optimized for complex, multi-step tasks that unfold over time, ensuring flawless execution across extended workflows. From orchestrating large-scale codebase refactoring to managing multi-stage debugging, Opus 4.7 delivers the reliability required for mission-critical, end-to-end project orchestration.

llm

hot

bytedance/seedance-2.0/text-to-video

Seedance 2.0 is a multimodal AI video model developed by ByteDance. With exceptional motion stability as its core strength, it empowers creators with full control over performance, lighting, and camera movements. It generates cinema-quality visuals that meet industry standards, delivering highly realistic immersive experiences while maintaining strong consistency across multi-shot storytelling and significantly improving overall production efficiency.

text-to-video

hot

bytedance/seedance-2.0/image-to-video

Transform static images into consistent cinematic visuals. bytedance/seedance-2.0/image-to-video uses precise multimodal control to extract composition and style. It overcomes generation randomness, supporting complex camera movements, realistic physics, and natural motion simulation. Whether in intense action sequences or multi-shot narratives, the model maintains strict consistency in character traits and physical inertia. Combined with native audio beat-sync, it perfectly aligns editing rhythm with visual tension for stable, high-impact output.

image-to-video

hot

bytedance/seedance-2.0/omni-reference

Shatter the physical boundaries of traditional video production. bytedance/seedance-2.0/omni-reference features an exceptionally acute "omni-reference" analytical capability, deeply deconstructing the cinematic audiovisual language of any reference footage. It flawlessly replicates complex camera paths and character blocking, while precisely extracting and reconstructing highly dynamic transition rhythms. Whether executing profound visual style reshaping or generating perfectly synced audiovisual sequences, it maintains absolute visual consistency and strict adherence to physical laws across multiple shots.

video-to-video

new

alibaba/happyhorse-1.0/text-to-video

Developed by Alibaba ATH, HappyHorse-1.0 is a trending newcomer in AI video generation. It delivers robust text-to-video performance, integrated with native audio support and multilingual capabilities.

text-to-video

new

alibaba/happyhorse-1.0/image-to-video

Developed by Alibaba ATH, HappyHorse-1.0 is a trending newcomer in AI video generation. It delivers robust image-to-video performance, integrated with native audio support and multilingual capabilities.

image-to-video

new

alibaba/happyhorse-1.0/reference-to-video

Developed by Alibaba ATH, HappyHorse-1.0 is a trending newcomer in AI video generation. It delivers robust reference-to-video performance, integrated with native multilingual processing and audio support.

reference-to-video

new

alibaba/happyhorse-1.0/video-edit

Developed by Alibaba ATH, HappyHorse-1.0 is a prominent newcomer in AI video. It delivers powerful video-edit performance, fully integrated with native audio support and multilingual capabilities.

video-to-video

kwaivgi/kling-o3/video-to-video

Powered by Kling O3 Omni, you can generate new shots guided by an input reference video, precisely preserving core cinematic language such as motion trajectories and camera styles to ensure seamless scene continuity. Additionally, combined with Kling Omni Video O3 Video-Edit, you will experience conversational video editing driven by natural language commands. With simple instructions, you can easily remove objects, change backgrounds, modify visual styles, adjust weather and lighting, or even completely transform scenes.

video-to-video

kwaivgi/kling-o3/text-to-video

Kling Omni Video O3 achieves advanced semantic alignment for text-to-video generation, producing cinematic-grade visuals with natural physics simulation and consistent subject tracking. It integrates high-fidelity audio synchronization and runs with zero cold starts, delivering professional-tier quality in every frame.

text-to-video

kwaivgi/kling-o3/image-to-video

ShortAPI's kling-o3/image-to-video merges the core capabilities of Image-to-Video and Reference-to-Video. Built on Kling Omni Video O3 and MVL technology, it takes any character, prop, or scene reference and automatically extracts key features to generate dynamic videos with strict identity consistency across frames. With accurate physics simulation, natural motion, and synchronized audio generation, every request outputs a cinematic sequence. Built for highly efficient workflows, it delivers the best performance with immediate execution and zero cold starts.

image-to-video

new

suno/suno-v5.5/generate

Suno v5.5 is a next-generation AI music generator that lets anyone create full songs in seconds—no musical skills required. Simply enter a prompt to generate high-quality music with vocals, melody, and arrangement. Whether for short videos, marketing, or personal projects, Suno v5.5 significantly improves efficiency and reduces production costs.

text-to-music

bytedance/seedream-5.0/text-to-image

Seedream 5.0's text-to-image capability is built on knowledge reasoning, precise semantic understanding, and general-knowledge logical deduction, enabling it to handle generation requests that involve complex logic.

text-to-image

bytedance/seedream-5.0/edit

The Seedream 5.0 edit function enables precise, controllable image manipulation. Strong instruction adherence significantly reduces hallucinations, and support for feature transfer and reference examples lets it learn the transformation logic automatically, so styles or operations can be reused in one click.

image-to-image

google/veo-3.1/text-to-video

Google Veo 3.1 produces high-quality, native 1080p videos with synchronized audio from text prompts. The service delivers the best performance and affordable pricing with no cold starts.

text-to-video

google/veo-3.1/image-to-video

Google Veo 3.1 Image-to-Video generates high-quality videos from images, with native 1080p output for greater creative flexibility and enhanced detail. It offers the best performance and affordable pricing with no cold starts.

image-to-video

google/veo-3.1/extend-video

Extend Veo 3.1 videos with fluid motion, consistent styling, and robust scene integrity. Benefit from peak performance and instant-on access with no cold starts, all at an affordable price.

video-to-video

google/veo-3.1/first-last-frame-to-video

Generate videos from a first frame and a last frame with Google's Veo 3.1.

image-to-video

google/veo-3.1/reference-to-video

Google Veo 3.1 Reference-to-Video specializes in image-to-video generation that preserves a subject's identity and appearance based on reference images. It keeps characters or products moving seamlessly across all frames, offering the best performance, no cold starts, and affordable pricing.

image-to-video

new

alibaba/qwen-3.6-plus

Driven by a hybrid design that merges sparse mixture-of-experts routing with efficient linear attention, Qwen 3.6 Plus is built for scalability and high-performance inference. It brings major upgrades over the 3.5 lineup in agentic coding, reasoning, and front-end development, improving the “vibe coding” process. With a SWE-bench Verified score of 78.8, this state-of-the-art model handles repository-level problem solving, games, and 3D scenes across multimodal and pure-text tasks.

llm

new

alibaba/wan-2.7/image-to-video

Empower your creative storytelling with alibaba/wan-2.7/image-to-video, provided by ShortAPI. It turns single or multiple reference images into cinematic 1080p footage, supporting up to 15 seconds of content extension per generation. With outstanding control over character texture and environmental consistency, it puts complex visual effects and smooth motion trajectories at your fingertips.

image-to-video

new

alibaba/wan-2.7/text-to-image

Integrate premium text-to-image capabilities into your workflow with alibaba/wan-2.7/text-to-image on ShortAPI. Featuring a unique Thinking Mode for precise, high-quality visual outputs, this inference API delivers the best performance and eliminates delays with no cold starts.

text-to-image

new

alibaba/wan-2.7/image-edit

Wan 2.7 Image Edit enables prompt-driven image editing with multiple reference images. Deploy the ready-to-use inference API for peak performance with no cold starts.

image-to-image

google/nano-banana-2/text-to-image

Nano Banana 2 is Google’s breakthrough Gemini 3.1 Flash image model, engineered for lightning-fast performance and studio-grade quality, with native support for tiered resolutions from 1K and 2K up to professional 4K upscaling. It leverages exceptional text rendering precision and character consistency to deliver high-fidelity visual solutions for e-commerce automation, motion design, and social media content creation in seconds, empowering developers and creators to scale their creative workflows with ease.

text-to-image

google/nano-banana-2/edit

Nano Banana 2 Edit is a professional-grade inpainting and retouching model designed for advanced image editing, supporting ultra-high-definition local repainting, object removal, and style transfer; with superior semantic alignment, it precisely edits character details, textures, and scene elements while maintaining original composition and lighting logic, making it the premier engine for high-end e-commerce asset optimization and creative post-production platforms.

image-to-image

openai/gpt-5.4

GPT-5.4 is an OpenAI frontier model, delivering stronger performance in coding, document understanding, tool use, and instruction following. It serves as a powerful default for both general-purpose tasks and software engineering, capable of generating production-quality code, synthesizing information across multiple sources, and executing complex multi-step workflows with greater efficiency, reducing iterations while improving output quality.

llm

openai/gpt-5.4-pro

GPT-5.4 Pro is the most advanced model in the GPT-5.4 family, built on the unified architecture of GPT-5.4 and engineered to deliver stronger reasoning for complex, high-stakes tasks. Optimized for step-by-step reasoning, precise instruction following, and accuracy, GPT-5.4 Pro consistently excels in agentic coding, long-context workflows, and multi-step problem solving.

llm

openai/gpt-5.4-mini

GPT-5.4 mini brings the core capabilities of GPT-5.4 into a faster, more efficient model designed for high-throughput workloads. It delivers strong performance in reasoning, coding, and tool use, while significantly reducing latency and operational costs. Purpose-built for production environments, the model achieves a powerful balance between performance and efficiency. It is ideal for chat applications, coding assistants, and large-scale agent workflows, providing reliable instruction following, effective multi-step reasoning, and consistent results across diverse tasks with improved cost efficiency.

llm

openai/gpt-5.4-nano

GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-sensitive and high-volume tasks. It supports text input and is specifically designed for low-latency applications such as classification, data extraction, ranking, and sub-agent execution. The model prioritizes responsiveness and efficiency over deep reasoning, making it ideal for large-scale pipelines that demand fast and reliable outputs. GPT-5.4 nano is particularly well suited for background tasks, real-time systems, and distributed agent architectures where minimizing cost and latency is critical.

llm

google/nano-banana-pro/text-to-image

Boasting sharper 2K imagery, intelligent 4K scaling, improved text rendering, and enhanced character consistency, Google DeepMind’s Nano Banana Pro represents a significant advancement in visual quality for creative and API-driven workflows.

text-to-image

google/nano-banana-pro/edit

image-to-image

kwaivgi/kling-3.0/text-to-video

Kling 3.0 delivers premier text-to-video generation with cinematic visuals, smooth motion, precise prompt adherence, and native audio. This high-performance solution offers affordable pricing for creating professional, ready-to-share clips.

text-to-video

kwaivgi/kling-3.0/image-to-video

Kling 3.0 delivers top-tier image-to-video generation with smooth motion, cinematic visuals, and accurate prompt adherence, featuring native audio for ready-to-share clips. It offers the best performance at affordable pricing.

image-to-video

kwaivgi/kling-3.0/motion-control

Kling 3.0 Motion Control brings motion to life from reference videos, turning still images into smooth, realistic animations. Upload a character image and a motion clip, anything from a dance to a gesture, and watch your character move realistically. Enjoy fast, reliable performance and cost-effective plans designed for creators.

video-to-video

suno/suno-v5/generate

Suno v5 transforms text prompts into complete tracks with both vocals and instrumentation, featuring natural dynamic expression and coherent musical progression.

text-to-music

google/gemini-3.1-pro-preview

ShortAPI’s Gemini 3.1 Pro Preview is Google’s flagship reasoning model, delivering enhanced software engineering performance, more reliable autonomous task execution, and efficient token usage across complex workflows. Designed for advanced developers and autonomous intelligent systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration, while introducing a medium thinking level to achieve an optimal balance between cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it ideal for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.

llm

anthropic/claude-sonnet-4.6

ShortAPI’s Claude Sonnet 4.6 is Anthropic’s most powerful Sonnet-class model yet, delivering exceptional performance across coding, intelligent agents, and professional tasks. It excels at iterative development, navigating complex codebases, managing end-to-end projects with memory, producing polished documents, and efficiently performing web QA and workflow automation.

llm

anthropic/claude-opus-4.6

Opus 4.6 is Anthropic’s flagship model for coding and complex professional tasks. Designed to work across entire workflows, it excels in managing large codebases, performing complex refactors, and handling multi-step debugging with deeper context understanding, stronger problem decomposition, and reliable execution. Beyond coding, Opus 4.6 delivers near-production-ready documents, plans, and analyses in a single pass, maintaining consistency over long outputs and extended sessions. It’s ideal for tasks requiring sustained judgment and follow-through, such as technical design, migration planning, and end-to-end project execution.

llm

deepseek/deepseek-v3.2

DeepSeek V3.2 is the latest production release in the DeepSeek V3 family, a large, reasoning-first, open-weight family of language models designed for long-context understanding, robust agent and tool use, advanced reasoning, coding, and math.

llm

new

shortapi/transparent-image

Elevate your creative workflow with ease. This tool lets you generate high-quality, ready-to-use graphics—like logos and icons—with true transparent backgrounds, eliminating the need for tedious background removal.

text-to-image

vidu/vidu-q3/text-to-video

Vidu Q3 Text-to-Video transforms text prompts into high-quality videos with exceptional visual fidelity and diverse motion. It delivers the best performance and affordable pricing with no cold starts.

text-to-video

vidu/vidu-q3/image-to-video

Vidu Q3 Image-to-Video uses prompts to transform images into high-quality videos with exceptional visual fidelity and diverse motion. The model delivers the best performance with no cold starts and highly competitive pricing.

image-to-video

vidu/vidu-q3/start-end-to-video

Vidu Q3 start-end-to-video generates smooth video transitions between the start and end images with faster generation speeds.

image-to-video

new

vidu/vidu-q3/reference-to-video

Vidu Q3 Reference-to-Video integrates advanced multi-entity consistent video generation. By blending multiple reference images with precise text guidance, the model maintains exceptional character and entity consistency across complex dynamic scenes.

video-to-video

kwaivgi/kling-v2/ai-avatar

Empower your applications with Kling AI Avatar. Instantly generate cinematic, hyper-detailed AI avatar videos for profiles and social media that strictly obey your prompts. Our ready-to-integrate API guarantees peak performance with absolutely zero cold starts.

image-to-video
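The identifiers in this listing appear to follow a `provider/model` or `provider/model/task` pattern (for example `openai/gpt-5.5` and `google/veo-3.1/text-to-video`). The sketch below splits an identifier into those parts. It is an illustration under that assumption only: `ModelId` and `parse_model_id` are hypothetical names, not part of any ShortAPI SDK, and the field names are inferred from the listing rather than documented.

```python
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    """Parts of a catalog identifier (field names are assumptions)."""
    provider: str
    model: str
    task: Optional[str]  # None for two-part identifiers such as LLM entries


def parse_model_id(identifier: str) -> ModelId:
    """Split 'provider/model' or 'provider/model/task' into its parts."""
    parts = identifier.strip().split("/")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    task = parts[2] if len(parts) == 3 else None
    return ModelId(provider=parts[0], model=parts[1], task=task)


if __name__ == "__main__":
    for raw in ("openai/gpt-5.5", "google/veo-3.1/text-to-video"):
        print(parse_model_id(raw))
```

Splitting identifiers this way makes it easy to group entries by provider or model family. Note that the final segment is a route name and does not always match the category tag shown on the card.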

HappyHorse 1.0

Developed by Alibaba ATH, HappyHorse-1.0 is a trending newcomer in AI video generation. It delivers comprehensive features—including robust text-to-video, image-to-video, reference-to-video, and video editing capabilities—fully integrated with native audio support and multilingual processing.

Seedance 2.0

Seedance 2.0 centers on exceptional motion stability, empowering creators with full control over performance, lighting, and camera movement. It delivers industry-standard, cinematic-quality visuals, creates a highly realistic immersive experience, and significantly enhances overall production efficiency.

Seedream 5.0

Seedream 5.0 excels in knowledge reasoning, semantic understanding, and logical deduction for complex text-to-image tasks. Its edit function enables precise, controllable manipulation with strong instruction adherence, reduced hallucinations, and one-click style or feature transfer.

Nano Banana 2

Nano Banana 2 is powered by Google’s Gemini 3.1 Flash Image model. Built specifically for developers, it combines ultra-fast performance with professional-grade image quality, delivering precise text rendering, exceptional character consistency, and scalable image generation and editing workflows.

Suno v5.5

GPT Image 2

HappyHorse 1.0

Seedance 2.0

Kling 3.0

Wan 2.7

Seedream 5.0

Nano Banana 2

Nano Banana Pro

Veo 3.1

Suno v5.5

Vidu Q3

Kling 2.6

Video Generation

Image Generation

Music Generation

LLM
