Gemini 3 Flash

Frontier reasoning at Flash speed; 3x faster than 2.5 Pro with superior reasoning; 1M token context; 30% fewer tokens; $0.50/1M input token pricing.

Gemini 3 Flash was released on December 16, 2025 as the new default model in the Gemini app, combining frontier Gemini 3 reasoning with Flash-tier speed and efficiency. The model delivers PhD-level reasoning comparable to much larger models while running 3x faster than Gemini 2.5 Pro, with a first-token latency of 0.21-0.37 seconds and an output speed of 163 tokens/second. 3 Flash also outperforms 2.5 Pro on coding benchmarks, scoring 78% on SWE-bench Verified, while costing 75% less ($0.50 vs $2+ per million input tokens). It uses 30% fewer tokens on average than 2.5 Pro on typical traffic while improving performance. It features a 1-million-token context window, native audio I/O, dynamic reasoning depth adaptation, and multimodal input support. It is available globally to Gemini app users at no cost and is now the recommended default for production deployments. Knowledge cutoff: January 2025.
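The pricing and speed figures quoted above can be checked with a little arithmetic. The following is a rough sketch only: it takes $2.00 as the lower bound of the "$2+" figure for 2.5 Pro, and it assumes a simple model in which total response time is first-token latency plus output tokens divided by throughput. The function names are illustrative and not part of any Google API.

```python
# Back-of-the-envelope helpers for the figures quoted in the description.
# All names here are hypothetical; nothing below calls a real API.

FLASH_INPUT_PRICE_USD = 0.50   # per 1M input tokens, as quoted
PRO_INPUT_PRICE_USD = 2.00     # assumption: lower bound of the quoted "$2+"
FLASH_TOKENS_PER_SECOND = 163  # quoted output speed
FLASH_FIRST_TOKEN_S = (0.21, 0.37)  # quoted first-token latency range


def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


def relative_saving(new_price: float, old_price: float) -> float:
    """Fraction saved when moving from old_price to new_price."""
    return 1 - new_price / old_price


def estimated_response_seconds(
    output_tokens: int, first_token_latency_s: float, tokens_per_second: float
) -> float:
    """Assumed model: wait for the first token, then stream at a constant rate."""
    return first_token_latency_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    saving = relative_saving(FLASH_INPUT_PRICE_USD, PRO_INPUT_PRICE_USD)
    print(f"Input price saving vs 2.5 Pro: {saving:.0%}")

    cost = input_cost_usd(250_000, FLASH_INPUT_PRICE_USD)
    print(f"Cost of a 250k-token prompt: ${cost:.3f}")

    low, high = (
        estimated_response_seconds(500, latency, FLASH_TOKENS_PER_SECOND)
        for latency in FLASH_FIRST_TOKEN_S
    )
    print(f"Estimated time for a 500-token reply: {low:.2f}-{high:.2f} s")
```

With a $2.00 baseline the saving works out to exactly the quoted 75%; any higher 2.5 Pro price would make the saving larger. The response-time estimate ignores network overhead and any variation in throughput, so treat it as a rough guide rather than a measurement.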

More models from Google Gemini

Fastest and most cost-efficient model in 2.5 family; optimized for classification and summarization at scale; lowest latency with strong performance.

Frontier intelligence with maximum reasoning; supports 2M token context; excels at deep reasoning and complex agentic workflows; optional Deep Think mode.

General-purpose model balancing capability and efficiency; supports diverse task categories; foundation for broader Gemini family integration.

Flagship reasoning model for complex tasks; Google's highest-capability offering in 1.0 generation; optimized for expertise-level problem solving.

Million-token context window enabling long-document processing; advanced reasoning for complex multi-step tasks; 10x larger context than previous generation.

Fast variant optimized for speed over maximum reasoning; million-token context with reduced latency; ideal for real-time applications and high-throughput workloads.

First Flash-Lite variant introducing cost-efficiency tier; optimized for speed and economy; lighter-weight than 2.0 Flash without major capability loss.

Next-generation reasoning with improved MoE architecture; enhanced coding and logic performance; real-time web search integration for current information.

Ultra-fast workhorse model with 1M token context; designed for agentic era with native tool use; maintains capability while prioritizing speed and throughput.

Most advanced reasoning model with native thinking mode; Deep Think for complex problem solving; leads industry benchmarks (LMArena #1); PhD-grade reasoning.

Balanced speed and reasoning with thinking support; 20-30% token efficiency improvement over 2.0; optimized for high-volume intelligent workloads.

On-device efficiency model; optimized for lightweight tasks; runs directly on Pixel 8 Pro without network connection.