Google Gemini Logo

Gemini 2.5 Flash-Lite

Fastest and most cost-efficient model in 2.5 family; optimized for classification and summarization at scale; lowest latency with strong performance.

Gemini 2.5 Flash-Lite released June 17, 2025 as the cost-optimized tier of the 2.5 family, designed for high-throughput applications prioritizing speed and economy. The model achieves the lowest latency in the Gemini 2.5 lineup and highest tokens-per-second decode rate, making it ideal for processing classification tasks, content summarization, and other operations at scale across millions of daily requests. Flash-Lite delivers performance improvements over previous generation 2.0 Flash-Lite across reasoning, coding, multimodality, and long-context understanding. The model supports 1-million-token context and thinking mode capabilities. Pricing and efficiency make it the recommended choice for cost-sensitive deployments while still offering respectable quality. Knowledge cutoff: January 2025. Generally available in production July 2025.

Reviews

No Reviews Yet

Be the first to share your experience with this AI tool

More models from Google Gemini

Frontier intelligence with maximum reasoning; supports 2M token context; excels at deep reasoning and complex agentic workflows; optional Deep Think mode.

General-purpose model balancing capability and efficiency; supports diverse task categories; foundation for broader Gemini family integration.

Flagship reasoning model for complex tasks; Google's highest-capability offering in 1.0 generation; optimized for expertise-level problem solving.

Million-token context window enabling long-document processing; advanced reasoning for complex multi-step tasks; 10x larger context than previous generation.

Fast variant optimized for speed over maximum reasoning; million-token context with reduced latency; ideal for real-time applications and high-throughput workloads.

First Flash-Lite variant introducing cost-efficiency tier; optimized for speed and economy; lighter-weight than 2.0 Flash without major capability loss.

Next-generation reasoning with improved MoE architecture; enhanced coding and logic performance; real-time web search integration for current information.

Ultra-fast workhorse model with 1M token context; designed for agentic era with native tool use; maintains capability while prioritizing speed and throughput.

Most advanced reasoning model with native thinking mode; Deep Think for complex problem solving; leads industry benchmarks (LMArena #1); PhD-grade reasoning.

Balanced speed and reasoning with thinking support; 20-30% token efficiency improvement over 2.0; optimized for high-volume intelligent workloads.

Frontier reasoning at Flash speed; 3x faster than 2.5 Pro with superior reasoning; 1M token context; 30% fewer tokens; $0.50/1M input token pricing.

On-device efficiency model; optimized for lightweight tasks; runs directly on Pixel 8 Pro without network connection.