Gemini 2.0 Flash-Lite

First Flash-Lite variant, introducing a cost-efficiency tier; optimized for speed and economy; lighter-weight than 2.0 Flash without major capability loss.

Gemini 2.0 Flash-Lite was released in public preview on February 5, 2025 as the first Flash-Lite model, adding a third tier to Google's speed-cost-capability spectrum. It targets applications where cost and latency matter more than maximum reasoning power, such as classification at scale, high-volume content summarization, and routine chatbot interactions. The model offers faster time-to-first-token and higher throughput than 2.0 Flash while maintaining solid quality on most tasks. Its token efficiency and pricing make it attractive for cost-sensitive operations and platforms processing millions of requests per day. Benchmarks show Flash-Lite outperforming the previous-generation 1.5 models on coding, math, and science tasks despite its lower cost. Knowledge cutoff: August 2024. The model was subsequently superseded by 2.5 Flash-Lite.

More models from Google Gemini

Fastest and most cost-efficient model in 2.5 family; optimized for classification and summarization at scale; lowest latency with strong performance.

Frontier intelligence with maximum reasoning; supports 2M token context; excels at deep reasoning and complex agentic workflows; optional Deep Think mode.

General-purpose model balancing capability and efficiency; supports diverse task categories; foundation for broader Gemini family integration.

Flagship reasoning model for complex tasks; Google's highest-capability offering in 1.0 generation; optimized for expertise-level problem solving.

Million-token context window enabling long-document processing; advanced reasoning for complex multi-step tasks; 10x larger context than previous generation.

Fast variant optimized for speed over maximum reasoning; million-token context with reduced latency; ideal for real-time applications and high-throughput workloads.

Next-generation reasoning with improved MoE architecture; enhanced coding and logic performance; real-time web search integration for current information.

Ultra-fast workhorse model with 1M token context; designed for agentic era with native tool use; maintains capability while prioritizing speed and throughput.

Most advanced reasoning model with native thinking mode; Deep Think for complex problem solving; leads industry benchmarks (LMArena #1); PhD-grade reasoning.

Balanced speed and reasoning with thinking support; 20-30% token efficiency improvement over 2.0; optimized for high-volume intelligent workloads.

Frontier reasoning at Flash speed; 3x faster than 2.5 Pro with superior reasoning; 1M token context; 30% fewer tokens; $0.50/1M input token pricing.

On-device efficiency model; optimized for lightweight tasks; runs directly on Pixel 8 Pro without network connection.