DeepSeek-LLM

General-purpose foundation model; 67B parameters; outperforms LLaMA-2-70B on reasoning and math

Launched in November 2023, shortly after DeepSeek-Coder, DeepSeek-LLM was the company's first general-purpose large language model: 67 billion parameters, trained on 2 trillion tokens of English and Chinese text. Although slightly smaller than LLaMA-2-70B, it outperformed that model on reasoning, coding, mathematics, and Chinese comprehension benchmarks. The model was released as open-source in base and chat variants, in keeping with DeepSeek's commitment to democratized AI development. Its strong performance relative to its parameter count made it a notable release of 2023 and established DeepSeek as a serious contender in the open-source LLM space.

More models from DeepSeek

V3.1 refinement (September 2025); improved language consistency; enhanced agent performance; more stable outputs

Experimental model (September 29, 2025); DeepSeek Sparse Attention (DSA) mechanism; 50-75% lower inference costs; long-context optimization

Production release (December 1, 2025); 671B parameters; DeepSeek Sparse Attention; GPT-5-level reasoning; 128K context

Hybrid model (August 2025); 671B parameters; dual-mode (thinking + non-thinking); 128K context; enhanced tool calling

Extended reasoning variant (December 1, 2025); extreme thinking mode; 96% AIME score; gold IMO 2025; outperforms GPT-5-High

Specialized mathematics model; three variants (Base, Instruct, RL); optimized for STEM problem-solving

Second generation MoE model; 236B total parameters (21B active); 128K context window; 50% lower training cost

Advanced coding model; 236B parameters (21B active); 128K context; 338 programming languages; GPT-4-Turbo-level coding

Refinement of V2; improved training data; enhanced transformer architecture; increased computational power

First commercial-grade coding model; 1.3B-33B parameters; supports 80+ programming languages

Mixture of Experts model; efficient inference; reduced memory and computational requirements

Third generation foundation; 671B parameters (37B active); 128K context; major leap in capability and reasoning