DeepSeek · MoE · MIT

DeepSeek R1 (MoE)

DeepSeek R1 is the reasoning model that shocked the AI world. A massive 671B MoE (37B active per token) that matches or exceeds OpenAI o1 on math, coding, and scientific reasoning benchmarks — released under the permissive MIT license.

Parameters: 671.0B
Active: 37.0B
Max Context: 64K
Architecture: MoE
Released: Jan 20, 2025
Modality: Text

About DeepSeek R1 (MoE)

DeepSeek R1 is the reasoning model that shocked the AI world. A massive 671B MoE (37B active per token) that matches or exceeds OpenAI o1 on math, coding, and scientific reasoning benchmarks — released under the permissive MIT license. It uses Multi-head Latent Attention (MLA) which compresses the KV cache by ~95%, dramatically reducing memory overhead for long contexts. The full model requires ~370 GB VRAM at Q4_K_M (server/cluster class), but the distilled variants (R1 Distill Qwen 7B/14B/32B, R1 Distill Llama 70B) bring reasoning capabilities to consumer hardware. DeepSeek R1's chain-of-thought reasoning is verbose but exceptionally thorough on hard problems.
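As a back-of-the-envelope check on that ~95% KV-cache figure, here is a minimal sketch comparing per-token, per-layer cache entries under standard multi-head attention against MLA's compressed latent. The head count and head dimension come from the specifications on this card; the latent sizes (a 512-dim KV latent plus a 64-dim decoupled RoPE key) are assumptions taken from the DeepSeek-V3 technical report, not stated here.

```python
# Per-token, per-layer KV-cache entries: standard MHA vs. MLA.
# n_heads and d_head are from this card's spec table; the MLA latent
# dimensions are assumptions based on the DeepSeek-V3 report.

n_heads = 56       # attention heads (spec table)
d_head = 128       # head dimension (spec table)
kv_latent = 512    # compressed KV latent dim (assumption)
rope_dim = 64      # decoupled RoPE key dim (assumption)

mha_per_token = 2 * n_heads * d_head   # full K and V for every head
mla_per_token = kv_latent + rope_dim   # one shared latent + RoPE key

reduction = 1 - mla_per_token / mha_per_token
print(f"MHA: {mha_per_token} values/token/layer")
print(f"MLA: {mla_per_token} values/token/layer")
print(f"reduction: {reduction:.1%}")
```

Under these assumed dimensions the cache shrinks from 14,336 to 576 values per token per layer, a ~96% reduction, consistent with the ~95% claim above.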

Reasoning · Math · STEM · Research · Code

Technical Specifications

Total Parameters: 671.0B
Active Parameters: 37.0B per token
Architecture: Mixture of Experts
Total Experts: 256
Active Experts: 8 per token
Attention Type: MLA (Multi-head Latent Attention)
Hidden Dimension: d = 7,168
Transformer Layers: 61
Attention Heads: 56
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: YaRN-extended RoPE
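The expert counts above (256 total, 8 active per token) can be illustrated with a generic top-k gating sketch. Note this is a simplified softmax top-k router for illustration only; DeepSeek-V3/R1's actual router uses sigmoid scoring with auxiliary-loss-free bias adjustments, which are omitted here.

```python
import math
import random

def route(logits, k=8):
    """Select the top-k experts by router score and softmax their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(256)]  # one score per expert
active = route(scores, k=8)
print(f"{len(active)} of {len(scores)} experts active")
```

Because only 8 of 256 expert FFNs run per token, each forward pass touches roughly 37B of the 671B parameters, which is what makes inference far cheaper than the total parameter count suggests.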

System Requirements

Estimated VRAM at 10% overhead for different quantization methods and context sizes.

Quantization (bytes/weight)          1K ctx       64K ctx
Q4_K_M (0.50 B/W, ~97% of FP16)      347.1 GB     362.1 GB
Q8_0   (1.00 B/W, ~100% of FP16)     693.9 GB     708.9 GB
F16    (2.00 B/W, reference)         1387.6 GB    1402.6 GB

Every configuration above requires a cluster or multi-GPU setup; none fits a 24 GB consumer GPU or an 80 GB datacenter GPU.
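The table's methodology (weight size times bytes per weight, plus 10% overhead) can be approximated with a short estimator. This sketch ignores the KV cache and any GB/GiB rounding conventions, so its numbers land in the ballpark of, but not exactly on, the table's columns.

```python
def weight_vram_gb(params_b, bytes_per_weight, overhead=0.10):
    """Rough weight-memory estimate in GB: parameter count (in billions)
    times bytes per weight, plus a fixed overhead fraction. Omits the
    KV cache, so it does not reproduce the context-dependent columns."""
    return params_b * bytes_per_weight * (1 + overhead)

# Illustrative estimates for DeepSeek R1's 671B parameters.
for name, bpw in [("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)]:
    print(f"{name}: ~{weight_vram_gb(671.0, bpw):.0f} GB")
```

Even the most aggressive common quantization leaves the full model in the hundreds of gigabytes, which is why the distilled variants exist for consumer hardware.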


Find the right GPU for DeepSeek R1 (MoE)

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.