DeepSeek, MoE, MIT

DeepSeek V4-Pro (MoE)

Parameters: 1.6T
Active: 49.0B
Max Context: 1.0M tokens
Architecture: MoE
Released: Apr 1, 2026
Modality: Text

About DeepSeek V4-Pro (MoE)

DeepSeek V4-Pro is the April 2026 frontier model, scaling to 1.6 trillion total parameters with 49B active per token. It introduces Dynamic Sparse Attention (DSA) and token compression for efficient processing of 1M-token contexts. At this scale it runs only on cluster-class hardware, requiring over 800 GB of VRAM at Q4_K_M. The architecture sits at the frontier of open-weight models: 80 transformer layers, a hidden dimension of 8,192, and load-balanced MoE routing. The model is primarily accessed via API, with open weights available for research and enterprise self-hosting.
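The routing scheme itself is not described on this page. As a rough illustration of what sparse activation means for the 256 experts and 8 active experts per token listed in the specifications, a generic top-k softmax gate might look like the sketch below. The function name `route_token`, the random logits, and the renormalization over the selected experts are all assumptions made for illustration, not DeepSeek's published method, and load balancing is not modeled.

```python
import math
import random

NUM_EXPERTS = 256  # "Total Experts" from the specification list
TOP_K = 8          # "Active Experts" per token


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_id, weight) pairs for one token.

    Assumption: a plain softmax gate, renormalized over the chosen experts.
    """
    # Indices of the top_k largest router logits.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax restricted to the chosen experts (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical router output for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

print(len(assignment), "experts selected")
print("active expert fraction:", TOP_K / NUM_EXPERTS)  # 0.03125
```

Selecting 8 of 256 experts activates about 3% of the experts per token, which is in the same range as the ratio of 49B active to 1.6T total parameters.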

Frontier Research, Enterprise, Agentic, Code

Technical Specifications

Total Parameters: 1.6T
Active Parameters: 49.0B per token
Architecture: Mixture of Experts
Total Experts: 256
Active Experts: 8 per token
Attention Type: DSA (Dynamic Sparse Attention) + Token Compression
Hidden Dimension: d = 8,192
Transformer Layers: 80
Attention Heads: 64
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: YaRN-extended RoPE
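The attention figures above can be cross-checked with a few lines of arithmetic. The sketch below verifies that heads times head dimension equals the hidden dimension, and estimates the per-token KV cache size. It assumes a conventional grouped-query cache stored uncompressed at 2 bytes per value; DSA and token compression may shrink the real footprint, so treat the result as an upper-bound style estimate. The helper name `kv_bytes_per_token` is hypothetical.

```python
HIDDEN_DIM = 8192
NUM_LAYERS = 80
NUM_HEADS = 64
NUM_KV_HEADS = 8
HEAD_DIM = 128


def kv_bytes_per_token(layers=NUM_LAYERS, kv_heads=NUM_KV_HEADS,
                       head_dim=HEAD_DIM, bytes_per_value=2):
    """Bytes of KV cache per token, assuming an uncompressed FP16 cache."""
    # Factor 2 covers keys and values, stored for every layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


# The query heads together span the full hidden dimension.
assert NUM_HEADS * HEAD_DIM == HIDDEN_DIM

# Each KV head is shared by this many query heads.
queries_per_kv_head = NUM_HEADS // NUM_KV_HEADS

print("query heads per KV head:", queries_per_kv_head)  # 8
print("KV bytes per token:", kv_bytes_per_token())      # 327680
```

Under these assumptions each token of context costs 320 KiB of cache, which is why long contexts add noticeably to the memory budget.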

System Requirements

Estimated VRAM in GB, including 10% overhead, for different quantization methods and context sizes.

Quantization  B/W   Quality        1K ctx  195K ctx  1.0M ctx  1.0M ctx
Q4_K_M        0.50  ~97% of FP16    827.3     888.0    1132.2    1147.0
Q8_0          1.00  ~100% of FP16  1654.3    1715.1    1959.2    1974.0
F16           2.00  Reference      3308.4    3369.1    3613.2    3628.1

B/W = bytes per weight. Every configuration listed requires a cluster / multi-GPU setup; none fits a single 24 GB consumer GPU or 80 GB datacenter GPU.
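As a rough cross-check of the figures above, the estimate can be approximated as quantized weights plus an uncompressed FP16 KV cache. This is a sketch under several assumptions that the page does not confirm: the function name `estimate_vram_gib` is hypothetical, the 10% overhead is applied to the weights only, gigabytes are taken as 2^30 bytes, the parameter count is taken as exactly 1.6e12, and the cache ignores any savings from DSA or token compression.

```python
GIB = 2 ** 30  # assumption: the listed figures use binary gigabytes

TOTAL_PARAMS = 1.6e12  # assumption: exactly 1.6T parameters
BYTES_PER_WEIGHT = {"Q4_K_M": 0.50, "Q8_0": 1.00, "F16": 2.00}

# Keys + values, 80 layers, 8 KV heads, head dimension 128, 2 bytes per value.
KV_BYTES_PER_TOKEN = 2 * 80 * 8 * 128 * 2


def estimate_vram_gib(quant, context_tokens, overhead=0.10):
    """Approximate VRAM: weights with overhead plus an uncompressed KV cache."""
    weights = TOTAL_PARAMS * BYTES_PER_WEIGHT[quant] * (1 + overhead)
    kv_cache = KV_BYTES_PER_TOKEN * context_tokens
    return (weights + kv_cache) / GIB


for quant in BYTES_PER_WEIGHT:
    short_ctx = estimate_vram_gib(quant, 1024)
    long_ctx = estimate_vram_gib(quant, 2 ** 20)
    print(f"{quant}: {short_ctx:.1f} at 1K ctx, {long_ctx:.1f} at 2^20 tokens")
```

Under these assumptions Q4_K_M at 1K context comes out near 820, within about 1% of the 827.3 listed. The calculator's exact formula and parameter count are not given here, so small gaps are expected.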
