DeepSeek · MoE · MIT License

DeepSeek V3 0324 (MoE)

DeepSeek V3 0324 (MoE) is a mixture-of-experts (MoE) transformer language model from the DeepSeek family, containing 685B total parameters across 61 layers, with 37B active per token and support for up to 64K tokens of context.

Parameters: 685.0B
Active: 37.0B
Max Context: 64K
Architecture: MoE
Released: March 2025
Modality: Text

About DeepSeek V3 0324 (MoE)

DeepSeek V3 0324 (MoE) is a mixture-of-experts (MoE) transformer language model from the DeepSeek family, containing 685B parameters across 61 layers. All 685B parameters must be resident in VRAM, but only 37B are active per token. It supports up to 64K tokens of context, with a hidden dimension of 7,168 and 8 KV heads, and uses a compressed multi-head latent attention (MLA) KV cache to keep attention memory low. The 0324 checkpoint is the March 2025 update of DeepSeek V3.
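
Only a small slice of those 685B weights is touched for any single token. The sketch below illustrates top-k expert routing in an MoE layer; all sizes and the routing code are toy stand-ins, not DeepSeek V3's actual configuration or implementation.

```python
# Toy top-k expert routing: only the selected experts' weights are used for
# a given token, which is why "active" parameters << "total" parameters.
# All dimensions here are illustrative stand-ins, not DeepSeek V3's real config.
import numpy as np

d_model, n_experts, top_k = 16, 8, 2
rng = np.random.default_rng(0)

router_w = rng.standard_normal((d_model, n_experts))                      # router projection
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    logits = x @ router_w                    # affinity of this token to each expert
    topk = np.argsort(logits)[-top_k:]       # pick the k highest-scoring experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                     # softmax over the selected experts only
    # Only `top_k` of the `n_experts` weight matrices participate in this sum.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, topk))

print(moe_forward(rng.standard_normal(d_model)).shape)   # (16,)
```

In the full model this selection happens independently in every MoE layer, which is how 685B stored parameters translate into roughly 37B used per token, while all 685B still have to sit in VRAM.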

Research · Enterprise

Technical Specifications

Total Parameters: 685.0B
Active Parameters: 37.0B per token
Architecture: Mixture of Experts
Total Experts: 257 (256 routed + 1 shared per MoE layer)
Attention Type: Multi-head Latent Attention (MLA)
Hidden Dimension: d = 7,168
Transformer Layers: 61
Attention Heads: 56
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
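
As a rough cross-check of what longer context costs, the listed layer count, KV heads, and head dimension give the back-of-the-envelope KV-cache sizes below. This assumes a plain uncompressed FP16 K/V layout; MLA stores a compressed latent instead, so the real footprint can be smaller.

```python
# KV-cache size from the listed figures, assuming a plain FP16 K/V layout:
#   bytes/token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_element
layers, kv_heads, head_dim, bytes_per_elem = 61, 8, 128, 2

kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
for ctx in (1_024, 65_536):                  # 1K and 64K context
    print(f"{ctx:>6} tokens: {kv_bytes_per_token * ctx / 2**30:5.2f} GiB KV cache")
# ~0.24 GiB at 1K context, ~15.25 GiB at 64K -- roughly the 15 GB gap between
# the 1K and 64K columns in the table below.
```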

System Requirements

Estimated VRAM in GB, assuming 10% overhead, for different quantization methods and context sizes.

Quantization | Bytes per weight | Quality vs FP16 | 1K ctx | 64K ctx
Q4_K_M | 0.50 B/W | ~97% | 354.3 GB | 369.3 GB
Q8_0 | 1.00 B/W | ~100% | 708.4 GB | 723.4 GB
F16 | 2.00 B/W | reference | 1416.5 GB | 1431.5 GB

Every cell falls in the "requires cluster / multi-GPU" tier: no quantization or context setting here fits a 24 GB consumer GPU or a single 80 GB datacenter GPU.
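
Most of each figure above is simply total parameters times bytes per weight; the table's numbers sit above this raw weight memory because they also include the KV cache and an overhead margin. A quick sanity check of the weight portion alone, assuming the 685B parameter count and the listed bytes-per-weight values:

```python
# Weight memory alone: total parameters * bytes per weight. The table's
# figures are higher because they also include KV cache and overhead.
TOTAL_PARAMS_B = 685.0                                   # billions of parameters

for name, bytes_per_weight in (("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)):
    weights_gb = TOTAL_PARAMS_B * bytes_per_weight       # 1e9 params * 1 B/W = 1 GB
    print(f"{name:7s} {bytes_per_weight:.2f} B/W -> ~{weights_gb:6.1f} GB of weights")
# Q4_K_M -> ~342.5 GB, Q8_0 -> ~685.0 GB, F16 -> ~1370.0 GB
```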

Find the right GPU for DeepSeek V3 0324 (MoE)

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.