DeepSeek · MoE · MIT

DeepSeek V4-Flash (MoE)

Parameters: 284.0B
Active: 13.0B
Max Context: 1.0M
Architecture: MoE
Released: April 2026
Modality: Text

About DeepSeek V4-Flash (MoE)

DeepSeek V4-Flash (MoE) is a mixture-of-experts (MoE) transformer language model from the DeepSeek family, with 284B total parameters across 48 layers, of which 13B are active per token; the full 284B must be loaded into VRAM. It supports up to 1.0M tokens of context and uses a hidden dimension of 6,144 with 8 KV heads for efficient grouped-query attention (GQA). Released in April 2026, it is the economical variant of the V4 line, aimed at high-memory, server-class deployments.
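The headline split of 284B total versus 13B active parameters is the defining MoE property: a learned router dispatches each token to only a few experts, so only that fraction of the weights participates in any forward pass. Below is a minimal NumPy sketch of generic top-k expert routing; the dimensions, expert count, and plain ReLU feed-forward are illustrative stand-ins, not DeepSeek's actual design (the spec table lists SwiGLU and 13 experts).

```python
import numpy as np

def topk_moe_layer(x, gate_w, experts, k=2):
    """Illustrative top-k MoE feed-forward layer (not DeepSeek's exact router).

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of (w_in, w_out) tuples, one per expert
    k       : number of experts activated per token
    """
    logits = x @ gate_w                 # one router score per expert
    topk = np.argsort(logits)[-k:]      # indices of the k best-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()            # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, topk):
        w_in, w_out = experts[idx]
        h = np.maximum(x @ w_in, 0.0)   # simple ReLU FFN stands in for SwiGLU
        out += w * (h @ w_out)          # weighted sum of the k expert outputs
    return out

# Toy dimensions: only k of n_experts run per token, so the "active"
# parameter count is a small fraction of the total, as in 13B of 284B.
d, d_ff, n_experts = 64, 256, 8
rng = np.random.default_rng(0)
experts = [(rng.normal(size=(d, d_ff)) * 0.02,
            rng.normal(size=(d_ff, d)) * 0.02) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts)) * 0.02
y = topk_moe_layer(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (64,)
```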

Research · Enterprise

Technical Specifications

Total Parameters: 284.0B
Active Parameters: 13.0B per token
Architecture: Mixture of Experts
Total Experts: 13
Attention Type: GQA
Hidden Dimension: d = 6,144
Transformer Layers: 48
Attention Heads: 48
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
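Because the 48 query heads share only n_kv = 8 KV heads, GQA shrinks the KV cache by a factor of 48/8 = 6 relative to full multi-head attention. A quick sanity check on the figures above (the KV byte widths are assumptions; the page does not state how the cache is stored):

```python
# Consistency check and KV-cache growth, using the spec table above.
n_heads, n_kv_heads, d_head = 48, 8, 128
d_model, n_layers = 6144, 48
assert n_heads * d_head == d_model        # 48 * 128 = 6,144

def kv_gib(ctx_tokens, bytes_per_elem):
    # One K and one V vector per layer, each of size n_kv_heads * d_head.
    per_token = 2 * n_layers * n_kv_heads * d_head * bytes_per_elem
    return per_token * ctx_tokens / 2**30

print(kv_gib(1_000_000, 2))   # FP16 KV (assumed): ~366 GiB at 1.0M context
print(kv_gib(1_000_000, 1))   # 8-bit KV (assumed): ~183 GiB
```

The 8-bit case is consistent with the roughly 183 GB step between the 1K and 1.0M context columns in the table below, though the page does not confirm that convention.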

System Requirements

Estimated VRAM (GB) at 10% overhead for different quantization methods and context sizes; a rough reconstruction of this estimate is sketched after the table.

Quantization (bytes/weight, quality)     1K ctx    195K ctx    1.0M ctx    1.0M ctx
Q4_K_M (0.50 B/W, ~97% of FP16)          147.0     183.4       329.9       338.8
Q8_0 (1.00 B/W, ~100% of FP16)           293.8     330.2       476.7       485.6
F16 (2.00 B/W, reference)                587.4     623.8       770.3       779.2

Every cell falls into the "Requires cluster / multi-GPU" tier; no configuration fits a 24 GB consumer GPU or a single 80 GB datacenter GPU.
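For reference, the table can be roughly reproduced as quantized weight bytes times a 10% overhead factor, plus the KV cache. The sketch below is a reconstruction under assumed conventions (GiB units, overhead applied to weights only, 8-bit KV cache), so it lands within a few GiB of the published cells rather than matching them exactly:

```python
# Rough reconstruction of the table above: weights * bytes-per-weight * 1.10
# overhead, plus an (assumed) 8-bit KV cache. The site's exact accounting
# is not published, so expect small deviations from the table.
PARAMS = 284.0e9
N_LAYERS, N_KV_HEADS, D_HEAD = 48, 8, 128
KV_BYTES_PER_ELEM = 1          # assumption: 8-bit KV cache
OVERHEAD = 1.10                # 10% overhead, per the note above

def est_vram_gib(bytes_per_weight, ctx_tokens):
    weights = PARAMS * bytes_per_weight * OVERHEAD
    kv = 2 * N_LAYERS * N_KV_HEADS * D_HEAD * KV_BYTES_PER_ELEM * ctx_tokens
    return (weights + kv) / 2**30

for name, bpw in [("Q4_K_M", 0.5), ("Q8_0", 1.0), ("F16", 2.0)]:
    print(name, [round(est_vram_gib(bpw, c), 1) for c in (1_000, 1_000_000)])
# Q4_K_M [145.7, 328.6]   vs table: 147.0 / 329.9
# Q8_0   [291.1, 474.1]   vs table: 293.8 / 476.7
# F16    [582.1, 765.0]   vs table: 587.4 / 770.3
```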

Find the right GPU for DeepSeek V4-Flash (MoE)

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.