Mistral · MoE · Apache 2.0

Mixtral 8x7B (MoE)

Parameters: 46.7B
Active: 12.9B
Max Context: 32K
Architecture: MoE
Released: Dec 11, 2023
Modality: Text

About Mixtral 8x7B (MoE)

Mixtral 8x7B was the first major open-source MoE model and proved the architecture could work for local deployment. With 46.7B total parameters across 8 experts (2 active per token, 12.9B active), it delivers quality between dense 13B and 34B models. At Q4_K_M it needs roughly 26 GB of VRAM, so it runs on 24 GB GPUs with partial offloading. The Apache 2.0 license and mature ecosystem support make it a perennial favorite. While newer MoE models have surpassed it, Mixtral remains the most battle-tested open MoE.
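
To make the "8 experts, 2 active per token" routing concrete, here is a minimal sketch of a Mixtral-style sparse MoE feed-forward layer in PyTorch. The hidden size, expert count, and top-k follow the spec table below; the FFN intermediate size of 14,336 is Mixtral's published value and is assumed here. This is an illustrative reference loop, not the fused implementation used by real inference engines.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Minimal sparse-MoE feed-forward block: 8 experts, 2 active per token."""
    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is a SwiGLU FFN (gate, up, down), as in Mixtral.
        self.experts = nn.ModuleList([
            nn.ModuleDict({
                "gate": nn.Linear(d_model, d_ff, bias=False),
                "up":   nn.Linear(d_model, d_ff, bias=False),
                "down": nn.Linear(d_ff, d_model, bias=False),
            }) for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (n_tokens, d_model)
        logits = self.router(x)                # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    h = expert["down"](F.silu(expert["gate"](x[mask])) * expert["up"](x[mask]))
                    out[mask] += weights[mask, k].unsqueeze(-1) * h
        return out
```

Because only 2 of the 8 expert FFNs run for each token, per-token compute tracks the 12.9B active parameters, while all 46.7B parameters must still be resident in memory. That is why the VRAM figures below scale with the total parameter count.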

General Purpose · Multilingual · Code · Commercial

Technical Specifications

Total Parameters: 46.7B
Active Parameters: 12.9B per token
Architecture: Mixture of Experts
Total Experts: 8
Active Experts: 2 per token
Attention Type: GQA (Grouped Query Attention)
Hidden Dimension: d = 4,096
Transformer Layers: 32
Attention Heads: 32
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
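
As a sanity check, the 46.7B total and 12.9B active figures can be recomputed from the dimensions above. The sketch below assumes Mixtral's published FFN intermediate size (14,336) and vocabulary size (32,000), which are not listed in the table.

```python
# Rough parameter count for Mixtral 8x7B from its architectural dimensions.
d_model, n_layers, n_heads, n_kv, d_head = 4096, 32, 32, 8, 128
d_ff, n_experts, top_k, vocab = 14336, 8, 2, 32000   # assumed published values

attn = d_model * (n_heads * d_head)      # Q projection
attn += 2 * d_model * (n_kv * d_head)    # K and V projections (GQA: only 8 KV heads)
attn += (n_heads * d_head) * d_model     # output projection
expert = 3 * d_model * d_ff              # SwiGLU FFN: gate, up, down matrices
router = d_model * n_experts

per_layer_total  = attn + router + n_experts * expert
per_layer_active = attn + router + top_k * expert
embeddings = 2 * vocab * d_model         # input embeddings + LM head

total  = n_layers * per_layer_total  + embeddings
active = n_layers * per_layer_active + embeddings
print(f"total  ~ {total/1e9:.1f}B")      # ~46.7B
print(f"active ~ {active/1e9:.1f}B")     # ~12.9B
```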

System Requirements

Estimated VRAM at 10% overhead for different quantization methods and context sizes.

Q4_K_M (0.50 B/W, ~97% of FP16 quality): 24.26 GB at 1K ctx, 28.14 GB at 32K ctx (fits 80 GB datacenter GPU)
Q8_0 (1.00 B/W, ~100% of FP16 quality): 48.40 GB at 1K ctx, 52.28 GB at 32K ctx (fits 80 GB datacenter GPU)
F16 (2.00 B/W, reference): 96.68 GB at 1K ctx, 100.6 GB at 32K ctx (requires cluster / multi-GPU)

Hardware tiers: fits 24 GB consumer GPU / fits 80 GB datacenter GPU / requires cluster or multi-GPU.
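
For readers who want to reproduce figures like these by hand, here is a rough estimator: quantized weight bytes plus an fp16 GQA KV cache, with the stated ~10% overhead, reported in binary gigabytes. It lands within about 1 GB of the table above, but it is a back-of-the-envelope sketch rather than the calculator's exact formula.

```python
def estimate_vram(params_b=46.7, bytes_per_weight=0.50, ctx=32_768,
                  n_layers=32, n_kv_heads=8, head_dim=128,
                  kv_bytes=2, overhead=0.10):
    """Rough VRAM estimate in binary GB. Approximate, not the calculator's exact formula."""
    weight_bytes = params_b * 1e9 * bytes_per_weight
    # KV cache: K and V per layer; with GQA only the 8 KV heads are cached (fp16).
    kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * kv_bytes
    return (weight_bytes + kv_cache_bytes) * (1 + overhead) / 2**30

print(f"Q4_K_M @ 32K ctx: ~{estimate_vram():.1f} GB")                      # ~28.3 (table: 28.14)
print(f"F16    @ 32K ctx: ~{estimate_vram(bytes_per_weight=2.0):.1f} GB")  # ~100.1 (table: 100.6)
```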

Find the right GPU for Mixtral 8x7B (MoE)

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.