Mixtral 8x7B (MoE)
| Parameters | Active | Max Context | Architecture | Released | Modality |
|---|---|---|---|---|---|
| 46.7B | 12.9B | 32K | MoE | Dec 11, 2023 | Text |
About Mixtral 8x7B (MoE)
Mixtral 8x7B was the first major open-source MoE model and proved the architecture could work for local deployment. With 46.7B total parameters across 8 experts (2 active per token, 12.9B active), it delivers quality between dense 13B and 34B models. At Q4_K_M it needs ~26 GB of VRAM, so a 24 GB GPU can run it with partial offloading to system RAM. The Apache 2.0 license and mature ecosystem support make it a perennial favorite. While newer MoE models have surpassed it, Mixtral remains the most battle-tested open MoE.
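The routing idea behind those numbers is simple enough to sketch. Below is a minimal, framework-free illustration of top-2 gating as described above: a learned router scores all 8 experts, a softmax is taken over only the 2 highest-scoring ones, and their outputs are mixed. Names like `router_w` and the toy linear experts are illustrative assumptions, not Mixtral's actual code.

```python
import numpy as np

def top2_moe_ffn(x, router_w, experts):
    """One token through a Mixtral-style sparse MoE feed-forward block.

    x        : (d_model,) hidden state for a single token
    router_w : (d_model, 8) router weights -- one score per expert
    experts  : list of 8 callables, each a full feed-forward network
    """
    logits = x @ router_w                       # (8,) routing scores
    top2 = np.argsort(logits)[-2:]              # indices of the 2 best experts
    gates = np.exp(logits[top2] - logits[top2].max())
    gates /= gates.sum()                        # softmax over just the chosen pair
    # Only these 2 expert FFNs execute; the other 6 are skipped entirely,
    # which is why ~12.9B of the 46.7B parameters are active per token.
    return sum(g * experts[i](x) for g, i in zip(gates, top2))

# Toy usage: 8 random linear "experts" over a 16-dim hidden state.
rng = np.random.default_rng(0)
d = 16
experts = [(lambda w: (lambda v: v @ w))(rng.standard_normal((d, d))) for _ in range(8)]
x = rng.standard_normal(d)
print(top2_moe_ffn(x, rng.standard_normal((d, 8)), experts).shape)  # (16,)
```

Because the router is the only component that sees all experts, compute per token scales with the 2 chosen experts while memory still has to hold all 8, which is exactly the total-vs-active parameter gap in the spec table above.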
Technical Specifications
System Requirements
Estimated VRAM in GB, with 10% overhead, for different quantization methods and context sizes.
| Quantization | B/W (bytes/weight) | Quality | 1K ctx (GB) | 32K ctx (GB) | Hardware class |
|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% of FP16 | 24.26 | 28.14 | Datacenter GPU |
| Q8_0 | 1.00 | ~100% of FP16 | 48.40 | 52.28 | Datacenter GPU |
| F16 | 2.00 | Reference | 96.68 | 100.6 | Cluster / Multi-GPU |
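For readers who want the formula rather than the widget, here is a rough sketch of how figures like these can be reproduced: quantized weight bytes plus an FP16 KV cache, scaled by the 10% overhead. The layer and head counts are Mixtral 8x7B's published config (grouped-query attention with 8 KV heads); the division by 2**30 is an assumption that happens to land within ~1% of the table, so treat this as an estimate, not the calculator's exact method.

```python
def estimate_vram(bytes_per_weight, ctx_tokens,
                  total_params=46.7e9, overhead=0.10,
                  n_layers=32, n_kv_heads=8, head_dim=128):
    """Back-of-envelope VRAM: quantized weights + FP16 KV cache + overhead.

    n_layers / n_kv_heads / head_dim come from Mixtral 8x7B's published
    config. Dividing by 2**30 matches the table above to within ~1%,
    suggesting binary (GiB-style) units; the site's own constants may
    differ slightly.
    """
    weight_bytes = total_params * bytes_per_weight
    # KV cache per token: 2 tensors (K and V) x layers x kv_heads x head_dim x 2 bytes (FP16)
    kv_bytes = ctx_tokens * 2 * n_layers * n_kv_heads * head_dim * 2
    return (weight_bytes + kv_bytes) * (1 + overhead) / 2**30

print(f"Q4_K_M @ 1K ctx:  {estimate_vram(0.50, 1024):.1f}")   # ~24.1 (table: 24.26)
print(f"Q4_K_M @ 32K ctx: {estimate_vram(0.50, 32768):.1f}")  # ~28.3 (table: 28.14)
```

The KV cache term also explains why the 1K and 32K columns differ by only ~4 GB: context cost grows linearly with tokens, while the weight term is fixed per quantization level.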
Find the right GPU for Mixtral 8x7B (MoE)
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.