Llama 4 Maverick (MoE)
- **Parameters:** 400B total
- **Active parameters:** 17B
- **Max context:** 256K tokens
- **Architecture:** Mixture-of-Experts (MoE)
- **Released:** Apr 5, 2025
- **Modality:** Text + Vision
About Llama 4 Maverick (MoE)
Llama 4 Maverick is Meta's flagship open model, competing directly with GPT-4o and Claude 3.5 Sonnet. It is a massive MoE with 400B total and 17B active parameters across 128 routed experts (each token passes through one routed expert plus a shared expert). It delivers frontier-class reasoning, coding, and creative capabilities, but requires server-class hardware: roughly 207 GB of VRAM at Q4_K_M with minimal context. It is primarily deployed via cloud APIs, though the open weights enable research and enterprise self-hosting. Supports a 256K context window and vision input.
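Because only a fraction of the experts fire per token, memory footprint tracks the 400B total while per-token compute tracks the much smaller active count. Below is a minimal sketch of that accounting; the layer dimensions are hypothetical round numbers chosen to land near Maverick's 400B-total scale, not its published configuration.

```python
# Illustrative MoE parameter accounting. Memory must hold ALL experts,
# but each token only runs through the experts routed to it.
# Dimensions here are hypothetical -- not Maverick's actual config.

def moe_ff_params(n_layers: int, d_model: int, d_ff: int,
                  n_experts: int, experts_per_token: int) -> tuple[int, int]:
    """Return (total, active) feed-forward parameter counts for an MoE stack."""
    per_expert = 2 * d_model * d_ff                      # up- and down-projection
    total = n_layers * n_experts * per_expert            # resident in memory
    active = n_layers * experts_per_token * per_expert   # used per token
    return total, active

total, active = moe_ff_params(
    n_layers=48, d_model=4096, d_ff=8192,
    n_experts=128, experts_per_token=2,  # e.g. one routed + one shared expert
)
print(f"total FF params:  {total / 1e9:.0f}B")   # ~412B, stored in VRAM
print(f"active FF params: {active / 1e9:.1f}B")  # ~6.4B (attention, embeddings excluded)
```

The gap between those two numbers is why Maverick needs cluster-scale memory even though its per-token compute is closer to that of a mid-sized dense model.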
Technical Specifications
System Requirements
Estimated VRAM in GB, assuming 10% overhead, for different quantization methods and context sizes.
| Quantization | Bytes/Weight | Quality vs FP16 | 1K ctx | 195K ctx | 256K ctx | Hardware |
|---|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 206.9 GB | 243.4 GB | 254.8 GB | Cluster / Multi-GPU |
| Q8_0 | 1.00 | ~100% | 413.7 GB | 450.1 GB | 461.5 GB | Cluster / Multi-GPU |
| F16 | 2.00 | Reference | 827.2 GB | 863.6 GB | 875.0 GB | Cluster / Multi-GPU |
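A back-of-the-envelope version of these estimates is quantized weights plus KV cache, scaled by overhead. The sketch below uses the page's stated 10% overhead and a per-token KV-cache cost of roughly 0.19 GB per 1K tokens inferred from the table's context-column deltas (the deltas are identical across quantization rows because the KV cache is not weight-quantized). Both the formula and the KV figure are assumptions: the calculator's exact accounting (GB vs GiB, activation buffers, what the overhead covers) is not specified here, so expect ballpark rather than exact agreement with the table.

```python
# Back-of-the-envelope VRAM model: quantized weights + KV cache + overhead.
# bytes_per_weight comes from the B/W column above; kv_gb_per_1k (~0.19)
# is inferred from the table's context-column deltas and is an assumption,
# not a published figure.

def estimate_vram_gb(total_params_b: float, bytes_per_weight: float,
                     ctx_tokens: int, kv_gb_per_1k: float = 0.19,
                     overhead: float = 0.10) -> float:
    weights_gb = total_params_b * bytes_per_weight   # e.g. 400B params * 0.5 B/W
    kv_gb = ctx_tokens / 1000 * kv_gb_per_1k         # grows linearly with context
    return (weights_gb + kv_gb) * (1 + overhead)

for quant, bpw in [("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)]:
    row = ", ".join(
        f"{ctx // 1000}K: ~{estimate_vram_gb(400, bpw, ctx):.0f} GB"
        for ctx in (1_000, 195_000, 256_000)
    )
    print(f"{quant:7s} {row}")
```

Running this reproduces the table's shape (weights dominate; context adds a quantization-independent increment), with absolute values a few percent above the published figures.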
Find the right GPU for Llama 4 Maverick (MoE)
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.