gpt-oss 120B (MoE)
| Parameters | Active | Max Context | Architecture | Released | Modality |
|---|---|---|---|---|---|
| 117.0B | 5.1B | 128K | MoE | Aug 20, 2025 | Text |
About gpt-oss 120B (MoE)
gpt-oss 120B is OpenAI's open-source MoE contribution: 117B total parameters with only 5.1B active per token. It uses 128 experts with top-4 routing, a relatively shallow stack of 36 layers, and a wide hidden dimension of 2880. It is designed to fit on a single 80 GB GPU at moderate quantization while supporting a 128K-token YaRN-extended context. The extreme sparsity (roughly 4.4% of parameters active per token) gives it speed comparable to a 5B dense model while delivering quality competitive with 70B+ models. It is a fascinating demonstration of how far MoE efficiency can be pushed.
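To make the routing concrete, here is a minimal top-k MoE sketch in Python using tiny toy dimensions (the real model uses 128 experts, top-4 routing, and a hidden dimension of 2880). The router and expert shapes are illustrative assumptions, not the actual gpt-oss implementation.

```python
# Minimal sketch of top-k expert routing, the mechanism behind gpt-oss 120B's
# sparsity. Toy dimensions keep the example small; the real model uses
# 128 experts, top-4 routing, and hidden dim 2880. The router and expert
# shapes here are assumptions, not the actual gpt-oss code.
import numpy as np

HIDDEN = 32       # toy stand-in for the real hidden dimension (2880)
N_EXPERTS = 8     # toy stand-in for the real expert count (128)
TOP_K = 2         # toy stand-in for the real top-4 routing

rng = np.random.default_rng(0)
router_w = rng.standard_normal((HIDDEN, N_EXPERTS)) * 0.02   # gating layer
experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02      # one toy FFN matrix per expert
           for _ in range(N_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token's hidden state to its top-k experts and mix their outputs."""
    logits = x @ router_w                          # score every expert
    top = np.argsort(logits)[-TOP_K:]              # pick the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                                   # softmax over the selected experts only
    # Only TOP_K of the N_EXPERTS FFNs execute for this token. In gpt-oss 120B
    # that means 4 of 128 experts run, and together with the always-active
    # attention and embedding weights this yields the ~5.1B "active" figure.
    return sum(wi * (x @ experts[e]) for wi, e in zip(w, top))

print(moe_forward(rng.standard_normal(HIDDEN)).shape)  # (32,)
```

Because every token still loads its chosen experts' weights, the full 117B parameters must reside in memory; only the per-token compute scales with the 5.1B active subset.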
Technical Specifications
System Requirements
Estimated VRAM at 10% overhead for different quantization methods and context sizes.
| Quantization | Bytes/weight | Quality vs FP16 | 1K ctx (GB) | 128K ctx (GB) |
|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 60.55 (Datacenter GPU) | 69.48 (Datacenter GPU) |
| Q8_0 | 1.00 | ~100% | 121.0 (Cluster / Multi-GPU) | 130.0 (Cluster / Multi-GPU) |
| F16 | 2.00 | Reference | 242.0 (Cluster / Multi-GPU) | 250.9 (Cluster / Multi-GPU) |
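For a rough sense of where these numbers come from, the sketch below estimates VRAM as quantized weight bytes plus a per-token KV-cache term, scaled by an overhead fraction. The function name and the KV-cache constants (36 layers, 8 KV heads of dimension 64, fp16 cache) are assumptions based on the published architecture, not the site's actual calculator, so its output will not match the table to the decimal.

```python
# Back-of-envelope VRAM estimator: weight bytes + KV cache, plus overhead.
# The layer/head constants below are assumptions from public gpt-oss specs,
# not the formula used by the site's calculator.
def estimate_vram_gb(
    total_params_b: float,      # total parameters, in billions
    bytes_per_weight: float,    # e.g. 0.5 for ~4-bit, 1.0 for Q8_0, 2.0 for F16
    context_tokens: int,
    n_layers: int = 36,         # assumed transformer depth
    kv_heads: int = 8,          # assumed grouped-query KV heads
    head_dim: int = 64,         # assumed per-head dimension
    kv_bytes: int = 2,          # fp16 K and V cache entries
    overhead: float = 0.10,     # activation / framework overhead fraction
) -> float:
    weights_gb = total_params_b * bytes_per_weight                 # 1B params at 1 B/W = 1 GB
    kv_per_token = 2 * n_layers * kv_heads * head_dim * kv_bytes   # K + V, bytes per token
    kv_gb = kv_per_token * context_tokens / 1e9
    return (weights_gb + kv_gb) * (1 + overhead)

for name, bpw in [("Q4_K_M", 0.5), ("Q8_0", 1.0), ("F16", 2.0)]:
    for ctx in (1_024, 131_072):
        print(f"{name:7s} {ctx:>7d} ctx: {estimate_vram_gb(117.0, bpw, ctx):6.1f} GB")
```

Even a crude estimate like this shows why only the ~4-bit quantization fits a single 80 GB card, while Q8_0 and F16 push the model into multi-GPU territory.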
Find the right GPU for gpt-oss 120B (MoE)
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.