gpt-oss 20B (MoE)
- Parameters: 21.0B
- Active parameters: 3.6B
- Max context: 128K
- Architecture: MoE
- Released: —
- Modality: Text
About gpt-oss 20B (MoE)
gpt-oss 20B (MoE) is a mixture-of-experts (MoE) transformer language model from the OpenAI family with 21B total parameters across 24 layers, all of which must be loaded into VRAM, and 3.6B parameters active per token. It supports up to 131K tokens of context, with a hidden dimension of 2880 and 8 KV heads for efficient grouped-query attention (GQA). The model is released under the Apache 2.0 license. Its MoE configuration uses 32 experts with top-4 routing, and the weights fit in 16 GB of VRAM at MXFP4 quantization, making it a strong option for local reasoning workloads.
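For readers unfamiliar with top-k expert routing, the sketch below shows how a router might select 4 of the 32 experts per token and weight their outputs. The shapes follow the spec above, but the router design itself is an illustrative assumption, not the actual gpt-oss implementation.

```python
import torch

# Illustrative top-4-of-32 expert routing; a generic sketch, not gpt-oss's router.
NUM_EXPERTS = 32
TOP_K = 4
HIDDEN = 2880  # hidden dimension from the spec

router = torch.nn.Linear(HIDDEN, NUM_EXPERTS, bias=False)

def route(hidden_states: torch.Tensor):
    """Return the 4 selected expert ids and their mixture weights per token."""
    logits = router(hidden_states)                     # (tokens, 32)
    weights, expert_ids = logits.topk(TOP_K, dim=-1)   # (tokens, 4)
    weights = torch.softmax(weights, dim=-1)           # normalize over the chosen 4
    return expert_ids, weights

tokens = torch.randn(5, HIDDEN)  # 5 example token embeddings
ids, w = route(tokens)
print(ids.shape, w.shape)        # torch.Size([5, 4]) torch.Size([5, 4])
```

Because only 4 of the 32 experts run for each token, the active parameter count (3.6B) is far smaller than the total (21B), which is why the model runs quickly even though all 21B parameters must sit in VRAM.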
Technical Specifications
System Requirements
Estimated VRAM at 10% overhead for different quantization methods and context sizes.
| Quantization | Bytes/weight | Quality | 1K ctx (GB) | 128K ctx (GB) |
|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% of FP16 | 10.90 (Consumer GPU) | 16.85 (Consumer GPU) |
| Q8_0 | 1.00 | ~100% of FP16 | 21.76 (Consumer GPU) | 27.71 (Datacenter GPU) |
| F16 | 2.00 | Reference | 43.47 (Datacenter GPU) | 49.42 (Datacenter GPU) |
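As a rough cross-check of the table, the sketch below estimates VRAM as quantized weight bytes plus KV-cache bytes plus 10% overhead. The head dimension, KV-cache precision, and the exact formula are assumptions; the calculator behind the table may compute the figures slightly differently, so expect small discrepancies.

```python
GIB = 1024 ** 3

def estimate_vram_gib(params_b, bytes_per_weight, ctx_tokens,
                      layers=24, kv_heads=8, head_dim=64,
                      kv_bytes=2, overhead=0.10):
    """Approximate VRAM in GiB: weights + KV cache, plus a flat overhead."""
    weight_bytes = params_b * 1e9 * bytes_per_weight
    # K and V caches: 2 tensors per layer, each ctx_tokens * kv_heads * head_dim
    kv_cache_bytes = 2 * layers * kv_heads * head_dim * ctx_tokens * kv_bytes
    return (weight_bytes + kv_cache_bytes) * (1 + overhead) / GIB

print(f"Q4_K_M @ 1K ctx:   {estimate_vram_gib(21.0, 0.50, 1024):.2f} GiB")
print(f"Q4_K_M @ 128K ctx: {estimate_vram_gib(21.0, 0.50, 131072):.2f} GiB")
print(f"F16    @ 128K ctx: {estimate_vram_gib(21.0, 2.00, 131072):.2f} GiB")
```

Note how the jump from 1K to 128K context adds the same amount of memory at every quantization level: the KV cache is stored independently of the weight quantization, so only the weight term shrinks as you quantize.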
Find the right GPU for gpt-oss 20B (MoE)
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.