Devstral 2 123B
- Parameters: 123.0B
- Max Context: 256K
- Architecture: Dense
- Released: —
- Modality: Text
About Devstral 2 123B
Devstral 2 123B is a dense transformer language model from the Mistral family, containing 123B parameters across 96 layers. It supports up to 256K tokens (262,144) of context, with a hidden dimension of 10240 and 16 KV heads for efficient grouped-query attention (GQA). The model is distributed under a modified MIT license and is positioned as an agentic coding model for server-class hardware.
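To see why GQA matters at this scale, consider the per-token KV-cache cost, which dominates memory at long context. The sketch below assumes a head dimension of 128 and an FP16 cache; neither is stated on this page, so treat the numbers as illustrative.

```python
# Rough KV-cache sizing for Devstral 2 123B under grouped-query attention.
# Assumed values (not published on this page): head_dim = 128, FP16 cache.
LAYERS = 96
KV_HEADS = 16       # from the spec above; far fewer than the query heads
HEAD_DIM = 128      # assumption, typical for Mistral-family models
BYTES_FP16 = 2

# K and V each store kv_heads * head_dim values per layer per token.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token")                  # 768 KiB
print(f"{kv_bytes_per_token * 262_144 / 1024**3:.0f} GiB at 256K ctx")   # ~192 GiB
```

With one KV head per query head, the cache would be several times larger; sharing KV heads across query-head groups is what keeps a 256K context tractable.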
Technical Specifications
System Requirements
Estimated VRAM requirements, assuming 10% overhead, across quantization methods and context sizes.
| Quantization | Bytes/Weight | Quality vs FP16 | 1K ctx (GB) | 195K ctx (GB) | 256K ctx (GB) |
|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 64.33 (Datacenter GPU) | 210.1 (Cluster / Multi-GPU) | 255.6 (Cluster / Multi-GPU) |
| Q8_0 | 1.00 | ~100% | 127.9 (Cluster / Multi-GPU) | 273.6 (Cluster / Multi-GPU) | 319.2 (Cluster / Multi-GPU) |
| F16 | 2.00 | Reference | 255.1 (Cluster / Multi-GPU) | 400.8 (Cluster / Multi-GPU) | 446.3 (Cluster / Multi-GPU) |
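These figures follow a simple accounting: weight memory scales with bytes per weight, the KV cache grows linearly with context, and roughly 10% overhead is added on top. The sketch below reproduces that accounting under assumed values (head_dim = 128, FP16 KV cache, binary gigabytes); it approximates the table's methodology rather than its exact formula, so its outputs land near, but not exactly on, the numbers above.

```python
def estimate_vram_gib(params_b: float, bytes_per_weight: float, ctx_tokens: int,
                      layers: int = 96, kv_heads: int = 16,
                      head_dim: int = 128, overhead: float = 0.10) -> float:
    """Rough VRAM estimate in GiB; head_dim and the FP16 KV cache are assumptions."""
    weight_bytes = params_b * 1e9 * bytes_per_weight
    kv_bytes = 2 * layers * kv_heads * head_dim * 2 * ctx_tokens  # FP16 K + V cache
    return (weight_bytes + kv_bytes) * (1 + overhead) / 1024**3

# Devstral 2 123B at Q4_K_M (0.50 bytes/weight), 1K context:
print(f"{estimate_vram_gib(123.0, 0.50, 1024):.1f} GiB")  # ~63.8, vs 64.33 in the table
```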
Find the right GPU for Devstral 2 123B
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.