Llama 3.3 70B
- **Parameters:** 70.6B
- **Max context:** 128K
- **Architecture:** Dense
- **Released:** Dec 6, 2024
- **Modality:** Text
About Llama 3.3 70B
Llama 3.3 70B is a significant post-training refinement of Llama 3.1 70B: the architecture is unchanged, but instruction following, math, and coding are substantially improved. It closes much of the gap with Llama 3.1 405B at a fraction of the size. Meta attributes the gains to advances in post-training, including online preference optimization, and the model remains one of the most capable dense 70B models for local deployment.
Technical Specifications
System Requirements
Estimated VRAM in GB, including a 10% overhead allowance, for different quantization methods and context lengths.
| Quantization | Bytes/weight | Quality vs. FP16 | VRAM @ 1K ctx | VRAM @ 128K ctx |
|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 36.80 GB (datacenter GPU) | 76.49 GB (datacenter GPU) |
| Q8_0 | 1.00 | ~100% | 73.30 GB (datacenter GPU) | 113.0 GB (cluster / multi-GPU) |
| F16 | 2.00 | reference | 146.3 GB (cluster / multi-GPU) | 186.0 GB (cluster / multi-GPU) |
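The pattern in the table is a fixed per-weight byte cost plus KV-cache growth that scales with context length. A back-of-envelope sketch follows; the layer count, KV-head count, and head dimension are the published Llama 3 70B architecture values, but the overhead model is a simplifying assumption, so the numbers will not exactly match the table above.

```python
def estimate_vram_gb(params_b, bytes_per_weight, ctx_tokens,
                     n_layers=80, n_kv_heads=8, head_dim=128,
                     kv_bytes=2, overhead=0.10):
    """Rough VRAM estimate for a dense transformer (sketch, not the
    calculator's exact formula).

    params_b         -- parameter count in billions (70.6 for Llama 3.3 70B)
    bytes_per_weight -- e.g. 0.5 for Q4_K_M, 1.0 for Q8_0, 2.0 for F16
    kv_bytes         -- bytes per KV-cache element (2 assumes FP16 cache)
    overhead         -- flat fractional allowance for activations/buffers
    """
    weights_gb = params_b * bytes_per_weight  # billions of params -> GB
    # K and V caches: 2 tensors per layer, one vector per KV head per token.
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    kv_gb = kv_per_token * ctx_tokens / 1e9
    return (weights_gb + kv_gb) * (1 + overhead)

# Q4_K_M at short vs. full context:
print(f"{estimate_vram_gb(70.6, 0.5, 1024):.1f} GB at 1K ctx")
print(f"{estimate_vram_gb(70.6, 0.5, 128 * 1024):.1f} GB at 128K ctx")
```

The gap between the 1K and 128K columns is almost entirely KV cache, which is why it is roughly the same (~40 GB) for every quantization row: weight quantization does not shrink the cache unless the cache itself is also quantized.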
Find the right GPU for Llama 3.3 70B
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.