Qwen · Dense · Apache 2.0

Qwen 2.5 14B


Parameters: 14.7B
Max Context: 128K
Architecture: Dense
Released: Sep 19, 2024
Modality: Text

About Qwen 2.5 14B

Qwen 2.5 14B bridges the gap between 7B convenience and 32B quality. At 14.7B parameters and ~8 GB VRAM at Q4_K_M, it fits on 12 GB GPUs and runs comfortably on 16 GB. It delivers noticeably better reasoning and coding than any 7-8B model while staying accessible on mid-range hardware. Apache 2.0 licensed, with 128K context support. A strong contender for the "best model under 10 GB VRAM" category.

General Purpose · Code · Multilingual · Mid-Range GPU · Commercial

Technical Specifications

Total Parameters: 14.7B
Architecture: Dense
Attention Type: GQA (Grouped Query Attention)
Hidden Dimension: d = 5,120
Transformer Layers: 48
Attention Heads: 40
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
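The GQA figures above directly determine KV-cache size: caching keys and values for only 8 KV heads instead of all 40 attention heads cuts the cache by 5x. A minimal sketch of that arithmetic, assuming an FP16 cache (real runtimes add their own overheads and may quantize the cache):

```python
# KV-cache size for Qwen 2.5 14B's published dimensions: GQA (8 KV heads)
# vs. a hypothetical full-MHA variant (one KV head per attention head).

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Bytes for the K and V caches at FP16 (2 bytes per element)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

LAYERS, HEADS, KV_HEADS, HEAD_DIM = 48, 40, 8, 128
ctx = 128 * 1024  # full 128K context

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)
mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, ctx)
print(f"GQA cache: {gqa / 2**30:.1f} GiB")        # → 24.0 GiB
print(f"MHA cache: {mha / 2**30:.1f} GiB")        # → 120.0 GiB
print(f"Savings factor: {mha / gqa:.0f}x")        # → 5x
```

At 128K context the cache alone is ~24 GiB, which is why the 128K column in the requirements table below dwarfs the 1K column.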

System Requirements

Estimated VRAM (GB) assuming 10% overhead, for different quantization methods and context sizes.

| Quantization | Bytes/Weight | 1K ctx | 128K ctx |
|---|---|---|---|
| Q4_K_M (~97% of FP16 quality) | 0.50 B/W | 7.79 GB (Consumer GPU) | 31.60 GB (Datacenter GPU) |
| Q8_0 (~100% of FP16 quality) | 1.00 B/W | 15.38 GB (Consumer GPU) | 39.20 GB (Datacenter GPU) |
| F16 (reference) | 2.00 B/W | 30.58 GB (Datacenter GPU) | 54.39 GB (Datacenter GPU) |

Consumer GPU: fits a 24 GB consumer GPU. Datacenter GPU: fits an 80 GB datacenter GPU. Anything larger requires a cluster / multi-GPU.
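A back-of-envelope version of how estimates like these are built: quantized weight bytes with ~10% overhead, plus an FP16 KV cache sized from the spec table. This is a sketch, not the exact calculator behind the table; tooling may count overhead differently, so the results land close to, but not exactly on, the published figures.

```python
# Rough VRAM estimate for Qwen 2.5 14B at a given quantization and context.
# Assumptions: 10% overhead applied to weights only, FP16 KV cache,
# GiB-based units. Figures approximate the table above, not reproduce it.

PARAMS = 14.7e9
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128

def est_vram_gib(bytes_per_weight, ctx_len, overhead=0.10):
    weights = PARAMS * bytes_per_weight * (1 + overhead)
    kv_cache = 2 * LAYERS * KV_HEADS * HEAD_DIM * ctx_len * 2  # K+V, FP16
    return (weights + kv_cache) / 2**30

for name, bpw in [("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)]:
    print(f"{name}: {est_vram_gib(bpw, 1024):.2f} GiB @1K, "
          f"{est_vram_gib(bpw, 128 * 1024):.2f} GiB @128K")
```

The dominant term flips with context: at 1K the weights dominate, while at 128K the ~24 GiB KV cache does, which is why even Q4_K_M needs a datacenter-class GPU at full context.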

Other Qwen Models

View All

Find the right GPU for Qwen 2.5 14B

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.