Cohere · Dense · CC-BY-NC 4.0

Command R 35B

Command R 35B is Cohere's RAG-optimized model, purpose-built for retrieval-augmented generation, multilingual enterprise tasks, and tool use. At 35B parameters with 128K context and strong performance across 10 languages, it excels at grounded generation.

35.0B

Parameters

128K

Max Context

Dense

Architecture

Mar 11, 2024

Released

Text

Modality

About Command R 35B

Command R 35B is Cohere's RAG-optimized model, purpose-built for retrieval-augmented generation, multilingual enterprise tasks, and tool use. At 35B parameters with 128K context and strong performance across 10 languages, it excels at grounded generation — citing sources, following structured instructions, and calling external tools. The CC-BY-NC license limits commercial use without Cohere's permission. At Q4_K_M it needs ~20 GB VRAM, fitting on 24 GB GPUs. Still relevant for enterprise RAG workflows where tool use and grounded generation are priorities.
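To make the grounded-generation workflow concrete, here is a minimal sketch of how retrieved source chunks might be packaged alongside a question so the model can cite them. The field names (`title`, `snippet`, `id`) and the request shape are assumptions for illustration; check Cohere's API reference for the exact schema.

```python
# Hypothetical request payload for grounded generation (RAG).
# Field names and structure are illustrative assumptions, not
# a verified copy of Cohere's API schema.

def build_rag_request(question, sources):
    """Package a question plus retrieved source chunks so the model
    can ground its answer and cite which document each claim uses."""
    return {
        "model": "command-r",
        "message": question,
        "documents": [
            {"id": f"doc_{i}", "title": title, "snippet": text}
            for i, (title, text) in enumerate(sources)
        ],
    }

request = build_rag_request(
    "What is our refund window?",
    [("Refund policy", "Refunds are accepted within 30 days of purchase."),
     ("Shipping FAQ", "Orders ship within 2 business days.")],
)
print(request["documents"][0]["id"])  # doc_0
```

The point of the `documents` list is that each retrieved chunk keeps a stable identifier, which is what lets a grounded model attach citations back to specific sources.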

RAGEnterpriseTool UseMultilingual

Technical Specifications

Total Parameters: 35.0B
Architecture: Dense
Attention Type: GQA (Grouped Query Attention)
Hidden Dimension: d = 8,192
Transformer Layers: 40
Attention Heads: 64
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE

System Requirements

Estimated VRAM at 10% overhead for different quantization methods and context sizes.

Quantization                         1K ctx                       128K ctx
Q4_K_M (0.50 B/W, ~97% of FP16)      18.25 GB (Consumer GPU)      38.09 GB (Datacenter GPU)
Q8_0 (1.00 B/W, ~100% of FP16)       36.34 GB (Datacenter GPU)    56.18 GB (Datacenter GPU)
F16 (2.00 B/W, reference)            72.52 GB (Datacenter GPU)    92.36 GB (Cluster / Multi-GPU)

Legend: Consumer GPU = fits a 24 GB consumer GPU; Datacenter GPU = fits an 80 GB datacenter GPU; Cluster / Multi-GPU = requires multiple GPUs.
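A first-order estimate of these requirements can be sketched as weights plus an FP16 KV cache, with the stated 10% overhead. This is an assumption about the methodology, not the calculator's exact formula, so it will not reproduce the table to the decimal:

```python
# Rough VRAM estimate: model weights + FP16 KV cache + 10% overhead.
# A first-order sketch under assumed methodology; it will not match
# the table above exactly.

def estimate_vram_gb(params_b, bytes_per_weight, ctx_tokens,
                     layers=40, n_kv=8, d_head=128, overhead=0.10):
    weights = params_b * 1e9 * bytes_per_weight        # weight bytes
    kv = 2 * layers * n_kv * d_head * 2 * ctx_tokens   # FP16 K and V
    return (weights + kv) * (1 + overhead) / 1e9       # bytes -> GB

for name, bpw in [("Q4_K_M", 0.5), ("Q8_0", 1.0), ("F16", 2.0)]:
    print(f"{name}: {estimate_vram_gb(35.0, bpw, 1024):.1f} GB @ 1K ctx, "
          f"{estimate_vram_gb(35.0, bpw, 128 * 1024):.1f} GB @ 128K ctx")
```

The weight term scales with bytes per weight while the KV term does not, which is why quantization helps far less at 128K context than at 1K: the cache is the same size at every quantization level.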


Find the right GPU for Command R 35B

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.