
Qwen 3.5 9B

Parameters: 9.0B
Max Context: 256K
Architecture: Dense
Released: Feb 18, 2026
Modality: Text

About Qwen 3.5 9B

Qwen 3.5 9B introduces a hybrid architecture combining Gated DeltaNet linear attention layers with standard full attention — only 25% of layers use KV-cached attention, dramatically reducing memory overhead for long contexts. At 9B parameters supporting 262K context (extensible to 1M with YaRN), it handles extremely long documents and multi-turn conversations at a fraction of the VRAM cost of traditional architectures. Apache 2.0 licensed. An excellent choice for long-context RAG, document analysis, and extended agent sessions.

Long Context · RAG · Document Analysis · Agentic · Commercial
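To make the memory claim concrete, here is a back-of-envelope sketch comparing KV-cache size for a fully KV-cached stack against the hybrid layout, assuming the spec-sheet shape below (8 layers, 4 KV heads, head dimension 256) and an FP16 cache. The fixed-size recurrent state carried by the Gated DeltaNet layers is not modeled.

```python
# Back-of-envelope KV-cache sizing: a minimal sketch, not the model's
# actual memory planner. Assumes FP16 KV entries and the spec-sheet
# shape (8 layers, 4 KV heads, head_dim 256); the Gated DeltaNet layers
# keep a small fixed-size state that is ignored here.

def kv_cache_gib(ctx_tokens: int, kv_layers: int,
                 n_kv_heads: int = 4, head_dim: int = 256,
                 bytes_per_elem: int = 2) -> float:
    # Each cached layer stores K and V: n_kv_heads * head_dim values apiece.
    per_token = 2 * kv_layers * n_kv_heads * head_dim * bytes_per_elem
    return ctx_tokens * per_token / 2**30

CTX = 262_144  # 256K-token context
print(f"all 8 layers cached : {kv_cache_gib(CTX, kv_layers=8):.2f} GiB")
print(f"hybrid, 25% cached  : {kv_cache_gib(CTX, kv_layers=2):.2f} GiB")
# -> 8.00 GiB vs 2.00 GiB: a 4x reduction in KV memory at full context.
```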

Technical Specifications

Total Parameters: 9.0B
Architecture: Dense
Attention Type: Hybrid Gated DeltaNet + Full Attention (25% of layers)
Hidden Dimension: d = 4,096
Transformer Layers: 8
Attention Heads: 16
KV Heads: n_kv = 4
Head Dimension: d_head = 256
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: Dual RoPE (local + global)
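
For orientation, the table above maps onto a Hugging Face-style model config roughly as follows. This is a hypothetical sketch: the field names mirror existing Qwen3 configs, and any value not in the spec table (RMSNorm epsilon, the YaRN settings) is illustrative, not the published configuration.

```python
# Hypothetical config sketch; key names mirror Qwen3-style Hugging Face
# configs, and all values not in the spec table are assumptions.
config = {
    "hidden_size": 4096,               # d = 4,096
    "num_hidden_layers": 8,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,          # GQA: 4 query heads share each KV head
    "head_dim": 256,
    "hidden_act": "silu",              # gate of the SwiGLU MLP
    "rms_norm_eps": 1e-6,              # assumed; a typical RMSNorm epsilon
    "max_position_embeddings": 262_144,
    # Illustrative YaRN block for stretching 262K native context toward 1M:
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,                 # 262,144 * 4 ~= 1M tokens
        "original_max_position_embeddings": 262_144,
    },
}
```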

System Requirements

Estimated VRAM in GB, assuming 10% runtime overhead, for each quantization method and context size.

Quantization   Bytes/weight   Quality          1K ctx      195K ctx    256K ctx
Q4_K_M         0.50           ~97% of FP16     4.68 GB     10.76 GB    12.65 GB
Q8_0           1.00           ~100% of FP16    9.34 GB     15.41 GB    17.30 GB
F16            2.00           reference        18.64 GB    24.71 GB    26.61 GB

Every configuration above fits a 24 GB consumer GPU, except F16 at 195K and 256K context, which requires an 80 GB datacenter GPU. Anything larger would require multi-GPU or a cluster.
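
The table can be approximated from first principles. Below is a sketch, not the exact calculator behind these numbers: it assumes weights stored at each quant's bytes-per-weight with the stated 10% overhead, plus an FP16 KV cache budgeted for all 8 layers (the conservative case; as shown earlier, the hybrid design caches far less). Its output lands within about 1-2% of the table.

```python
# Rough reconstruction of the VRAM table: a sketch under assumptions, not
# the site's exact calculator. Assumes 10% overhead on the weights, an
# FP16 KV cache budgeted for all 8 layers (conservative; the hybrid design
# caches only 25% of them), and binary GiB. Matches the table to ~1-2%.

GIB = 2**30
PARAMS = 9.0e9                        # total parameter count
N_LAYERS, N_KV_HEADS, HEAD_DIM = 8, 4, 256

def vram_gib(bytes_per_weight: float, ctx_tokens: int,
             overhead: float = 1.10) -> float:
    weights = PARAMS * bytes_per_weight * overhead
    kv = ctx_tokens * 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * 2  # K+V, FP16
    return (weights + kv) / GIB

for name, bpw in [("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)]:
    cells = [f"{vram_gib(bpw, ctx):6.2f}" for ctx in (1_024, 199_680, 262_144)]
    print(f"{name:7s}", *cells)  # columns: 1K, 195K, 256K context
```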

