Qwen 3.5 9B
- Parameters: 9.0B
- Max Context: 256K
- Architecture: Dense
- Released: Feb 18, 2026
- Modality: Text
About Qwen 3.5 9B
Qwen 3.5 9B introduces a hybrid architecture combining Gated DeltaNet linear attention layers with standard full attention — only 25% of layers use KV-cached attention, dramatically reducing memory overhead for long contexts. At 9B parameters supporting 262K context (extensible to 1M with YaRN), it handles extremely long documents and multi-turn conversations at a fraction of the VRAM cost of traditional architectures. Apache 2.0 licensed. An excellent choice for long-context RAG, document analysis, and extended agent sessions.
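The memory saving from caching only 25% of layers can be made concrete with a back-of-the-envelope KV-cache calculation. The layer count, KV-head count, and head dimension below are illustrative assumptions, not Qwen 3.5 9B's published configuration; only the 25% ratio and the 262K context come from the description above.

```python
# Sketch: KV-cache footprint when only 25% of layers use standard
# KV-cached attention, vs. a stack where every layer does.
# N_LAYERS, n_kv_heads, and head_dim are assumed values for illustration.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache size: K and V tensors (factor 2) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

N_LAYERS = 48                  # assumed total layer count
FULL_ATTN_LAYERS = N_LAYERS // 4   # 25% of layers keep a KV cache

CTX = 262_144                  # 262K context

full = kv_cache_bytes(N_LAYERS, n_kv_heads=8, head_dim=128, seq_len=CTX)
hybrid = kv_cache_bytes(FULL_ATTN_LAYERS, n_kv_heads=8, head_dim=128, seq_len=CTX)

print(f"all layers KV-cached:        {full / 2**30:.1f} GiB")
print(f"hybrid (25% layers cached):  {hybrid / 2**30:.1f} GiB")
```

With these assumed dimensions the hybrid cache is exactly a quarter of the full one, which is where the long-context VRAM savings in the table below come from: the cache term grows linearly with context, and the hybrid design shrinks its coefficient by 4x.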
Technical Specifications
System Requirements
Estimated VRAM at 10% overhead for different quantization methods and context sizes.
| Quantization | Bytes/Weight | Quality vs FP16 | 1K ctx (GB) | 195K ctx (GB) | 256K ctx (GB) |
|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 4.68 (Consumer GPU) | 10.76 (Consumer GPU) | 12.65 (Consumer GPU) |
| Q8_0 | 1.00 | ~100% | 9.34 (Consumer GPU) | 15.41 (Consumer GPU) | 17.30 (Consumer GPU) |
| F16 | 2.00 | Reference | 18.64 (Consumer GPU) | 24.71 (Datacenter GPU) | 26.61 (Datacenter GPU) |
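Estimates like those in the table decompose into weights at the quantization's bytes-per-weight, a KV-cache term that grows with context, and the stated 10% overhead. The sketch below uses that decomposition with assumed cache dimensions (layer count, KV heads, head size), so its outputs approximate but will not exactly match the table's figures.

```python
# Sketch of the VRAM estimate behind the table: weights + KV cache,
# scaled by 10% overhead. Only the 9.0B parameter count and the 10%
# overhead come from the page; kv_layers, n_kv_heads, head_dim, and
# kv_bytes are illustrative assumptions.

GiB = 2**30

def estimate_vram_gib(params, bytes_per_weight, ctx_len,
                      kv_layers=12, n_kv_heads=8, head_dim=128,
                      kv_bytes=2, overhead=0.10):
    weights = params * bytes_per_weight
    # Factor 2 covers separate K and V tensors per cached layer.
    kv_cache = 2 * kv_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv_cache) * (1 + overhead) / GiB

for bpw, name in [(0.5, "Q4_K_M"), (1.0, "Q8_0"), (2.0, "F16")]:
    short = estimate_vram_gib(9.0e9, bpw, 1_024)
    long = estimate_vram_gib(9.0e9, bpw, 262_144)
    print(f"{name}: {short:.2f} GiB @ 1K ctx, {long:.2f} GiB @ 256K ctx")
```

Note the structure the table exhibits: the gap between context columns is the same for every quantization row, because only the KV-cache term depends on context length while only the weights term depends on bytes-per-weight.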
Find the right GPU for Qwen 3.5 9B
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.