Qwen 3.5 35B-A3B (MoE)
| Parameters | Active | Max Context | Architecture | Released | Modality |
|---|---|---|---|---|---|
| 35.0B | 3.0B | 256K | MoE | Feb 18, 2026 | Text |
About Qwen 3.5 35B-A3B (MoE)
Qwen 3.5 35B-A3B combines the DeltaNet hybrid architecture with MoE: 256 experts, of which 8 routed plus 1 shared are active per token, so all 35B parameters are loaded but only 3B are active per token. It achieves roughly 3.5 tokens/second on an RTX 4090, making it one of the few MoE models viable on consumer GPUs. The combination of linear attention layers and sparse MoE feed-forward networks delivers exceptional efficiency. Apache 2.0 licensed. A glimpse of where efficient local AI is heading.
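To make the "8+1 active" routing concrete, here is a minimal, illustrative PyTorch sketch of a sparse MoE feed-forward layer with 256 experts, top-8 routing, and an always-on shared expert. The layer sizes, routing details, and class name are assumptions for demonstration, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer: top-k routed experts plus one shared expert."""

    def __init__(self, d_model=256, d_ff=128, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The "+1": a shared expert that every token passes through.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, -1)  # pick 8 experts per token
        weights = F.softmax(weights, dim=-1)                # renormalise their gate weights
        outputs = []
        for t in range(x.size(0)):                          # naive per-token dispatch, for clarity
            y = self.shared_expert(x[t])
            for k in range(self.top_k):
                y = y + weights[t, k] * self.experts[idx[t, k].item()](x[t])
            outputs.append(y)
        return torch.stack(outputs)

layer = SparseMoELayer()
tokens = torch.randn(4, 256)
print(layer(tokens).shape)  # torch.Size([4, 256])
```

The real model batches and fuses this dispatch for throughput; the point of the sketch is only that each token touches 9 of the 256 expert MLPs, which is why just 3B of the 35B parameters do work for any given token.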
Technical Specifications
System Requirements
Estimated VRAM in GB, assuming 10% overhead, for different quantization methods and context sizes.
| Quantization | Bytes/weight | Quality vs FP16 | 1K ctx | 195K ctx | 256K ctx |
|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 18.11 (Consumer GPU) | 21.91 (Consumer GPU) | 23.09 (Consumer GPU) |
| Q8_0 | 1.00 | ~100% | 36.20 (Datacenter GPU) | 40.00 (Datacenter GPU) | 41.18 (Datacenter GPU) |
| F16 | 2.00 | Reference | 72.38 (Datacenter GPU) | 76.18 (Datacenter GPU) | 77.36 (Datacenter GPU) |
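As a rough cross-check on the table, VRAM can be approximated as weight memory (parameters × bytes per weight) plus a context-dependent cache, scaled by the overhead factor. The sketch below is only an approximation: the per-token cache cost is an assumed placeholder, so it will not reproduce the table exactly for this DeltaNet hybrid, whose cache cost depends on its actual layer mix.

```python
def estimate_vram_gb(params_b: float,
                     bytes_per_weight: float,
                     context_tokens: int,
                     cache_kb_per_token: float = 18.0,
                     overhead: float = 0.10) -> float:
    """Rough VRAM estimate: weight memory + context cache, plus a fixed overhead fraction.

    params_b            total parameters in billions (35.0 for this model)
    bytes_per_weight    0.50 for Q4_K_M, 1.00 for Q8_0, 2.00 for F16 (from the table)
    cache_kb_per_token  assumed per-token cache cost; model- and backend-dependent
    overhead            the 10% overhead used in the table above
    """
    weights_gb = params_b * bytes_per_weight              # e.g. 35.0 * 0.50 = 17.5 GB
    cache_gb = context_tokens * cache_kb_per_token / 1e6  # KB -> GB
    return (weights_gb + cache_gb) * (1.0 + overhead)

# Example: Q4_K_M at 256K context, in the same ballpark as the table's 23.09
print(f"{estimate_vram_gb(35.0, 0.50, 256_000):.1f} GB")
```

For exact figures, use the calculator below rather than this back-of-the-envelope formula.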
Find the right GPU for Qwen 3.5 35B-A3B (MoE)
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.