GLM-4.5 (MoE)
- Parameters: 355.0B
- Active parameters: 32.0B
- Max context: 200K tokens
- Architecture: MoE
- Released: —
- Modality: Text
About GLM-4.5 (MoE)
GLM-4.5 (MoE) is a mixture-of-experts (MoE) transformer language model from the GLM family, containing 355B parameters across 64 layers. All 355B parameters must be loaded into VRAM, but only 32B are active per token. It supports up to 200K tokens of context, with a hidden dimension of 7168 and 8 KV heads for efficient grouped-query attention (GQA). The model is released under the MIT license and, at this scale, requires server-class (multi-GPU) hardware.
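To make the 355B-total / 32B-active distinction concrete, the sketch below shows generic top-k expert routing, the mechanism by which an MoE layer activates only a small subset of its experts for each token. The expert count (16) and k (2) used here are illustrative placeholders; GLM-4.5's actual expert configuration is not given on this page.

```python
import numpy as np

def topk_route(gate_logits: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Pick the k highest-scoring experts per token and renormalize their gate weights."""
    topk_idx = np.argsort(gate_logits, axis=-1)[..., -k:]            # (tokens, k) expert ids
    topk_logits = np.take_along_axis(gate_logits, topk_idx, axis=-1)
    # Softmax over only the selected experts, so their weights sum to 1 per token.
    weights = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return topk_idx, weights

# 4 tokens routed across 16 hypothetical experts, 2 experts active per token.
logits = np.random.randn(4, 16)
idx, w = topk_route(logits, k=2)
print(idx)  # which experts each token uses
print(w)    # how much each selected expert contributes
```

Because each token only touches the selected experts (plus the shared layers), the compute per token corresponds to the 32B active parameters, even though the full 355B must sit in memory to serve arbitrary tokens.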
Technical Specifications
System Requirements
Estimated VRAM requirements, assuming 10% runtime overhead, for different quantization methods and context sizes.
| Quantization | Bytes/weight | Quality vs FP16 | 1K ctx (GB) | 195K ctx (GB) | 200K ctx (GB) |
|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 183.7 | 232.3 | 233.5 |
| Q8_0 | 1.00 | ~100% | 367.2 | 415.8 | 417.0 |
| F16 | 2.00 | Reference | 734.2 | 782.8 | 784.0 |

Every configuration above exceeds single-GPU capacity and requires a cluster or multi-GPU setup.
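The table values follow from a standard weights-plus-KV-cache estimate, sketched below under a few assumptions this page does not state explicitly (a head dimension of 128, an FP16 KV cache, and GiB-based reporting); the interactive calculator's exact formula may differ slightly.

```python
def estimate_vram_gib(params_billions: float, bytes_per_weight: float,
                      n_layers: int, n_kv_heads: int, head_dim: int,
                      ctx_tokens: int, kv_bytes_per_elem: int = 2,
                      overhead: float = 0.10) -> float:
    """Rough VRAM estimate: model weights + KV cache, scaled by a fixed overhead factor."""
    gib = 1024 ** 3
    weight_bytes = params_billions * 1e9 * bytes_per_weight
    # K and V caches: 2 tensors * layers * KV heads * head_dim * bytes, per cached token.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * kv_bytes_per_elem * ctx_tokens
    return (weight_bytes + kv_bytes) * (1 + overhead) / gib

# GLM-4.5 (MoE): 355B parameters, 64 layers, 8 KV heads; head_dim=128 is an assumption.
for label, ctx in [("1K", 1024), ("200K", 200 * 1024)]:
    gib = estimate_vram_gib(355.0, 0.50, 64, 8, 128, ctx)  # Q4_K_M at 0.50 bytes/weight
    print(f"Q4_K_M, {label} ctx: ~{gib:.1f} GiB")
```

Under these assumptions the Q4_K_M row lands within a few GiB of the table (roughly 182 at 1K context and 237 at 200K), which is close enough for sizing hardware but not byte-exact.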
Find the right GPU for GLM-4.5 (MoE)
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.