GLM · MoE · MIT

GLM-5.1 (MoE)

GLM-5.1 is Z.ai's flagship model for long-horizon agentic coding tasks.

Parameters: 754.0B
Active: 32.0B
Max Context: 128K
Architecture: MoE
Released: Apr 7, 2026
Modality: Text

About GLM-5.1 (MoE)

GLM-5.1 is Z.ai's flagship model for long-horizon agentic coding tasks. Built on a novel GlmMoeDSA architecture with 754B total parameters (256 routed + 1 shared experts, 8+1 active per token) across 78 layers, it combines Gated DeltaNet linear attention with standard attention and sparse MoE feed-forward networks, enabling efficient inference while delivering top-tier intelligence. It achieves state-of-the-art scores of 58.4% on SWE-Bench Pro, 63.5% on Terminal-Bench 2.0, 95.3% on AIME 2026, and 86.2% on GPQA-Diamond. The model is designed for sustained autonomous execution of up to 8 hours, breaking complex engineering tasks into iterative experiment-analyze-optimize loops. It supports a 200K context window and up to 128K output tokens, and is available via API as glm-5.1 on Z.ai and BigModel.cn. Released April 7, 2026 under the MIT license.
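For intuition about the "256 routed + 1 shared experts, 8+1 active per token" design, here is a minimal PyTorch-style sketch of a sparse MoE feed-forward block with top-8 routing plus one always-active shared expert. The expert width d_ff, the GELU MLP inside each expert, and the dispatch loop are illustrative assumptions, not Z.ai's released implementation; the real model uses SwiGLU experts and interleaves this block with Gated DeltaNet and standard attention layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    """Sketch of a sparse MoE feed-forward block: 256 routed experts with
    top-8 selection per token, plus one shared expert that is always active
    (8 + 1 active experts per token)."""

    def __init__(self, d_model=8192, d_ff=2048, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(self._ffn(d_model, d_ff) for _ in range(n_experts))
        self.shared_expert = self._ffn(d_model, d_ff)   # always-on shared expert

    @staticmethod
    def _ffn(d_model, d_ff):
        # The real model uses SwiGLU experts; a plain GELU MLP keeps the sketch short.
        return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):                                # x: (n_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)        # (n_tokens, n_experts)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize over the top-k
        out = self.shared_expert(x)                      # shared expert sees every token
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():        # dispatch tokens to their k-th expert
                mask = idx[:, k] == e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

Because only the 8 routed experts plus the shared expert run per token, the compute per token tracks the 32B active parameters rather than the full 754B, which is the core of the efficiency argument above.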

Agentic Coding · SWE-bench · Long-Horizon Tasks · Engineering · Research

Technical Specifications

Total Parameters: 754.0B
Active Parameters: 32.0B per token
Architecture: Mixture of Experts
Total Experts: 257
Active Experts: 9 per token
Attention Type: Hybrid Gated DeltaNet + Standard Attention (MoE FFN)
Hidden Dimension: d = 8,192
Transformer Layers: 80
Attention Heads: 64
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
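The KV-head and head-dimension figures above determine the KV-cache footprint at long context. The sketch below estimates it assuming an FP16 cache and that all 80 layers use standard attention; in practice the Gated DeltaNet layers keep a fixed-size recurrent state rather than a growing cache, so this is an upper bound.

```python
# KV-cache size for the standard-attention layers, from the spec-sheet numbers above:
# 2 tensors (K and V) x layers x KV heads x head dim x bytes per value, per cached token.
def kv_cache_gib(context_len, n_layers=80, n_kv_heads=8, head_dim=128, kv_bytes=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes   # bytes per cached token
    return context_len * per_token / 1024**3                      # binary gigabytes (GiB)

print(f"{kv_cache_gib(1_024):.2f} GiB at 1K context")      # ~0.31 GiB
print(f"{kv_cache_gib(131_072):.2f} GiB at 128K context")  # ~40 GiB
```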

System Requirements

Estimated VRAM in GB, assuming 10% overhead, for different quantization methods and context sizes. B/W denotes bytes per weight.

Quantization    B/W                      1K ctx      128K ctx
Q4_K_M          0.50 (~97% of FP16)      390.0       429.7
Q8_0            1.00 (~100% of FP16)     779.8       819.5
F16             2.00 (reference)         1559.2      1598.9

All configurations exceed a single 24 GB consumer GPU or an 80 GB datacenter GPU and require a cluster / multi-GPU setup.
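A table like this can be approximated with a simple formula: resident weight memory (total parameter count times bytes per weight, since every expert must stay in memory even though only 9 are active per token) plus a KV cache for the requested context, plus roughly 10% overhead. The sketch below reproduces the ballpark figures; the KV-cache dtype, where the overhead is applied, and the binary-gigabyte convention are assumptions, so expect small deviations from the table.

```python
# Rough reconstruction of the estimate behind the table: weights + KV cache + ~10% overhead.
PARAMS = 754e9                                      # total parameters, all experts resident
BYTES_PER_WEIGHT = {"Q4_K_M": 0.50, "Q8_0": 1.00, "F16": 2.00}
KV_BYTES_PER_TOKEN = 2 * 80 * 8 * 128 * 2           # K and V x layers x KV heads x head dim, FP16

def estimated_vram(quant, context_len, overhead=0.10):
    weights = PARAMS * BYTES_PER_WEIGHT[quant]
    kv_cache = context_len * KV_BYTES_PER_TOKEN
    return (weights + kv_cache) * (1 + overhead) / 1024**3   # binary gigabytes

for quant in BYTES_PER_WEIGHT:
    print(f"{quant}: {estimated_vram(quant, 131_072):.1f} GB at 128K context")
```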

