OpenAI · MoE · Apache 2.0

gpt-oss 120B (MoE)


Parameters: 117.0B

Active: 5.1B

Max Context: 128K

Architecture: MoE

Released: Aug 5, 2025

Modality: Text

About gpt-oss 120B (MoE)

gpt-oss 120B is OpenAI's open-weight MoE release: 117B total parameters with only 5.1B active per token. It uses 128 experts with top-4 routing, pairing a shallow depth (36 layers) with a wide hidden dimension (2,880). The model is designed to fit on a single 80 GB GPU at moderate quantization while supporting a 128K context via YaRN. Its extreme sparsity (roughly 4.4% of parameters active per token) gives decoding speed comparable to a ~5B dense model while delivering quality competitive with 70B-class models, a fascinating demonstration of how far MoE efficiency can be pushed.

Efficient MoE · Enterprise · Code · Reasoning · Commercial
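
Functionally, every MoE layer scores all 128 experts for each token but runs only the 4 highest-scoring ones, which is why just 5.1B of the 117B parameters are exercised per token. Below is a minimal PyTorch sketch of that top-k routing pattern; the function name, variable names, and the tiny demo sizes are illustrative, not gpt-oss's actual implementation.

```python
import torch
import torch.nn.functional as F

def route_topk(hidden, router_weight, experts, k=4):
    """Illustrative top-k MoE routing: each token is processed by k experts.

    hidden:        (tokens, d_model) activations entering the MoE block
    router_weight: (num_experts, d_model) linear router
    experts:       list of per-expert feed-forward callables
    """
    logits = hidden @ router_weight.T               # (tokens, num_experts)
    weights, idx = torch.topk(logits, k, dim=-1)    # keep the k best experts per token
    weights = F.softmax(weights, dim=-1)            # normalize over the selected experts only

    out = torch.zeros_like(hidden)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e                # tokens whose slot-th choice is expert e
            if mask.any():
                out[mask] += weights[mask, slot, None] * expert(hidden[mask])
    return out

# Tiny demo with toy sizes (gpt-oss 120B uses 128 experts, k=4, d_model=2,880)
experts = [torch.nn.Linear(32, 32) for _ in range(8)]
router = torch.randn(8, 32)
tokens = torch.randn(5, 32)
print(route_topk(tokens, router, experts, k=2).shape)  # torch.Size([5, 32])
```

Production implementations batch tokens by expert instead of looping, but the memory story is the same: all 128 experts' weights must stay resident, while each token's compute touches only its 4 selected experts.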

Technical Specifications

Total Parameters: 117.0B
Active Parameters: 5.1B per token
Architecture: Mixture of Experts
Total Experts: 128
Active Experts: 4 per token
Attention Type: GQA (Grouped Query Attention)
Hidden Dimension: d = 2,880
Transformer Layers: 36
Attention Heads: 64
KV Heads: n_kv = 8
Head Dimension: d_head = 64
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: YaRN-extended RoPE
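
The GQA entries above mean that the 64 query heads share only 8 key/value heads, which shrinks the KV cache roughly 8x compared with full multi-head attention at the same head count. A short PyTorch sketch of those projection shapes, assuming the listed dimensions; the layer names here are illustrative rather than taken from the released code.

```python
import torch
import torch.nn as nn

# Shapes built from the spec table above: 64 query heads, 8 KV heads,
# head dim 64, hidden dim 2,880 (illustrative, not the actual gpt-oss code).
d_model, n_heads, n_kv_heads, d_head = 2880, 64, 8, 64

q_proj = nn.Linear(d_model, n_heads * d_head)     # 2880 -> 4096
k_proj = nn.Linear(d_model, n_kv_heads * d_head)  # 2880 -> 512, 8x smaller than Q
v_proj = nn.Linear(d_model, n_kv_heads * d_head)  # 2880 -> 512

x = torch.randn(1, 16, d_model)                   # (batch, seq_len, d_model)
q = q_proj(x).view(1, 16, n_heads, d_head)
k = k_proj(x).view(1, 16, n_kv_heads, d_head)     # only 8 heads are ever cached
v = v_proj(x).view(1, 16, n_kv_heads, d_head)

# Each KV head serves n_heads // n_kv_heads = 8 query heads; expand K/V
# only at attention time so the cache stays 8x smaller.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=2)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=2)
print(q.shape, k.shape, v.shape)                  # all torch.Size([1, 16, 64, 64])
```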

System Requirements

Estimated VRAM at 10% overhead for different quantization methods and context sizes.

Q4_K_M (0.50 B/W, ~97% of FP16 quality): 60.55 GB at 1K ctx · 69.48 GB at 128K ctx (datacenter GPU)
Q8_0 (1.00 B/W, ~100% of FP16 quality): 121.0 GB at 1K ctx · 130.0 GB at 128K ctx (cluster / multi-GPU)
F16 (2.00 B/W, reference): 242.0 GB at 1K ctx · 250.9 GB at 128K ctx (cluster / multi-GPU)

GPU tiers referenced above: fits 24 GB consumer GPU · fits 80 GB datacenter GPU · requires cluster / multi-GPU
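
The figures above come from a calculator, but the dominant terms are simple: total parameters times bytes per weight, plus a KV cache that grows linearly with context length, plus the stated 10% overhead. The sketch below reproduces that arithmetic; the bytes-per-weight values and the fp16 KV-cache assumption are illustrative, so its output will not match the table exactly.

```python
def estimate_vram_gb(params_b: float, bytes_per_weight: float, ctx_tokens: int,
                     kv_bytes_per_token: int = 73_728, overhead: float = 0.10) -> float:
    """Rough VRAM estimate: weights + KV cache, plus a fixed overhead fraction.

    params_b           total parameters in billions (117.0 for gpt-oss 120B)
    bytes_per_weight   0.5 for ~4-bit quants, 1.0 for Q8_0, 2.0 for F16
    kv_bytes_per_token assumed fp16 cache: 2 (K and V) * 8 KV heads * 64 dims
                       * 36 layers * 2 bytes = 73,728 bytes per token
    """
    weights = params_b * 1e9 * bytes_per_weight
    kv_cache = ctx_tokens * kv_bytes_per_token
    return (weights + kv_cache) * (1 + overhead) / 1e9

for name, bpw in [("Q4_K_M", 0.5), ("Q8_0", 1.0), ("F16", 2.0)]:
    print(f"{name}: ~{estimate_vram_gb(117.0, bpw, 1_024):.0f} GB at 1K ctx, "
          f"~{estimate_vram_gb(117.0, bpw, 131_072):.0f} GB at 128K ctx")
```

Note that because all 128 experts must stay in memory, the weight footprint scales with the 117B total parameters, not the 5.1B active ones; sparsity buys speed, not VRAM.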


Find the right GPU for gpt-oss 120B (MoE)

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.