
DBRX 132B (MoE)

DBRX 132B is Databricks' mixture-of-experts (MoE) model: 132B total parameters with 36B active per token, routed across 16 experts with top-4 routing. It was one of the earliest major open MoE releases alongside Mixtral, and at roughly 70 GB of VRAM at Q4_K_M it requires server-class hardware.

Parameters: 132.0B
Active: 36.0B
Max Context: 32K
Architecture: MoE
Released: Mar 27, 2024
Modality: Text

About DBRX 132B (MoE)

DBRX 132B is Databricks' MoE model — 132B total with 36B active across 16 experts (top-4 routing). One of the earliest major open MoE releases alongside Mixtral. At ~70 GB VRAM at Q4_K_M it requires server-class hardware. While newer MoE models have surpassed it in both quality and efficiency, DBRX remains historically significant and is still used in enterprise Databricks workflows. The Databricks Open Model License has some usage restrictions.
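For readers unfamiliar with MoE routing, the sketch below illustrates the top-4-of-16 gating pattern described above: a router scores all 16 experts per token, keeps the 4 highest, renormalizes their weights, and mixes the selected experts' outputs. It is a minimal toy in PyTorch with made-up dimensions and plain SiLU MLP experts, not Databricks' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top4MoE(nn.Module):
    """Toy top-4-of-16 MoE layer. Illustrative only; not DBRX's actual code."""

    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each DBRX expert is a SwiGLU FFN; a plain SiLU MLP keeps the sketch short.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                  # x: (n_tokens, d_model)
        logits = self.router(x)                            # (n_tokens, 16) router scores
        weights, idx = logits.topk(self.top_k, dim=-1)     # keep the 4 highest-scoring experts
        weights = F.softmax(weights, dim=-1)               # renormalize over the chosen 4
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                     # 4 expert "slots" per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(Top4MoE()(tokens).shape)                             # torch.Size([8, 64])

In the real model, d_model is 6,144 and only the 4 selected experts run per token, which is how 132B total parameters translate to 36B active.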

Enterprise · Code · Research

Technical Specifications

Total Parameters: 132.0B
Active Parameters: 36.0B per token
Architecture: Mixture of Experts
Total Experts: 16
Active Experts: 4 per token
Attention Type: GQA (Grouped Query Attention)
Hidden Dimension: d = 6,144
Transformer Layers: 40
Attention Heads: 48
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
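The GQA figures above also set the KV-cache footprint, which is why the 32K-context column in the System Requirements table below adds only a few GB: with 8 KV heads instead of 48, the cache is six times smaller than full multi-head attention would need. A back-of-the-envelope check from the listed specs, assuming an FP16 cache (2 bytes per value):

# Rough KV-cache size from the specs above (assumes an FP16 cache; not vendor-verified).
n_layers, n_kv_heads, d_head, bytes_per_value = 40, 8, 128, 2

kv_bytes_per_token = 2 * n_layers * n_kv_heads * d_head * bytes_per_value  # K and V
print(kv_bytes_per_token // 1024, "KiB per token")         # 160 KiB per token
print(32_768 * kv_bytes_per_token / 2**30, "GiB at 32K")   # 5.0 GiB at the full 32K context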

System Requirements

Estimated VRAM, in GB, assuming 10% runtime overhead, for different quantization methods and context sizes.

Quantization   Bytes/Weight   Quality vs FP16   1K ctx      32K ctx     Hardware
Q4_K_M         0.50 B/W       ~97%              68.38 GB    73.23 GB    Datacenter GPU
Q8_0           1.00 B/W       ~100%             136.6 GB    141.5 GB    Cluster / Multi-GPU
F16            2.00 B/W       Reference         273.1 GB    277.9 GB    Cluster / Multi-GPU

Legend: fits 24 GB consumer GPU · fits 80 GB datacenter GPU · requires cluster / multi-GPU
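The table values are consistent with the usual weights-plus-KV-cache-plus-overhead arithmetic. The sketch below redoes that estimate under assumed conventions (GiB, FP16 KV cache, flat 10% overhead, no separate activation buffers), so it lands within a couple of GB of the table rather than matching it exactly.

def estimate_vram_gib(total_params, bytes_per_weight, ctx_tokens,
                      n_layers=40, n_kv_heads=8, d_head=128,
                      kv_bytes=2, overhead=0.10):
    """Quantized weights + FP16 KV cache, plus a flat overhead fraction (assumed 10%)."""
    weight_bytes = total_params * bytes_per_weight
    kv_cache_bytes = 2 * n_layers * n_kv_heads * d_head * kv_bytes * ctx_tokens
    return (weight_bytes + kv_cache_bytes) * (1 + overhead) / 2**30

for name, bpw in [("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)]:
    print(f"{name:7s} {estimate_vram_gib(132e9, bpw, 1024):6.1f}"
          f" {estimate_vram_gib(132e9, bpw, 32_768):6.1f}")

Running this prints roughly 67.8 / 73.2 GB for Q4_K_M, 135.4 / 140.9 GB for Q8_0, and 270.5 / 276.0 GB for F16 at 1K and 32K context respectively.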

Find the right GPU for DBRX 132B (MoE)

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.