
Mistral NeMo 12B


Parameters: 12.0B
Max Context: 128K
Architecture: Dense
Released: Jul 18, 2024
Modality: Text

About Mistral NeMo 12B

Mistral NeMo 12B is a collaboration between Mistral and NVIDIA, designed with quantization-aware training. It supports a 128K context window natively and delivers strong multilingual performance. The quantization-aware design means it holds quality better than most models at Q4 and below. At ~6.5 GB VRAM at Q4_K_M, it fits comfortably on 8 GB GPUs and runs fast on 12 GB cards. A strong choice for users who need Apache 2.0 licensing and long context on modest hardware.

Multilingual · Long Context · RAG · Commercial

Technical Specifications

Total Parameters: 12.0B
Architecture: Dense
Attention Type: GQA (Grouped Query Attention)
Hidden Dimension: d = 5,120
Transformer Layers: 40
Attention Heads: 32
KV Heads: n_kv = 8
Head Dimension: d_head = 128
Activation Function: SwiGLU
Normalization: RMSNorm
Position Embedding: RoPE
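
The GQA figures above determine the KV-cache footprint, which is what makes long context expensive. A minimal sketch of the arithmetic, assuming an fp16 KV cache (2 bytes per value; the cache precision is not stated on this page):

```python
# KV-cache size for Mistral NeMo 12B, derived from the specs above.
n_layers = 40        # Transformer Layers
n_kv = 8             # KV Heads (GQA)
d_head = 128         # Head Dimension
bytes_per_value = 2  # fp16 KV cache (assumption)

# Both keys and values are cached, hence the factor of 2.
kv_bytes_per_token = 2 * n_layers * n_kv * d_head * bytes_per_value
print(kv_bytes_per_token // 1024, "KiB per token")  # 160 KiB

# At the full 128K context window:
ctx = 128 * 1024
print(round(kv_bytes_per_token * ctx / 2**30, 1), "GiB")  # 20.0 GiB
```

With only 8 KV heads instead of 32, GQA cuts this cache to a quarter of what full multi-head attention would need, which is why the 128K window is feasible at all on a single GPU.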

System Requirements

Estimated VRAM (GB) with 10% overhead, for different quantization methods and context sizes.

Quantization                      1K ctx                     128K ctx
Q4_K_M (0.50 B/W, ~97% of FP16)   6.36 GB (consumer GPU)     26.20 GB (datacenter GPU)
Q8_0 (1.00 B/W, ~100% of FP16)    12.56 GB (consumer GPU)    32.41 GB (datacenter GPU)
F16 (2.00 B/W, reference)         24.97 GB (datacenter GPU)  44.81 GB (datacenter GPU)

Consumer GPU: fits a 24 GB consumer GPU. Datacenter GPU: fits an 80 GB datacenter GPU. Anything larger requires a cluster / multi-GPU.
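
The figures above can be approximated from first principles: weight memory at the listed bytes-per-weight, plus the KV cache, times an overhead factor. The exact accounting the table uses is not specified, so this sketch gives ballpark numbers rather than reproducing the table exactly:

```python
# Rough VRAM estimate: weights at a given bytes-per-weight, plus an fp16
# KV cache sized from the GQA specs, plus 10% overhead. The table's exact
# formula is unknown, so expect small discrepancies from its figures.
def vram_gib(params_b, bytes_per_weight, ctx, overhead=1.10,
             n_layers=40, n_kv=8, d_head=128, kv_bytes=2):
    weights = params_b * 1e9 * bytes_per_weight
    kv_cache = 2 * n_layers * n_kv * d_head * kv_bytes * ctx
    return (weights + kv_cache) * overhead / 2**30

for name, bpw in [("Q4_K_M", 0.50), ("Q8_0", 1.00), ("F16", 2.00)]:
    print(f"{name}: {vram_gib(12.0, bpw, 1024):.1f} GiB @ 1K ctx, "
          f"{vram_gib(12.0, bpw, 128 * 1024):.1f} GiB @ 128K ctx")
```

The dominant terms are clear: at short context the weights dominate (quantization is what matters), while at 128K the ~20 GiB KV cache dwarfs the difference between Q4 and Q8 weights.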


Find the right GPU for Mistral NeMo 12B

Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.