Mistral NeMo 12B
- Parameters: 12.0B
- Max Context: 128K
- Architecture: Dense
- Released: Jul 18, 2024
- Modality: Text
About Mistral NeMo 12B
Mistral NeMo 12B is a collaboration between Mistral and NVIDIA, designed with quantization-aware training. It supports a 128K context window natively and delivers strong multilingual performance. The quantization-aware design means it holds quality better than most models at Q4 and below. At ~6.5 GB VRAM at Q4_K_M, it fits comfortably on 8 GB GPUs and runs fast on 12 GB cards. A strong choice for users who need Apache 2.0 licensing and long context on modest hardware.
Technical Specifications
System Requirements
Estimated VRAM (in GB), assuming 10% runtime overhead, for different quantization methods and context sizes.
| Quantization | Bytes/Weight | Quality vs FP16 | VRAM @ 1K ctx | VRAM @ 128K ctx |
|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% | 6.36 GB (Consumer GPU) | 26.20 GB (Datacenter GPU) |
| Q8_0 | 1.00 | ~100% | 12.56 GB (Consumer GPU) | 32.41 GB (Datacenter GPU) |
| F16 | 2.00 | Reference | 24.97 GB (Datacenter GPU) | 44.81 GB (Datacenter GPU) |
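The figures above follow the usual back-of-the-envelope pattern: weight memory (parameter count × bytes per weight), plus context-dependent KV-cache memory, plus a fixed overhead factor. A minimal sketch of that estimate is below; the function name, the KV-cache term, and the 10% overhead default are illustrative assumptions, not the exact formula behind the table (which likely accounts for activations and cache layout differently).

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_weight: float,
                     kv_cache_gb: float = 0.0,
                     overhead: float = 0.10) -> float:
    """Rough VRAM estimate for running an LLM.

    params_billion:   parameter count in billions (e.g. 12.0 for NeMo 12B)
    bytes_per_weight: quantization density (Q4_K_M ~0.5, Q8_0 ~1.0, F16 = 2.0)
    kv_cache_gb:      context-dependent KV-cache size (grows with context length)
    overhead:         fractional runtime overhead (buffers, fragmentation)
    """
    weights_gb = params_billion * bytes_per_weight  # e.g. 12.0 x 0.5 = 6.0 GB
    return (weights_gb + kv_cache_gb) * (1 + overhead)


# Weights-only estimate for 12B at Q4_K_M with 10% overhead:
print(round(estimate_vram_gb(12.0, 0.5), 2))  # 6.6 (GB)
```

At long contexts the KV cache dominates, which is why the 128K-context column is several times larger than the 1K-context column at the same quantization.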
Find the right GPU for Mistral NeMo 12B
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.