Nemotron 3 Nano 4B
- Parameters: 4.0B
- Max Context: 256K
- Architecture: Dense
- Released: —
- Modality: Text
About Nemotron 3 Nano 4B
Nemotron 3 Nano 4B is a dense transformer language model from the Nvidia family, containing 4B parameters across 32 layers. It supports up to 262K tokens of context with a hidden dimension of 2560 and 8 KV heads for efficient grouped-query attention (GQA). The model uses a hybrid Mamba2-Transformer design, is released under the Nemotron Open Model License, and is small enough to run on a laptop or workstation.
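The KV-cache savings from GQA can be sketched from the figures above. Note two assumptions not stated in the source: a head dimension of 128, and that all 32 layers cache K/V (in a hybrid Mamba2-Transformer, Mamba2 layers keep no KV cache, so this is an upper bound).

```python
# Sketch: KV-cache footprint under grouped-query attention (GQA).
# Layer count and KV-head count come from the spec above; HEAD_DIM = 128
# is an assumption, and treating all 32 layers as attention layers makes
# this an upper bound for a hybrid Mamba2-Transformer.
N_LAYERS = 32
KV_HEADS = 8
HEAD_DIM = 128      # assumed, not confirmed by the model card
BYTES_FP16 = 2

def kv_bytes_per_token(n_layers, kv_heads, head_dim, bytes_per_elem):
    # 2x for the separate K and V tensors cached at every layer
    return 2 * n_layers * kv_heads * head_dim * bytes_per_elem

per_tok = kv_bytes_per_token(N_LAYERS, KV_HEADS, HEAD_DIM, BYTES_FP16)
print(per_tok)                        # bytes of KV cache per token
print(per_tok * 256 * 1024 / 1e9)     # GB at the full 256K context
```

With only 8 KV heads instead of one per query head, the cache shrinks proportionally, which is what makes long contexts like 256K tractable at all.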
Technical Specifications
System Requirements
Estimated VRAM in GB, including a 10% overhead, for different quantization methods and context sizes.
| Quantization | Bytes/Weight | Quality | 1K ctx (GB) | 195K ctx (GB) | 256K ctx (GB) |
|---|---|---|---|---|---|
| Q4_K_M | 0.50 | ~97% of FP16 | 2.19 (consumer GPU) | 26.48 (datacenter GPU) | 34.07 (datacenter GPU) |
| Q8_0 | 1.00 | ~100% of FP16 | 4.26 (consumer GPU) | 28.55 (datacenter GPU) | 36.14 (datacenter GPU) |
| F16 | 2.00 | reference | 8.40 (consumer GPU) | 32.68 (datacenter GPU) | 40.27 (datacenter GPU) |
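The table's estimation method appears to be: model weights at the quantization's bytes-per-weight, plus a context-dependent KV cache, plus the stated 10% overhead. A minimal sketch follows; the head dimension and FP16 KV precision are assumptions, so the results only roughly track the table.

```python
# Rough VRAM estimate: weights + FP16 KV cache, plus 10% overhead.
# PARAMS and layer/head counts are from the spec above; HEAD_DIM = 128
# and FP16 KV precision are assumptions, so expect only approximate
# agreement with the table.
PARAMS = 4.0e9
N_LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128   # head_dim assumed

def vram_gb(bytes_per_weight, ctx_tokens, overhead=0.10):
    weights = PARAMS * bytes_per_weight
    kv = 2 * N_LAYERS * KV_HEADS * HEAD_DIM * 2 * ctx_tokens  # FP16 K+V
    return (weights + kv) * (1 + overhead) / 1e9

print(round(vram_gb(0.5, 1024), 2))   # Q4_K_M at ~1K ctx
print(round(vram_gb(2.0, 1024), 2))   # F16 at ~1K ctx
```

The weight term dominates at short context, while the KV term dominates at 195K+ tokens, which is why every quantization lands in datacenter-GPU territory at long context.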
Find the right GPU for Nemotron 3 Nano 4B
Use the interactive VRAM Calculator to see exactly how much memory you need at any quantization level, context length, and overhead setting.
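As a rough stand-in for the calculator, the table above can be queried directly: given a VRAM budget and a target context, pick the highest-quality quantization that fits. The GB values are copied from the table; the function itself is a hypothetical helper, not part of any tool.

```python
# Sketch: choose the highest-quality quantization from the table above
# that fits a given VRAM budget. Values are GB, copied from the table;
# best_fit() is a hypothetical helper for illustration.
TABLE = {
    "F16":    {1024: 8.40, 195_000: 32.68, 256_000: 40.27},
    "Q8_0":   {1024: 4.26, 195_000: 28.55, 256_000: 36.14},
    "Q4_K_M": {1024: 2.19, 195_000: 26.48, 256_000: 34.07},
}

def best_fit(vram_gb, ctx):
    for quant in ("F16", "Q8_0", "Q4_K_M"):   # highest quality first
        if TABLE[quant][ctx] <= vram_gb:
            return quant
    return None  # nothing fits at this context length

print(best_fit(12.0, 1024))   # a 12 GB consumer GPU at short context
```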