Local LLM Models Directory

Complete reference of 94 open-source local LLMs across 21 model families. Compare parameters, architecture, layers, context length, and Q4_K_M VRAM requirements.

94 models · 21 families · 68 dense · 26 MoE

| Model | Type | Params | Q4_K_M VRAM | Layers | Max Context | Hidden Dim | KV Heads | Notes |
|---|---|---|---|---|---|---|---|---|
| | Dense | 490M | 228 MB | 24 | 32K | 896 | 2 | |
| | Dense | 600M | 279 MB | 28 | 32K | 1,024 | 8 | Apache 2.0. Thinking mode toggle. Tied embeddings. |
| | Dense | 800M | 373 MB | 6 | 256K | 1,024 | 2 | Apache 2.0. Hybrid DeltaNet+Attn (25% layers KV cache). 262K→1M ctx. |
| | Dense | 1.0B | 466 MB | 26 | 32K | 1,152 | 3 | |
| LFM2 1.2B (Liquid) | Dense | 1.2B | 559 MB | 24 | 32K | 2,048 | 8 | LFM Open License (Apache 2.0 based). On-device hybrid model. Fast CPU/mobile inference. |
| | Dense | 1.2B | 577 MB | 16 | 128K | 2,048 | 8 | |
| | Dense | 1.5B | 717 MB | 28 | 32K | 1,536 | 2 | |
| | Dense | 1.5B | 717 MB | 28 | 32K | 1,536 | 2 | Reasoning distilled into Qwen 2.5 1.5B base. |
| | Dense | 1.7B | 801 MB | 28 | 32K | 2,048 | 8 | Apache 2.0. Thinking mode toggle. |
| | Dense | 2.0B | 931 MB | 6 | 256K | 2,048 | 4 | Apache 2.0. Hybrid DeltaNet+Attn (25% layers KV cache). 262K→1M ctx. |
| Granite 3.1 2B (IBM Granite) | Dense | 2.0B | 931 MB | 32 | 128K | 2,048 | 8 | Apache 2.0. Enterprise RAG, code, safety. |
| | Dense | 3.0B | 1.4 GB | 26 | 256K | 3,072 | 8 | Apache 2.0. Cascade-distilled from Mistral Small 3.1. |
| | Dense | 3.0B | 1.4 GB | 32 | 8K | 2,560 | 8 | Apache 2.0. Tiny model for CPU/browser/phone. Educational use. |
| | Dense | 3.1B | 1.4 GB | 36 | 32K | 2,048 | 2 | |
| | Dense | 3.2B | 1.5 GB | 28 | 128K | 3,072 | 8 | |
| | Dense | 3.8B | 1.8 GB | 32 | 16K | 3,072 | 6 | |
| | Dense | 4.0B | 1.9 GB | 8 | 256K | 2,560 | 4 | Apache 2.0. Hybrid DeltaNet+Attn (25% layers KV cache). 262K→1M ctx. |
| | Dense | 4.0B | 1.9 GB | 34 | 32K | 2,560 | 4 | |
| | Dense | 4.0B | 1.9 GB | 32 | 256K | 2,560 | 8 | Nemotron Open Model License. Hybrid Mamba2-Transformer. Laptop/workstation friendly. |
| | Dense | 4.0B | 1.9 GB | 36 | 32K | 2,560 | 8 | Apache 2.0. Thinking mode toggle. Great small local model. |
| | Dense | 5.1B | 2.4 GB | 35 | 128K | 2,560 | 4 | Effective 2.3B active via PLE. Hybrid local+global attn. Audio+image. 128K ctx. |
| | Dense | 5.6B | 2.6 GB | 36 | 16K | 3,584 | 7 | MIT license. Image + audio + text multimodal. Good compact multimodal local. |
| OLMo 3 7B (AI2 OLMo) | Dense | 7.0B | 3.3 GB | 32 | 32K | 4,096 | 8 | Fully open data/code/weights. Transparent research model. |
| | Dense | 7.0B | 3.3 GB | 32 | 16K | 4,096 | 8 | OpenRAIL BigCode license. Code completion/instruct. Mature local code base. |
| | Dense | 7.3B | 3.4 GB | 32 | 32K | 4,096 | 8 | |
| | Dense | 7.6B | 3.5 GB | 28 | 128K | 3,584 | 4 | |
| | Dense | 7.6B | 3.5 GB | 28 | 32K | 3,584 | 4 | Apache 2.0. Mature GGUF/MLX support. Excellent laptop coding. |
| | Dense | 7.6B | 3.5 GB | 28 | 32K | 3,584 | 4 | Reasoning distilled into Qwen 2.5 7B base. Great local reasoning. |
| | Dense | 8.0B | 3.7 GB | 34 | 256K | 4,096 | 8 | Apache 2.0. Cascade-distilled from Mistral Small 3.1. |
| | Dense | 8.0B | 3.7 GB | 42 | 128K | 3,072 | 6 | Effective 4.5B active via PLE. Hybrid local+global attn. Audio+image. 128K ctx. |
| | Dense | 8.0B | 3.7 GB | 32 | 128K | 4,096 | 8 | NVIDIA fine-tune of Llama 3.1 8B for reasoning. |
| Granite 3.1 8B (IBM Granite) | Dense | 8.0B | 3.7 GB | 40 | 128K | 4,096 | 8 | Apache 2.0. Enterprise chat, code, safety. Mature local deployments. |
| | Dense | 8.0B | 3.7 GB | 32 | 32K | 4,096 | 8 | Open weights. On-device agents + MCP tool use. Good local tools/agents. |
| | Dense | 8.0B | 3.7 GB | 32 | 128K | 4,096 | 8 | |
| | Dense | 8.2B | 3.8 GB | 36 | 128K | 4,096 | 8 | Apache 2.0. Hybrid reasoning. Strong all-round local model. |
| | Dense | 9.0B | 4.2 GB | 8 | 256K | 4,096 | 4 | Apache 2.0. Hybrid DeltaNet+Attn (25% layers KV cache). 13x smaller than gpt-oss-120b. |
| | Dense | 9.0B | 4.2 GB | 40 | 128K | 4,096 | 8 | NVIDIA Open Model License. Unified reasoning/non-reasoning. 128K ctx. |
| | Dense | 9.0B | 4.2 GB | 40 | 32K | 4,096 | 8 | Apache 2.0. Chinese/English coding. Mature GGUF support. |
| | Dense | 12.0B | 5.6 GB | 40 | 128K | 5,120 | 8 | Apache 2.0. Quantization-aware. NVIDIA collaboration. 128K context. |
| | Dense | 12.0B | 5.6 GB | 48 | 32K | 3,840 | 6 | |
| | Dense | 14.0B | 6.5 GB | 40 | 256K | 5,120 | 8 | Apache 2.0. Includes vision encoder. Strong laptop coding option. |
| | Dense | 14.0B | 6.5 GB | 40 | 32K | 5,120 | 8 | RL-derived code reasoning. Good local coding reasoner class. |
| | Dense | 14.7B | 6.8 GB | 48 | 128K | 5,120 | 8 | |
| | Dense | 14.7B | 6.8 GB | 48 | 32K | 5,120 | 8 | Apache 2.0. Strong local coding with mature runtime support. |
| | Dense | 14.7B | 6.8 GB | 48 | 32K | 5,120 | 8 | Reasoning distilled into Qwen 2.5 14B base. |
| | Dense | 14.7B | 6.8 GB | 40 | 16K | 5,120 | 10 | MIT license. Math/reasoning specialist. High quality for size. |
| | Dense | 14.8B | 6.9 GB | 40 | 128K | 5,120 | 8 | Apache 2.0. Dense 14B. Excellent workstation model. |
| | Dense | 15.0B | 7.0 GB | 40 | 16K | 6,144 | 8 | OpenRAIL BigCode license. Strong code completion with responsible-use clauses. |
| Granite 3.1 20B (IBM Granite) | Dense | 20.0B | 9.3 GB | 52 | 128K | 5,120 | 8 | Apache 2.0. Strong enterprise local option. |
| | MoE | 21.0B (3.6B active) | 9.8 GB | 24 | 128K | 2,880 | 8 | Apache 2.0. MoE: 32 experts, top-4 routing. Fits 16GB at MXFP4. Strong local reasoning. |
| | Dense | 24.0B | 11.2 GB | 56 | 128K | 6,144 | 8 | Apache 2.0. Runs on RTX 4090 / 32GB Mac. Vision + function calling. |
| | Dense | 24.0B | 11.2 GB | 56 | 128K | 6,144 | 8 | Apache 2.0. Reasoning-focused dense model. Good workstation option. |
| | MoE | 25.2B (3.8B active) | 11.7 GB | 30 | 256K | 4,096 | 8 | MoE: 128 experts, 8 active + 1 shared. Sliding window 1K. 256K ctx. |
| | Dense | 27.0B | 12.6 GB | 16 | 256K | 5,120 | 4 | Apache 2.0. Dense 27B. Hybrid DeltaNet+Attn (25% layers KV cache). 262K ctx. |
| | Dense | 27.0B | 12.6 GB | 16 | 256K | 5,120 | 4 | Apache 2.0. Dense 27B coding specialist. Hybrid DeltaNet+Attn (25% KV cache layers). |
| | Dense | 27.0B | 12.6 GB | 64 | 128K | 5,632 | 8 | |
| | MoE | 30.0B (3.0B active) | 14.0 GB | 48 | 128K | 4,096 | 8 | Apache 2.0. MoE: efficient local model. 3B active per token. |
| | MoE | 30.0B (3.0B active) | 14.0 GB | 40 | 256K | 2,560 | 8 | Nemotron Open Model License. MoE. Up to 1M context. Efficient local reasoning/agents. |
| | Dense | 30.7B | 14.3 GB | 60 | 256K | 5,632 | 8 | Dense 31B. Hybrid local+global attn. Dual RoPE. TurboQuant 3-bit KV. 256K ctx. #3 open model on Arena. |
| OLMo 3 32B (AI2 OLMo) | Dense | 32.0B | 14.9 GB | 64 | 32K | 5,120 | 8 | Fully open research model. Instruction/thinking variants. |
| | Dense | 32.5B | 15.1 GB | 64 | 128K | 5,120 | 8 | |
| | Dense | 32.5B | 15.1 GB | 64 | 32K | 5,120 | 8 | Apache 2.0. Top local coding model with mature support. |
| | Dense | 32.5B | 15.1 GB | 64 | 32K | 5,120 | 8 | Reasoning distilled into Qwen 2.5 32B base. Top local reasoning. |
| | Dense | 32.8B | 15.3 GB | 64 | 128K | 5,120 | 8 | Apache 2.0. Dense 32B. Top-tier workstation coding/general. |
| | MoE | 35.0B (3.0B active) | 16.3 GB | 10 | 256K | 2,048 | 2 | Apache 2.0. MoE: 256 experts, 8+1 active. DeltaNet+MoE hybrid. 3.5 tok/s on RTX 4090. |
| | MoE | 35.0B (3.0B active) | 16.3 GB | 10 | 256K | 2,048 | 2 | Apache 2.0. MoE: 256 experts, 8+1 active. DeltaNet+GA hybrid. 262K ctx, ext to ~1M with YaRN. SWE-bench 73.4. |
| | Dense | 35.0B | 16.3 GB | 40 | 128K | 8,192 | 8 | CC-BY-NC. RAG, multilingual, tool use specialist. 128K context. |
| | MoE | 46.7B (12.9B active) | 21.7 GB | 32 | 32K | 4,096 | 8 | MoE: 8 experts, 2 active. All 46.7B params loaded. |
| | Dense | 70.6B | 32.9 GB | 80 | 128K | 8,192 | 8 | |
| | Dense | 70.6B | 32.9 GB | 80 | 128K | 8,192 | 8 | |
| | Dense | 70.6B | 32.9 GB | 80 | 32K | 8,192 | 8 | Reasoning distilled into Llama 3.3 70B base. Workstation class. |
| | Dense | 72.7B | 33.9 GB | 80 | 128K | 8,192 | 8 | |
| | Dense | 104.0B | 48.4 GB | 64 | 125K | 12,288 | 8 | |
| | MoE | 109.0B (17.0B active) | 50.8 GB | 48 | 256K | 5,120 | 8 | MoE: 16 experts, 2 active. All 109B params loaded into VRAM. |
| | MoE | 117.0B (5.1B active) | 54.5 GB | 36 | 128K | 2,880 | 8 | Apache 2.0. MoE: 128 experts, top-4 routing. Single 80GB GPU capable. 128K YaRN context. |
| | MoE | 122.0B (10.0B active) | 56.8 GB | 12 | 256K | 3,072 | 2 | Apache 2.0. MoE: 256 experts. DeltaNet+MoE hybrid. Server/high-end workstation. |
| | Dense | 123.0B | 57.3 GB | 96 | 256K | 10,240 | 16 | Modified MIT. Agentic coding dense model. 256K context. Server class. |
| DBRX 132B (Databricks) | MoE | 132.0B (36.0B active) | 61.5 GB | 40 | 32K | 6,144 | 8 | Databricks Open Model License. Older but important open MoE. |
| | MoE | 141.0B (39.0B active) | 65.7 GB | 56 | 64K | 6,144 | 8 | MoE: 8 experts, 2 active. All 141B params loaded. |
| | MoE | 235.0B (22.0B active) | 109.4 GB | 96 | 128K | 8,192 | 8 | Apache 2.0. MoE flagship. Server class. |
| | MoE | 284.0B (13.0B active) | 132.2 GB | 48 | 1.0M | 6,144 | 8 | April 2026. 284B total / 13B active. 1M context. Economical V4 variant. High-memory server class. |
| | MoE | 355.0B (32.0B active) | 165.3 GB | 64 | 200K | 7,168 | 8 | MIT license. MoE. 200K context. Server class. |
| | MoE | 397.0B (17.0B active) | 184.9 GB | 15 | 256K | 4,096 | 2 | Apache 2.0. MoE flagship: 512 experts. DeltaNet+MoE hybrid. Server class. |
| | MoE | 400.0B (40.0B active) | 186.3 GB | 48 | 256K | 6,400 | 8 | MoE: 128 experts, 16 active. All ~400B params loaded. |
| | Dense | 405.0B | 188.6 GB | 126 | 128K | 16,384 | 8 | Server/cluster class. Full precision impractical for consumer hardware. |
| | MoE | 480.0B (35.0B active) | 223.5 GB | 96 | 256K | 8,192 | 8 | Apache 2.0. Agentic coding MoE. Up to 1M extrapolated ctx. Server class. |
| | MoE | 480.0B (17.0B active) | 223.5 GB | 64 | 32K | 7,168 | 8 | Apache 2.0. Enterprise SQL/coding MoE. Server class. |
| | MoE | 671.0B (37.0B active) | 312.5 GB | 61 | 64K | 7,168 | 8 | MoE: 256 experts, 8 active. MLA compresses KV cache ~95%. All 671B loaded. |
| | MoE | 671.0B (37.0B active) | 312.5 GB | 61 | 64K | 7,168 | 8 | Same architecture as R1. Non-reasoning variant. |
| | MoE | 675.0B (41.0B active) | 314.3 GB | 88 | 256K | 12,288 | 8 | Apache 2.0. MoE: 128 experts, top-4 routing. Server class. |
| | MoE | 685.0B (37.0B active) | 319.0 GB | 61 | 64K | 7,168 | 8 | March 2025 update. 685B total params. MLA compressed KV cache. |
| | MoE | 754.0B (32.0B active) | 351.1 GB | 80 | 128K | 8,192 | 8 | MIT license. MoE: DSA attention. FP8 repo ~1.5 TB. Agentic engineering. Server class. |
| | MoE | 1.0T (32.0B active) | 465.7 GB | 61 | 256K | 7,168 | 8 | Modified MIT. MoE: 384 experts, 8+1 active. MLA for KV compression. Multimodal (MoonViT 400M). Server class. 1T total params. |
| | MoE | 1.6T (49.0B active) | 745.1 GB | 80 | 1.0M | 8,192 | 8 | April 2026 preview. 1.6T total / 49B active. 1M context. DSA + token compression. Cluster class. |

About This Data

Q4_K_M VRAM

Estimated GPU memory for the model weights alone at Q4_K_M quantization (~0.5 bytes per parameter). Actual usage is higher once the KV cache and runtime overhead are included. For precise calculations that account for context length and overhead, use the VRAM Calculator.
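The weight-only estimate above can be reproduced in a few lines. This is a rough sketch of the directory's 0.5 bytes/param rule of thumb, not llama.cpp's exact accounting (Q4_K_M mixes quant block types, so real files land slightly above 4 bits per weight):

```python
def q4_k_m_weight_vram_gb(params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Weight-only VRAM estimate in GiB, using 0.5 bytes per parameter."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# Spot-check against the table: an 8.0B dense model lists 3.7 GB of weights.
print(f"{q4_k_m_weight_vram_gb(8.0):.1f} GB")  # -> 3.7 GB
```

The table's GB figures are GiB under this rule, which is why 8.0B maps to 3.7 rather than 4.0.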

MoE Models

Mixture-of-Experts models list both total parameters (all experts must be loaded into VRAM) and active parameters (the subset used to compute each token). MoE models need VRAM for every expert, but run faster than dense models of equivalent total size because only the active parameters are computed per token.
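To make the total-vs-active split concrete, here is a sketch using the "30.0B (3.0B active)" shape that appears in the table (the arithmetic is illustrative, not a benchmark):

```python
# Total params set the VRAM floor; active params set per-token compute.
total_b, active_b = 30.0, 3.0  # a 30B-total / 3B-active MoE, as listed in the table

weight_vram_gib = total_b * 1e9 * 0.5 / 1024**3  # Q4_K_M weights: every expert is resident
sparsity = active_b / total_b                    # fraction of weights touched per token

print(f"weights: {weight_vram_gib:.1f} GB")  # -> weights: 14.0 GB, matching the table row
print(f"per-token compute: {sparsity:.0%} of a dense model with the same total size")
```

So the 30B MoE pays dense-30B memory but roughly dense-3B compute per token, which is the whole appeal for local inference.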

Architecture Types

Dense: All parameters active per token. Standard transformer architecture.
MoE: Expert sub-networks with sparse activation. Better quality-per-FLOP ratio.
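The KV Heads column feeds directly into the cache overhead mentioned above: with grouped-query attention, only the KV heads are cached, so fewer KV heads means a smaller cache at a given context length. A sketch of the standard FP16 cache formula, assuming a head dimension of 128 (head dim is not listed in this directory, so that value is an assumption):

```python
def kv_cache_gib(layers: int, kv_heads: int, ctx_tokens: int,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: 2 tensors (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem / 1024**3

# A typical 8B-class config from the table: 32 layers, 8 KV heads, at full 32K context.
print(f"{kv_cache_gib(32, 8, 32_768):.1f} GB")  # -> 4.0 GB on top of the weights
```

This is why the weight-only figures understate real usage at long context, and why hybrid designs that keep a KV cache on only a fraction of layers advertise it so prominently.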