Best models to run with Ollama

Open source

Running models locally with Ollama means no API keys, no rate limits, and no data leaving your machine. These are the best open-weight models to run locally in 2026, grouped by the hardware you actually have, from a 16GB laptop to a multi-GPU server.

Which model should you run?

As of June 2026, pick by the hardware you have:

16GB laptop: gpt-oss-20b for general use, Gemma 4 26B A4B if you want speed
24GB GPU or 32GB Mac: Qwen3 Coder 30B for code, Gemma 4 31B for vision, Qwen3 32B as the all-rounder
48GB setup: Llama 3.3 70B for general use, Devstral 2 for coding
64GB+ workstation: gpt-oss-120b, the closest local gets to hosted quality
Edge devices and instant replies: LFM2.5 1.2B
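The list above can be read as a simple lookup keyed on available memory. The sketch below is one way to encode it; the function name, the table, and the choice to fall back to the edge model below 16GB are assumptions made for illustration, not part of Ollama or of this ranking.

```python
# Hypothetical lookup built from the list above.
# Thresholds are available GPU or unified memory in GB, largest tier first.
PICKS_BY_MEMORY_GB = [
    (64, ["gpt-oss-120b"]),
    (48, ["Llama 3.3 70B", "Devstral 2"]),
    (24, ["Qwen3 Coder 30B", "Gemma 4 31B", "Qwen3 32B"]),
    (16, ["gpt-oss-20b", "Gemma 4 26B A4B"]),
]

# Assumption: anything under 16GB is treated as an edge device.
EDGE_PICK = ["LFM2.5 1.2B"]


def pick_models(memory_gb: float) -> list[str]:
    """Return the picks for the largest tier that fits in memory_gb."""
    for threshold, models in PICKS_BY_MEMORY_GB:
        if memory_gb >= threshold:
            return models
    return EDGE_PICK


if __name__ == "__main__":
    for memory in (8, 16, 32, 48, 128):
        print(f"{memory}GB: {', '.join(pick_models(memory))}")
```

A 32GB Mac lands in the 24GB tier here, which matches how the list groups the two.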

All of these also have hosted endpoints if you want to test quality before downloading the weights.

The best model to run with Ollama on typical hardware is gpt-oss-20b, OpenAI's small open-weight model: it is comfortable on a 16GB machine with quantization and handles everyday chat and code well.

For local coding, Qwen3 Coder 30B A3B is the pick: a mixture-of-experts coder with only 3B active parameters, so it stays fast on prosumer hardware. With a 24GB GPU or a high-RAM Mac, Gemma 4 31B adds local vision, and Qwen3 32B is the dense all-rounder. If you have a 64GB+ workstation, gpt-oss-120b is where local quality starts to rival hosted APIs.

Rule of thumb: a 4-bit quantized model needs roughly half its parameter count (in billions) in GB of memory, and MoE models with small active parameter counts run much faster than their total size suggests.
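The arithmetic behind the rule of thumb is short enough to sketch. The function below estimates the size of the weights alone from a nominal parameter count and a bit width; the function name is hypothetical, the parameter counts are read off the model names, and the result ignores context cache and runtime overhead, so treat it as a lower bound.

```python
def weights_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    One billion parameters at 8 bits is about 1 GB, so 4 bits is half that.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Nominal sizes taken from the model names, not measured file sizes.
    models = [("gpt-oss-20b", 20), ("Qwen3 32B", 32), ("Llama 3.3 70B Instruct", 70)]
    for name, params in models:
        print(
            f"{name}: ~{weights_gb(params, 4):.0f} GB at 4-bit, "
            f"~{weights_gb(params, 16):.0f} GB at 16-bit"
        )
```

The FAQ's figures (about 12GB for a 20B model, around 40GB for a 70B model) sit a little above these weight-only numbers, which is consistent with leaving headroom for context and runtime overhead.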

We also have more in-depth rankings: Best open-source AI models

The ranking

Updated June 2026

Best on a 16GB machine

What most laptops can run: small MoE models with quantization, surprisingly capable for everyday tasks.

| # | Model | Context | Input |
|---|-------|---------|-------|
| 1 | gpt-oss-20b (OpenAI): The best local default: OpenAI's small open model, comfortable on 16GB machines. | 131K | $0.029/M |
| 2 | Gemma 4 26B A4B (Google DeepMind): Google's efficient MoE: only 4B active parameters, so it runs fast locally. | 262K | $0.06/M |
| 3 | Nemotron 3 Nano 30B A3B (Nvidia): Nvidia's nano MoE, tuned for efficient local inference. | 262K | $0.05/M |
| 4 | LFM2.5-1.2B-Instruct (Liquid AI): Liquid AI's tiny model for edge devices and instant responses. | 33K | Free |

Best on a 24-48GB GPU

The prosumer sweet spot: real coding and agent work, fully local, on a single beefy GPU or a high-RAM Mac.

| # | Model | Context | Input |
|---|-------|---------|-------|
| 1 | Qwen3 Coder 30B A3B Instruct (Qwen): The local coding pick: a fast MoE coder that fits prosumer hardware. | 160K | $0.07/M |
| 2 | Qwen3 30B A3B Instruct 2507 (Qwen): All-round MoE with 3B active parameters, a favorite for local agents. | 131K | $0.048/M |
| 3 | Gemma 4 31B (Google DeepMind): Google's open vision model: reads images locally on a 24GB card. | 262K | $0.12/M |
| 4 | Qwen3 32B (Qwen): A solid dense all-rounder for 24GB GPUs. | 131K | $0.08/M |
| 5 | Devstral 2 2512 (Mistral AI): Mistral's open coding model, built with self-hosting in mind. | 262K | $0.4/M |

Best on a workstation or server

For 64GB+ of memory: quality that starts to rival hosted APIs.

| # | Model | Context | Input |
|---|-------|---------|-------|
| 1 | gpt-oss-120b (OpenAI): The high-end local pick: flagship-class quality on a 64GB+ workstation. | 131K | $0.039/M |
| 2 | Llama 3.3 70B Instruct (Meta): Meta's proven 70B, still a dependable choice for 48GB setups. | 131K | $0.1/M |

Frequently asked questions

What is the best model to run with Ollama in 2026?

gpt-oss-20b is the best local model for most people: OpenAI's small open-weight release runs comfortably on a 16GB machine and handles everyday chat and code. With more memory, Qwen3 Coder 30B is the local coding pick, and gpt-oss-120b is the high-end choice for 64GB+ workstations.

What is the best local model for coding?

Qwen3 Coder 30B A3B is the best local coding model: a mixture-of-experts specialist with 3B active parameters, so it generates fast on a 24GB GPU or a high-RAM Mac while staying tuned for repositories and agentic edits. Devstral 2 from Mistral is the strongest alternative, built explicitly with self-hosting in mind.

How much RAM or VRAM do I need to run models locally?

A practical rule: a 4-bit quantized model needs roughly half its parameter count in GB, so a 20B model fits in about 12GB and a 70B model needs around 40GB. Mixture-of-experts models run faster than their total size suggests because only a few billion parameters are active per token, which is why 26B to 30B MoE models are the local sweet spot.
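To make the MoE point concrete, the sketch below separates the two quantities: memory follows the total parameter count, while per-token work follows the active count. The function name is hypothetical and the sizes are read off the model names in the ranking. Real generation speed also depends on memory bandwidth and the runtime, so the fraction is only a rough indicator.

```python
def active_fraction(total_billion: float, active_billion: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_billion / total_billion


if __name__ == "__main__":
    # (name, total parameters in billions, active parameters in billions)
    models = [
        ("Qwen3 Coder 30B A3B", 30, 3),
        ("Gemma 4 26B A4B", 26, 4),
        ("Qwen3 32B (dense)", 32, 32),
    ]
    for name, total, active in models:
        print(
            f"{name}: ~{total / 2:.0f} GB of weights at 4-bit, "
            f"{active_fraction(total, active):.0%} of weights active per token"
        )
```

The two 30B-class models need about the same memory as the dense 32B, but the MoE models touch only a small share of their weights for each token.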

Is running models locally cheaper than using an API?

For light use, no: cheap hosted models cost well under a dollar per million tokens, which is hard to beat after hardware costs. Local wins when you run agents continuously (no rate limits and no per-token bills), need data to stay on your machine, or already own the hardware. Always-on personal agents like Hermes and OpenClaw are the classic case where local pays off.
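A back-of-the-envelope break-even makes the "light use" point easier to see. The sketch below divides a hardware budget by a hosted input price from the ranking. The $2,000 hardware figure is a hypothetical placeholder, and the calculation deliberately ignores electricity, output-token pricing, and resale value, so it is an illustration rather than a cost model.

```python
def break_even_million_tokens(hardware_cost_usd: float, price_per_million_usd: float) -> float:
    """Millions of hosted input tokens that cost as much as the hardware."""
    if price_per_million_usd <= 0:
        raise ValueError("a free endpoint never breaks even on price alone")
    return hardware_cost_usd / price_per_million_usd


if __name__ == "__main__":
    hardware_cost_usd = 2000.0  # hypothetical budget, not a quoted price
    hosted_prices = {
        "Qwen3 Coder 30B A3B Instruct": 0.07,  # $/M input, from the ranking
        "Devstral 2 2512": 0.4,                # $/M input, from the ranking
    }
    for name, price in hosted_prices.items():
        millions = break_even_million_tokens(hardware_cost_usd, price)
        print(f"{name}: break-even after ~{millions / 1000:.1f} billion input tokens")
```

At $0.07/M, a $2,000 machine corresponds to roughly 28.6 billion hosted input tokens, which is why price alone rarely justifies local hardware unless an agent runs continuously.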

