Best models to run with Ollama
Running models locally with Ollama means no API keys, no rate limits, and no data leaving your machine. These are the best open-weight models to run locally in 2026, grouped by the hardware you actually have, from a 16GB laptop to a multi-GPU server.
As of June 2026, pick by the hardware you have:
- 16GB laptop: gpt-oss-20b for general use, Gemma 4 26B A4B if you want speed
- 24GB GPU or 32GB Mac: Qwen3 Coder 30B for code, Gemma 4 31B for vision, Qwen3 32B as the all-rounder
- 48GB setup: Llama 3.3 70B or Devstral 2 for coding
- 64GB+ workstation: gpt-oss-120b, the closest local gets to hosted quality
- Edge devices and instant replies: LFM2.5 1.2B
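The tiers above can be sketched as a small lookup. `pick_models` and the tier table are hypothetical helpers that encode this list and nothing else. They are not part of Ollama. Two assumptions are marked in the code. Tiers other than the 24GB-GPU / 32GB-Mac one use the same threshold for both kinds of machine. Machines below 16GB fall back to LFM2.5 1.2B, although the list recommends that model for edge devices rather than for a specific memory size.

```python
from typing import List, Tuple

# (min GPU VRAM in GB, min Mac unified memory in GB, recommended picks).
# Only the 24GB-GPU / 32GB-Mac tier gives two different numbers in the list;
# using one figure for both elsewhere is an assumption of this sketch.
TIERS: List[Tuple[int, int, List[str]]] = [
    (64, 64, ["gpt-oss-120b"]),
    (48, 48, ["Llama 3.3 70B", "Devstral 2 (coding)"]),
    (24, 32, ["Qwen3 Coder 30B (code)", "Gemma 4 31B (vision)", "Qwen3 32B (all-rounder)"]),
    (16, 16, ["gpt-oss-20b (general use)", "Gemma 4 26B A4B (speed)"]),
]

# Hypothetical fallback below every tier: the list only recommends
# LFM2.5 1.2B for edge devices and instant replies.
FALLBACK: List[str] = ["LFM2.5 1.2B"]


def pick_models(memory_gb: int, is_mac: bool = False) -> List[str]:
    """Return the picks for the largest tier this machine clears."""
    for gpu_min, mac_min, picks in TIERS:  # ordered from largest tier down
        threshold = mac_min if is_mac else gpu_min
        if memory_gb >= threshold:
            return picks
    return FALLBACK


print(pick_models(24))               # a 24GB GPU lands in the prosumer tier
print(pick_models(24, is_mac=True))  # a 24GB Mac drops to the 16GB tier
```

Keeping the tiers as plain data means updating the picks only requires editing the table, not the lookup logic.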
All of these also have hosted endpoints if you want to test quality before downloading the weights.
The best model to run with Ollama on typical hardware is gpt-oss-20b, OpenAI's small open-weight model: it is comfortable on a 16GB machine with quantization and handles everyday chat and code well.
For local coding, Qwen3 Coder 30B A3B is the pick: a mixture-of-experts coder with only 3B active parameters, so it stays fast on prosumer hardware. With a 24GB GPU or a high-RAM Mac, Gemma 4 31B adds local vision, and Qwen3 32B is the dense all-rounder. If you have a 64GB+ workstation, gpt-oss-120b is where local quality starts to rival hosted APIs.
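To try a pick, here is a minimal sketch against Ollama's local REST API (`POST /api/generate` on the default port 11434). It uses only the Python standard library. It assumes `ollama serve` is running and that the model tag is `gpt-oss:20b`. Tags can differ, so check `ollama list` after pulling. `build_request` and `generate` are illustrative helper names, not part of any Ollama SDK.

```python
import json
import urllib.request

# Ollama's local server listens on port 11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request and return the generated text (needs the server running)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False, the whole completion arrives in the "response" field.
    return body["response"]
```

Usage, after `ollama pull gpt-oss:20b`: `generate("gpt-oss:20b", "Explain mixture-of-experts in two sentences.")`. Switching to the coding pick only changes the `model` string to whatever tag `ollama list` shows for it.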
Rule of thumb: a 4-bit quantized model needs roughly half its parameter count in GB of memory. MoE models with small active parameter counts run much faster than their size suggests, but they still need memory for all of their weights.
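The rule of thumb is simple arithmetic. Four bits per weight is 0.5 bytes, so billions of parameters times 0.5 gives GB of weights. The sketch below adds a 15% overhead for the KV cache and runtime buffers. That figure is an assumption, chosen to roughly match the ~12GB (20B) and ~40GB (70B) figures quoted later on this page. Real usage grows with context length.

```python
def estimate_memory_gb(total_params_b: float, bits: int = 4, overhead: float = 0.15) -> float:
    """Rough memory needed to load a quantized model, in GB.

    total_params_b: total parameters in billions. For MoE models pass the
        total (e.g. 30 for a 30B A3B model), not the active count: every
        expert must be resident in memory even if only a few run per token.
    bits: bits per weight after quantization (4 bits = 0.5 bytes).
    overhead: assumed extra fraction for KV cache and runtime buffers.
    """
    weights_gb = total_params_b * bits / 8  # 1e9 params x bytes per param ~= GB
    return weights_gb * (1 + overhead)


for name, params_b in [("20B", 20), ("30B A3B MoE", 30), ("70B", 70)]:
    print(f"{name}: ~{estimate_memory_gb(params_b):.1f} GB")
```

This gives about 11.5GB for a 20B model, about 17GB for a 30B MoE (which is why it fits a 24GB card), and about 40GB for a 70B model.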
What most laptops can run: small MoE models with quantization, surprisingly capable for everyday tasks.
- #1 gpt-oss-20b (OpenAI). The best local default: OpenAI's small open model, comfortable on 16GB machines. Context: 131K tokens. Input: $0.029/M tokens.
- #2 Gemma 4 26B A4B (Google DeepMind). Google's efficient MoE: only 4B active parameters, so it runs fast locally. Context: 262K tokens. Input: $0.06/M tokens.
- #3 Nemotron 3 Nano 30B A3B (Nvidia). Nvidia's nano MoE, tuned for efficient local inference. Context: 262K tokens. Input: $0.05/M tokens.
- #4 LFM2.5-1.2B-Instruct (Liquid AI). Liquid AI's tiny model for edge devices and instant responses. Context: 33K tokens. Input: free.
The prosumer sweet spot: real coding and agent work, fully local, on a single beefy GPU or a high-RAM Mac.
- #1 Qwen3 Coder 30B A3B Instruct (Qwen). The local coding pick: a fast MoE coder that fits prosumer hardware. Context: 160K tokens. Input: $0.07/M tokens.
- #2 Qwen3 30B A3B Instruct 2507 (Qwen). All-round MoE with 3B active parameters, a favorite for local agents. Context: 131K tokens. Input: $0.048/M tokens.
- #3 Gemma 4 31B (Google DeepMind). Google's open vision model: reads images locally on a 24GB card. Context: 262K tokens. Input: $0.12/M tokens.
- #4 Qwen3 32B (Qwen). A solid dense all-rounder for 24GB GPUs. Context: 131K tokens. Input: $0.08/M tokens.
- #5 Devstral 2 2512 (Mistral AI). Mistral's open coding model, built with self-hosting in mind. Context: 262K tokens. Input: $0.40/M tokens.
For 48GB to 64GB+ of memory: quality that starts to rival hosted APIs.
- #1 gpt-oss-120b (OpenAI). The high-end local pick: flagship-class quality on a 64GB+ workstation. Context: 131K tokens. Input: $0.039/M tokens.
- #2 Llama 3.3 70B Instruct (Meta). Meta's proven 70B, still a dependable choice for 48GB setups. Context: 131K tokens. Input: $0.10/M tokens.
What is the best model to run with Ollama in 2026?
gpt-oss-20b is the best local model for most people: OpenAI's small open-weight release runs comfortably on a 16GB machine and handles everyday chat and code. With more memory, Qwen3 Coder 30B is the local coding pick, and gpt-oss-120b is the high-end choice for 64GB+ workstations.
What is the best local model for coding?
Qwen3 Coder 30B A3B is the best local coding model: a mixture-of-experts specialist with 3B active parameters, so it generates fast on a 24GB GPU or a high-RAM Mac while staying tuned for repositories and agentic edits. Devstral 2 from Mistral is the strongest alternative, built explicitly with self-hosting in mind.
How much RAM or VRAM do I need to run models locally?
A practical rule: a 4-bit quantized model needs roughly half its parameter count in GB, so a 20B model fits in about 12GB and a 70B model needs around 40GB. Mixture-of-experts models run faster than their total size suggests because only a few billion parameters are active per token, which is why 26B to 30B MoE models are the local sweet spot.
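A back-of-the-envelope bound shows why active parameters drive speed. Assume single-user decoding is limited by memory bandwidth, so each generated token has to read every active weight once. Then tokens per second is at most bandwidth divided by the bytes of active weights. The 400 GB/s figure is a hypothetical bandwidth. The function also ignores KV-cache reads and compute, so real speeds will be lower. What matters here is the ratio between the two models.

```python
def max_decode_tokens_per_s(active_params_b: float, bandwidth_gb_s: float, bits: int = 4) -> float:
    """Upper bound on tokens/s if decoding is purely memory-bandwidth bound.

    active_params_b: parameters used per token, in billions (the full count
        for a dense model, only the routed experts for an MoE model).
    bandwidth_gb_s: memory bandwidth in GB/s.
    """
    active_gb = active_params_b * bits / 8  # GB of weights read per generated token
    return bandwidth_gb_s / active_gb


BANDWIDTH_GB_S = 400.0  # hypothetical GPU / unified-memory bandwidth

dense_32b = max_decode_tokens_per_s(32, BANDWIDTH_GB_S)   # dense: all 32B active
moe_30b_a3b = max_decode_tokens_per_s(3, BANDWIDTH_GB_S)  # MoE: ~3B active of 30B
print(f"dense 32B ceiling: ~{dense_32b:.0f} tok/s")
print(f"30B A3B MoE ceiling: ~{moe_30b_a3b:.0f} tok/s")
```

Both models need a similar amount of memory (roughly 15 to 16GB of 4-bit weights). The MoE's ceiling, however, is about ten times higher.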
Is running models locally cheaper than using an API?
For light use, no: cheap hosted models cost well under a dollar per million tokens, which is hard to beat after hardware costs. Local wins when you run agents continuously (no rate limits and no per-token bills), need data to stay on your machine, or already own the hardware. Always-on personal agents like Hermes and OpenClaw are the classic case where local pays off.
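The trade-off can be put into numbers with a minimal break-even sketch. Every input here is hypothetical: the hardware price, the daily token volume, and the electricity cost. The only exception is the $0.07/M input price, which comes from the Qwen3 Coder listing on this page. The listed prices cover input tokens only, so the sketch understates API spend for output-heavy work.

```python
import math


def breakeven_days(hardware_usd: float, tokens_per_day_m: float,
                   api_usd_per_m: float, power_usd_per_day: float = 0.0) -> float:
    """Days until owning hardware beats paying per token (inf if it never does).

    tokens_per_day_m: tokens processed per day, in millions.
    api_usd_per_m: hosted price per million tokens.
    power_usd_per_day: running cost of the local machine per day.
    """
    daily_saving = tokens_per_day_m * api_usd_per_m - power_usd_per_day
    if daily_saving <= 0:
        return math.inf  # the API stays cheaper no matter how long you wait
    return hardware_usd / daily_saving


# Hypothetical light use: a $2,000 GPU and 10M tokens/day at $0.07/M.
days = breakeven_days(2000, 10, 0.07)
print(f"break-even after ~{days / 365:.1f} years")
# Adding a hypothetical $0.50/day of electricity pushes break-even out much further.
print(f"with power: ~{breakeven_days(2000, 10, 0.07, power_usd_per_day=0.5) / 365:.1f} years")
```

Continuous agent workloads raise `tokens_per_day_m`, and that is what shrinks the break-even from years toward weeks.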
Open source. Updated June 2026.