Agents Directory

Best models to run with Ollama

Open source

Running models locally with Ollama means no API keys, no rate limits, and no data leaving your machine. These are the best open-weight models to run locally in 2026, grouped by the hardware you actually have, from a 16GB laptop to a multi-GPU server.

Which model should you run?

As of June 2026, pick by the hardware you have:

  • 16GB laptop: gpt-oss-20b for general use, Gemma 4 26B A4B if you want speed
  • 24GB GPU or 32GB Mac: Qwen3 Coder 30B for code, Gemma 4 31B for vision, Qwen3 32B as the all-rounder
  • 48GB setup: Llama 3.3 70B or Devstral 2 for coding
  • 64GB+ workstation: gpt-oss-120b, the closest local gets to hosted quality
  • Edge devices and instant replies: LFM2.5 1.2B

All of these also have hosted endpoints if you want to test quality before downloading the weights.
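
The hardware list above can be read as a simple lookup from available memory to suggested models. The sketch below transcribes the tiers as stated; the function name `picks_for` and the choice to treat each tier's memory size as a minimum threshold are illustrative assumptions, not part of the ranking itself.

```python
# Tiers transcribed from the list above, largest first.
# Treating each size as a minimum threshold is an assumption of this sketch.
TIERS = [
    (64, ["gpt-oss-120b"]),
    (48, ["Llama 3.3 70B", "Devstral 2"]),
    (24, ["Qwen3 Coder 30B", "Gemma 4 31B", "Qwen3 32B"]),
    (16, ["gpt-oss-20b", "Gemma 4 26B A4B"]),
]
EDGE_PICKS = ["LFM2.5 1.2B"]  # edge devices and instant replies


def picks_for(memory_gb: float) -> list[str]:
    """Return the suggested models for a machine with `memory_gb` of memory."""
    for min_gb, models in TIERS:
        if memory_gb >= min_gb:
            return models
    # Below the smallest tier, fall back to the edge-device pick.
    return EDGE_PICKS


for gb in (8, 16, 32, 48, 64):
    print(f"{gb}GB: {', '.join(picks_for(gb))}")
```

A 32GB Mac lands in the 24GB tier here, matching the "24GB GPU or 32GB Mac" grouping in the list.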

The best model to run with Ollama on typical hardware is gpt-oss-20b, OpenAI's small open-weight model: it is comfortable on a 16GB machine with quantization and handles everyday chat and code well.
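
Once a model is pulled, a local Ollama server can be queried over HTTP. The following is a minimal sketch using only the Python standard library against Ollama's `/api/generate` endpoint on its default port. The model tag `gpt-oss:20b` is an assumption about how the model is named in your local library (check `ollama list`), and the helper names are hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call (needs a running Ollama server with the model already pulled;
# the tag below is an assumption):
# print(generate("gpt-oss:20b", "Explain mixture-of-experts in one sentence."))
```

Setting `"stream": False` asks for a single JSON reply rather than a stream of chunks, which keeps the sketch short.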

For local coding, Qwen3 Coder 30B A3B is the pick: a mixture-of-experts coder with only 3B active parameters, so it stays fast on prosumer hardware. With a 24GB GPU or a high-RAM Mac, Gemma 4 31B adds local vision, and Qwen3 32B is the dense all-rounder. If you have a 64GB+ workstation, gpt-oss-120b is where local quality starts to rival hosted APIs.

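Rule of thumb: a 4-bit quantized model needs roughly half its parameter count in GB of memory for the weights, plus some headroom for context. MoE models with small active parameter counts run much faster than their size suggests, though all of their parameters still have to fit in memory.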
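
The rule of thumb follows from simple arithmetic: at 4 bits per weight, each parameter takes half a byte. The sketch below applies that to the nominal sizes in the model names. It assumes 4-bit quantization and counts weights only, so real usage will be somewhat higher once context and runtime overhead are included; the function name is illustrative.

```python
def weights_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Approximate size of the weights alone, in GB.

    One billion parameters at 8 bits is about 1 GB, so the size scales
    with bits_per_weight / 8. Context and runtime overhead are not included.
    """
    return params_billions * bits_per_weight / 8


# Nominal parameter counts taken from the model names in this ranking.
for name, params in [
    ("gpt-oss-20b", 20),
    ("Qwen3 32B", 32),
    ("Llama 3.3 70B", 70),
    ("gpt-oss-120b", 120),
]:
    print(f"{name}: ~{weights_gb(params):.0f} GB of weights at 4-bit")
```

Under these assumptions the results line up with the hardware tiers: about 10 GB for a 20B model on a 16GB machine, and about 60 GB for a 120B model on a 64GB+ workstation.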

We also have more in-depth rankings: Best open-source AI models
The ranking
Updated June 2026
Best on a 16GB machine

What most laptops can run: small MoE models with quantization, surprisingly capable for everyday tasks.

  • 1. gpt-oss-20b (OpenAI)
    The best local default: OpenAI's small open model, comfortable on 16GB machines.
    Context: 131K
    Input: $0.029/M
  • 2. Gemma 4 26B A4B (Google DeepMind)
    Google's efficient MoE: only 4B active parameters, so it runs fast locally.
    Context: 262K
    Input: $0.06/M
  • 3. Nemotron 3 Nano 30B A3B (Nvidia)
    Nvidia's nano MoE, tuned for efficient local inference.
    Context: 262K
    Input: $0.05/M
  • 4. LFM2.5-1.2B-Instruct (Liquid AI)
    Liquid AI's tiny model for edge devices and instant responses.
    Context: 33K
    Input: Free
Best on a 24-48GB GPU

The prosumer sweet spot: real coding and agent work, fully local, on a single beefy GPU or a high-RAM Mac.

  • 1. Qwen3 Coder 30B A3B Instruct (Qwen)
    The local coding pick: a fast MoE coder that fits prosumer hardware.
    Context: 160K
    Input: $0.07/M
  • 2. Qwen3 30B A3B Instruct 2507 (Qwen)
    All-round MoE with 3B active parameters, a favorite for local agents.
    Context: 131K
    Input: $0.048/M
  • 3. Gemma 4 31B (Google DeepMind)
    Google's open vision model: reads images locally on a 24GB card.
    Context: 262K
    Input: $0.12/M
  • 4. Qwen3 32B (Qwen)
    A solid dense all-rounder for 24GB GPUs.
    Context: 131K
    Input: $0.08/M
  • 5. Devstral 2 2512 (Mistral AI)
    Mistral's open coding model, built with self-hosting in mind.
    Context: 262K
    Input: $0.4/M
Best on a workstation or server

For 64GB+ of memory: quality that starts to rival hosted APIs.

  • 1. gpt-oss-120b (OpenAI)
    The high-end local pick: flagship-class quality on a 64GB+ workstation.
    Context: 131K
    Input: $0.039/M
  • 2. Llama 3.3 70B Instruct (Meta)
    Meta's proven 70B, still a dependable choice for 48GB setups.
    Context: 131K
    Input: $0.1/M
Frequently asked questions
What is the best model to run with Ollama in 2026?

gpt-oss-20b is the best local model for most people: OpenAI's small open-weight release runs comfortably on a 16GB machine and handles everyday chat and code. With more memory, Qwen3 Coder 30B is the local coding pick, and gpt-oss-120b is the high-end choice for 64GB+ workstations.

What is the best local model for coding?

Qwen3 Coder 30B A3B is the best local coding model: a mixture-of-experts specialist with 3B active parameters, so it generates fast on a 24GB GPU or a high-RAM Mac while staying tuned for repositories and agentic edits. Devstral 2 from Mistral is the strongest alternative, built explicitly with self-hosting in mind.

How much RAM or VRAM do I need to run models locally?

A practical rule: a 4-bit quantized model needs roughly half its parameter count in GB for the weights, plus a few GB of headroom for context and runtime, so a 20B model fits in about 12GB and a 70B model needs around 40GB. Mixture-of-experts models run faster than their total size suggests because only a few billion parameters are active per token, which is why 26B to 30B MoE models are the local sweet spot.
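
Several model names in this ranking encode both numbers: "30B A3B" means 30B total parameters with 3B active per token. The sketch below parses that naming pattern to show the gap between what must fit in memory (the total) and what is computed per token (the active share). The regular expression and function name are illustrative assumptions, and names without the suffix are simply reported as having no active-parameter marker.

```python
import re
from typing import Optional, Tuple

# Matches names like "30B A3B": total size, then "A" + active size.
_MOE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)B\s+A(\d+(?:\.\d+)?)B")


def moe_sizes(name: str) -> Optional[Tuple[float, float]]:
    """Return (total_billions, active_billions), or None if the name has no A<n>B suffix."""
    match = _MOE_PATTERN.search(name)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


for name in [
    "Gemma 4 26B A4B",
    "Nemotron 3 Nano 30B A3B",
    "Qwen3 Coder 30B A3B Instruct",
    "Qwen3 32B",
]:
    sizes = moe_sizes(name)
    if sizes is None:
        print(f"{name}: no active-parameter suffix in the name")
    else:
        total, active = sizes
        print(
            f"{name}: {active:g}B of {total:g}B active per token "
            f"({active / total:.0%}), ~{total / 2:g} GB of 4-bit weights"
        )
```

Memory is sized by the total parameter count, while per-token compute follows the much smaller active count, which is where the speed advantage comes from.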

Is running models locally cheaper than using an API?

For light use, no: cheap hosted models cost well under a dollar per million tokens, which is hard to beat after hardware costs. Local wins when you run agents continuously (no rate limits and no per-token bills), need data to stay on your machine, or already own the hardware. Always-on personal agents like Hermes and OpenClaw are the classic case where local pays off.
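
To make the comparison concrete, the sketch below computes how many hosted input tokens a given hardware budget would buy at the per-million prices listed in this ranking. The $2,000 hardware figure is purely hypothetical, and the estimate ignores output-token pricing and electricity, so treat it as a rough order-of-magnitude check rather than a full cost model.

```python
def hosted_cost_usd(tokens: float, price_per_million_usd: float) -> float:
    """Cost of sending `tokens` input tokens to a hosted endpoint."""
    return tokens / 1_000_000 * price_per_million_usd


def break_even_tokens(hardware_cost_usd: float, price_per_million_usd: float) -> float:
    """Input tokens you could buy from a hosted endpoint for the hardware price."""
    return hardware_cost_usd / price_per_million_usd * 1_000_000


# Hypothetical hardware budget, chosen only for illustration.
HYPOTHETICAL_HARDWARE_USD = 2000.0

# Input prices per million tokens as listed in this ranking.
for name, price in [("gpt-oss-20b", 0.029), ("Devstral 2 2512", 0.4)]:
    tokens = break_even_tokens(HYPOTHETICAL_HARDWARE_USD, price)
    print(f"{name}: ~{tokens / 1e9:.1f} billion input tokens before hardware pays off")
```

The break-even point sits in the billions of tokens under these assumptions, which is why light use favors hosted endpoints and continuous agent workloads favor local hardware.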

Details: 11 models, open source only, updated June 2026

© 2026 Agents Directory