Best open-source models for Hermes

Open source

Open-weight models you can inspect, fine-tune, and self-host. Ideal for privacy-sensitive or air-gapped Hermes setups. We rank a deep bench here because Hermes routes more than 380 different models in the wild, and most of its real volume goes to open weights.

Which model should you run?

As of June 2026, the best open-source model for Hermes depends on what you are optimizing for:

Best overall: Kimi K2.6 for the strongest open-weight agentic coding, or DeepSeek V4 for frontier-class reasoning at a lower price
Cheapest to self-host or run high-volume: gpt-oss-120b, Nemotron 3 Super, or DeepSeek V4 Flash, all open weights at rock-bottom cost
Biggest context for large repositories: Qwen3 Coder 480B and MiMo-V2.5-Pro both carry a 1M token window

Every pick ships downloadable weights, so you can serve it yourself with Ollama, vLLM, or SGLang and keep prompts on your own machine, or call a hosted $0 endpoint on OpenRouter while you test.

The best open-source model for Hermes is Kimi K2.6, the strongest open-weight agentic coder right now at 47.6% on CursorBench 3.1, for $0.68 per million input tokens.

Every model here ships open weights you can inspect, fine-tune, and self-host, which makes them ideal for privacy-sensitive or air-gapped setups. DeepSeek V4 is the frontier-class pick, gpt-oss-120b and Nemotron 3 Super are the cheapest to host for high-volume work, and Qwen3 Coder 480B brings a 1M token context window for large repositories.

We also have more in-depth Hermes rankings:Best cheap models for HermesBest free models for HermesBest models for Hermes

The ranking

Updated June 2026

#ModelContextInput

1
1
Kimi K2.6Moonshot AIThe strongest open-weight agentic coder right now (47.6% on CursorBench).
Context262K
Input$0.66/M
2
2
DeepSeek V4DeepSeekFrontier-class reasoning with open weights, cheap to host yourself.
Context1.049M
Input$0.435/M
3
3
GLM 5.1Z.AIZhipu's open-weight flagship, strong at long agentic sessions.
Context203K
Input$0.98/M
4
4
Qwen3 Coder 480B A35BQwenOpen-weight coding specialist sized for big repositories.
Context1.049M
Input$0.22/M
5
5M
MiMo-V2.5-ProXiaomiXiaomi's open-weight MoE, a capable all-rounder for long agent sessions.
Context1.049M
Input$0.435/M
6
6
MiniMax M2.7MiniMaxCompact open MoE that punches above its price on coding benchmarks.
Context205K
Input$0.24/M
7
7
Nemotron 3 SuperNvidiaNvidia's open-weight model, very cheap to self-host for high-volume runs.
Context1M
Input$0.09/M
8
8
Nemotron 3 UltraNvidiaNvidia's larger open-weight model, a step up from Super for harder reasoning.
Context1M
Input$0.5/M
9
9
DeepSeek V4 FlashDeepSeekThe cheapest DeepSeek tier, same open weights for high-volume, low-stakes steps.
Context1.049M
Input$0.09/M
10
10
gpt-oss-120bOpenAIOpenAI's open-weight generalist, dependable for everyday code and the cheapest to host.
Context131K
Input$0.039/M
11
11
Qwen3 Next 80B A3B InstructQwenFast open-weight MoE for quick reasoning and routine sub-agent steps.
Context262K
Input$0.09/M
12
12M
MiMo-V2.5XiaomiThe cheaper MiMo tier, same open weights for routine high-volume work.
Context1.049M
Input$0.105/M
13
13
Step 3.5 FlashStepFunStepFun's fast open MoE, cheap for routine edits and sub-agents.
Context262K
Input$0.09/M

Frequently asked questions

What is the best open-source model for Hermes?

Kimi K2.6 is the best open-weight model for Hermes, scoring 47.6% on CursorBench 3.1, the highest of any open model, at $0.68 per million input tokens. DeepSeek V4 is the next pick for frontier-class reasoning at lower cost, and Qwen3 Coder 480B is best when you need a 1M token context window for big repositories.

Which open models do Hermes users actually run?

By token volume on OpenRouter over the last 30 days, the open-weight models Hermes routes most are DeepSeek V4 (the Flash and full tiers), Nemotron 3 Super, MiniMax, Kimi K2.6, and the MiMo V2.5 line. Real-world usage leans on cheap, high-context open weights, which is why this list ranks a deep bench rather than a top three.

Is MiniMax open source?

MiniMax M2.7 ships open weights, so you can self-host it, and it sits on this list. The newer MiniMax M3 is API-only (closed weights) for now, so it appears on our general and free Hermes rankings instead of here. If MiniMax open-weights M3 later, it will move onto this list.

Why run an open-source model with Hermes?

Open-weight models let you inspect, fine-tune, and self-host the model, so no prompts or code leave your machine. That makes them the right fit for privacy-sensitive, regulated, or air-gapped Hermes deployments, and it removes per-token API cost when you host them yourself.

Can you self-host these models for Hermes?

Yes. Because the weights are open, you can serve any of these through Ollama, vLLM, or SGLang and point Hermes at the local OpenAI-compatible endpoint. Smaller models run on a single GPU or an Apple Silicon Mac; the largest, like Qwen3 Coder 480B, need serious hardware or a hosted endpoint such as OpenRouter.

Open-source or free: which should you pick for Hermes?

Free models cost $0 to call but run on someone else's servers with rate limits. Open-source models ship downloadable weights you can host yourself with no limits and full privacy, though hosting has its own cost. Some models are both. See our best free models for Hermes ranking for the zero-API-cost options.

More rankings for Hermes

Share:

Details:

Agent
Hermes
Models
13
Filter
Open source
Updated
June 2026