Best open-source models for Hermes
Open sourceOpen-weight models you can inspect, fine-tune, and self-host. Ideal for privacy-sensitive or air-gapped Hermes setups. We rank a deep bench here because Hermes routes more than 380 different models in the wild, and most of its real volume goes to open weights.
As of June 2026, the best open-source model for Hermes depends on what you are optimizing for:
- Best overall: Kimi K2.6 for the strongest open-weight agentic coding, or DeepSeek V4 for frontier-class reasoning at a lower price
- Cheapest to self-host or run high-volume: gpt-oss-120b, Nemotron 3 Super, or DeepSeek V4 Flash, all open weights at rock-bottom cost
- Biggest context for large repositories: Qwen3 Coder 480B and MiMo-V2.5-Pro both carry a 1M token window
Every pick ships downloadable weights, so you can serve it yourself with Ollama, vLLM, or SGLang and keep prompts on your own machine, or call a hosted $0 endpoint on OpenRouter while you test.
The best open-source model for Hermes is Kimi K2.6, the strongest open-weight agentic coder right now at 47.6% on CursorBench 3.1, for $0.68 per million input tokens.
Every model here ships open weights you can inspect, fine-tune, and self-host, which makes them ideal for privacy-sensitive or air-gapped setups. DeepSeek V4 is the frontier-class pick, gpt-oss-120b and Nemotron 3 Super are the cheapest to host for high-volume work, and Qwen3 Coder 480B brings a 1M token context window for large repositories.
- 11Kimi K2.6Moonshot AIThe strongest open-weight agentic coder right now (47.6% on CursorBench).Context262KInput$0.66/M
- 22DeepSeek V4DeepSeekFrontier-class reasoning with open weights, cheap to host yourself.Context1.049MInput$0.435/M
- 3Context203KInput$0.98/M
- 44Qwen3 Coder 480B A35BQwenOpen-weight coding specialist sized for big repositories.Context1.049MInput$0.22/M
- 5Context1.049MInput$0.435/M
- 66MiniMax M2.7MiniMaxCompact open MoE that punches above its price on coding benchmarks.Context205KInput$0.24/M
- 77Nemotron 3 SuperNvidiaNvidia's open-weight model, very cheap to self-host for high-volume runs.Context1MInput$0.09/M
- 88Nemotron 3 UltraNvidiaNvidia's larger open-weight model, a step up from Super for harder reasoning.Context1MInput$0.5/M
- 99DeepSeek V4 FlashDeepSeekThe cheapest DeepSeek tier, same open weights for high-volume, low-stakes steps.Context1.049MInput$0.09/M
- 1010gpt-oss-120bOpenAIOpenAI's open-weight generalist, dependable for everyday code and the cheapest to host.Context131KInput$0.039/M
- 1111Qwen3 Next 80B A3B InstructQwenFast open-weight MoE for quick reasoning and routine sub-agent steps.Context262KInput$0.09/M
- 12Context1.049MInput$0.105/M
- 1313Step 3.5 FlashStepFunStepFun's fast open MoE, cheap for routine edits and sub-agents.Context262KInput$0.09/M
What is the best open-source model for Hermes?
Kimi K2.6 is the best open-weight model for Hermes, scoring 47.6% on CursorBench 3.1, the highest of any open model, at $0.68 per million input tokens. DeepSeek V4 is the next pick for frontier-class reasoning at lower cost, and Qwen3 Coder 480B is best when you need a 1M token context window for big repositories.
Which open models do Hermes users actually run?
By token volume on OpenRouter over the last 30 days, the open-weight models Hermes routes most are DeepSeek V4 (the Flash and full tiers), Nemotron 3 Super, MiniMax, Kimi K2.6, and the MiMo V2.5 line. Real-world usage leans on cheap, high-context open weights, which is why this list ranks a deep bench rather than a top three.
Is MiniMax open source?
MiniMax M2.7 ships open weights, so you can self-host it, and it sits on this list. The newer MiniMax M3 is API-only (closed weights) for now, so it appears on our general and free Hermes rankings instead of here. If MiniMax open-weights M3 later, it will move onto this list.
Why run an open-source model with Hermes?
Open-weight models let you inspect, fine-tune, and self-host the model, so no prompts or code leave your machine. That makes them the right fit for privacy-sensitive, regulated, or air-gapped Hermes deployments, and it removes per-token API cost when you host them yourself.
Can you self-host these models for Hermes?
Yes. Because the weights are open, you can serve any of these through Ollama, vLLM, or SGLang and point Hermes at the local OpenAI-compatible endpoint. Smaller models run on a single GPU or an Apple Silicon Mac; the largest, like Qwen3 Coder 480B, need serious hardware or a hosted endpoint such as OpenRouter.
Open-source or free: which should you pick for Hermes?
Free models cost $0 to call but run on someone else's servers with rate limits. Open-source models ship downloadable weights you can host yourself with no limits and full privacy, though hosting has its own cost. Some models are both. See our best free models for Hermes ranking for the zero-API-cost options.
Best cheap models for Hermes
Best free models for Hermes
Best models for Hermes
Agent
HermesModels
13Filter
Open sourceUpdated
June 2026