Agents Directory

FrontierCode Main

Coding

Cognition's test of whether a model writes code that maintainers would actually merge, not just code that passes tests. Main is the subset made up of the 100 hardest of the 150 tasks. Higher scores are better.

FrontierCode's 150 tasks were hand-selected by 20+ open-source maintainers from 36 flagship repositories. Each submission is graded on behavioral correctness, regression safety, scope discipline, test quality, and adherence to codebase conventions. Main is the subset of the 100 hardest tasks. A trial earns its weighted rubric value only if it clears every blocking criterion, and 0 otherwise; the reported score is the average of these trial scores over the tasks.
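
The scoring rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Cognition's grader: the `Trial` class, the criterion names, the per-criterion weights with scores in [0, 1], and the choice to average multiple trials within a task before averaging across tasks are all hypothetical. The source also does not say which criteria are blocking; the example marks one as blocking only for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Trial:
    # Hypothetical record of one graded attempt at a task.
    # rubric: criterion name -> (weight, score in [0, 1]); assumed shape.
    rubric: dict[str, tuple[float, float]]
    # blocking: criterion name -> passed? (assumed pass/fail checks).
    blocking: dict[str, bool] = field(default_factory=dict)


def trial_score(trial: Trial) -> float:
    """Weighted rubric value, or 0 if any blocking criterion fails."""
    if not all(trial.blocking.values()):
        return 0.0
    total_weight = sum(w for w, _ in trial.rubric.values())
    if total_weight == 0:
        return 0.0
    # Weighted average of criterion scores (assumed normalization).
    return sum(w * s for w, s in trial.rubric.values()) / total_weight


def benchmark_score(tasks: list[list[Trial]]) -> float:
    """Average trials within each task (assumption), then across tasks."""
    per_task = [mean(trial_score(t) for t in trials) for trials in tasks]
    return mean(per_task)


# Illustrative data only; names and weights are hypothetical.
passing = Trial(
    rubric={"correctness": (3.0, 1.0), "conventions": (1.0, 0.5)},
    blocking={"regression_safety": True},
)
failing = Trial(
    rubric={"correctness": (3.0, 1.0), "conventions": (1.0, 1.0)},
    blocking={"regression_safety": False},
)
print(benchmark_score([[passing], [failing]]))  # 0.4375
```

Gating before weighting means a trial that fails any blocking check contributes 0 no matter how strong its other rubric scores are, which is why the failing trial above scores 0 despite perfect rubric values.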

Leaderboard

Rank  Model                   Score  Provider
1     Claude Fable 5          46.3%  Anthropic
2     Claude Opus 4.8         34.3%  Anthropic
3     GPT-5.5                 25.5%  OpenAI
4     Claude Opus 4.7         23%    Anthropic
5     GPT-5.4 Mini            17.8%  OpenAI
6     Gemini 3.1 Pro Preview  16.7%  Google DeepMind
7     Kimi K2.6               16%    Moonshot AI
8     Claude Sonnet 4.6       15.1%  Anthropic
9     Kimi K2.5               6.9%   Moonshot AI
10    MiniMax M2.7            6%     MiniMax
11    MiniMax M2.5            5.3%   MiniMax
12    Gemini 3.1 Flash Lite   4.8%   Google DeepMind

Sources:

FrontierCode (Cognition)

Details:

Category: Coding
Created by: Cognition
Models tested: 12
Leader: Claude Fable 5
Top score: 46.3%

Updated June 2026

© 2026 Agents Directory