Agents Directory

FrontierCode Main

Coding

Cognition's benchmark tests whether a model writes code that maintainers would actually merge, not just code that passes tests. Main is the subset made up of the 100 hardest of its 150 tasks. Higher scores are better.

FrontierCode's 150 tasks were hand-selected by 20+ open-source maintainers from 36 flagship repositories. Model solutions are graded on behavioral correctness, regression safety, scope discipline, test quality, and adherence to codebase conventions. Main consists of the 100 hardest tasks. Each trial's score is its weighted rubric value if it clears every blocking criterion, and 0 otherwise; these scores are then averaged over the tasks.
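The gated scoring rule can be sketched in a few lines of Python. This is a minimal sketch, not Cognition's grader: the rubric weights, the [0, 1] scale for each criterion, the pass/fail form of blocking criteria, and the choice to average trials within a task before averaging across tasks are all assumptions, and `trial_score` and `benchmark_score` are hypothetical names.

```python
from statistics import mean

# Hypothetical rubric entry: criterion name -> (weight, score in [0, 1]).
# FrontierCode's actual weights, scales, and blocking rules are assumptions here.
Rubric = dict[str, tuple[float, float]]


def trial_score(rubric: Rubric, blocking_passed: list[bool]) -> float:
    """Weighted rubric value of one trial, gated to 0 by any failed blocking criterion."""
    if not all(blocking_passed):
        return 0.0  # a single failed blocking criterion zeroes the whole trial
    total_weight = sum(weight for weight, _ in rubric.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    # Weighted mean of criterion scores, so the result stays in [0, 1].
    return sum(weight * score for weight, score in rubric.values()) / total_weight


def benchmark_score(trial_scores_per_task: list[list[float]]) -> float:
    """Average trial scores within each task, then across tasks (assumed order)."""
    return mean(mean(trials) for trials in trial_scores_per_task)


rubric = {
    "behavioral_correctness": (3.0, 1.0),  # hypothetical weight and score
    "test_quality": (1.0, 0.5),            # hypothetical weight and score
}
clean = trial_score(rubric, blocking_passed=[True, True])   # 3.5 / 4 = 0.875
gated = trial_score(rubric, blocking_passed=[True, False])  # blocked, so 0.0
print(benchmark_score([[clean, gated], [0.5]]))  # prints 0.46875
```

The gate is why a strong patch that fails one blocking check contributes nothing: in the example, the second trial halves the first task's average from 0.875 to 0.4375.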

Leaderboard
  • 1. Claude Fable 5 (Anthropic): 46.3%
  • 2. Claude Opus 4.8 (Anthropic): 34.3%
  • 3. GPT-5.5 (OpenAI): 25.5%
  • 4. Claude Opus 4.7 (Anthropic): 23.0%
  • 5. GPT-5.4 Mini (OpenAI): 17.8%
  • 6. Gemini 3.1 Pro Preview (Google DeepMind): 16.7%
  • 7. Kimi K2.6 (Moonshot AI): 16.0%
  • 8. Claude Sonnet 4.6 (Anthropic): 15.1%
  • 9. Kimi K2.5 (Moonshot AI): 6.9%
  • 10. MiniMax M2.7 (MiniMax): 6.0%
  • 11. MiniMax M2.5 (MiniMax): 5.3%
  • 12. Gemini 3.1 Flash Lite (Google DeepMind): 4.8%
Sources:
FrontierCode (Cognition)
Details:
  • Category: Coding
  • Created by: Cognition
  • Models tested: 12
  • Leader: Claude Fable 5
  • Top score: 46.3%

Updated June 2026

© 2026 Agents Directory