FrontierCode Diamond

Name: FrontierCode Diamond leaderboard
Creator: Cognition

Coding

The 50 hardest FrontierCode tasks: the toughest production-code problems, graded on whether maintainers would merge the patch. Scores stay low by design. Higher is better.

Official source

Claude Fable 5's Diamond score is still pending from Cognition, so it does not yet appear on this board.

Diamond comprises the 50 most difficult FrontierCode tasks: the hardest production-code problems, drawn from real open source repositories. The tasks use the same maintainer-merge rubric, with hard blocking criteria (correctness, regression safety, scope). A trial's score is its weighted rubric value, counted only if the trial clears every blocker and set to 0 otherwise; the reported score is the average over the tasks. Because this is the toughest agentic-coding measure on the board, scores stay low.
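The gated scoring described above can be illustrated with a minimal sketch. This is not Cognition's grading code: the `Trial` class, the rubric criterion names, the weights, the normalization by total weight, and the one-trial-per-task simplification are all assumptions made for illustration. Only the blocker names and the "0 unless every blocker clears, then average" rule come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trial:
    """One graded attempt at a task (hypothetical structure)."""
    blockers: Dict[str, bool]   # blocker name -> passed?
    rubric: Dict[str, float]    # rubric criterion -> score in [0, 1]


def gated_score(trial: Trial, weights: Dict[str, float]) -> float:
    """Weighted rubric value, or 0.0 if any blocking criterion failed."""
    if not all(trial.blockers.values()):
        return 0.0
    total_weight = sum(weights.values())
    # Assumption: the weighted value is normalized by the total weight.
    weighted = sum(w * trial.rubric.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def benchmark_score(trials: List[Trial], weights: Dict[str, float]) -> float:
    """Average of gated scores, assuming one trial per task."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(gated_score(t, weights) for t in trials) / len(trials)


# Hypothetical rubric weights and two example trials.
weights = {"readability": 1.0, "tests": 3.0}
merged = Trial(
    blockers={"correctness": True, "regression_safety": True, "scope": True},
    rubric={"readability": 1.0, "tests": 0.5},
)
blocked = Trial(
    blockers={"correctness": True, "regression_safety": True, "scope": False},
    rubric={"readability": 1.0, "tests": 1.0},
)

print(gated_score(merged, weights))                 # 0.625
print(gated_score(blocked, weights))                # 0.0 (failed the scope blocker)
print(benchmark_score([merged, blocked], weights))  # 0.3125
```

The second trial shows why scores stay low under this kind of gating: a patch with a perfect rubric still contributes nothing if it fails a single blocker.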

Leaderboard

# | Model | Score | Provider