Agents Directory

Aider Polyglot

Coding

The practitioner favorite for code editing: 225 hard Exercism exercises across six languages, solved end to end through the aider tool and checked by unit tests. Higher is better.

The board has not been refreshed since November 2025, so current frontier models (Claude Fable 5, Claude Opus 4.8, GPT-5.5) do not appear on it yet. It remains the reference point for the previous generation of models.

Each model attempts the 225 hardest Exercism practice exercises spanning C++, Go, Java, JavaScript, Python and Rust, driving aider end to end. The model must emit its changes in one of aider's structured edit formats (diff, whole-file, or architect mode), and each solution is checked by running the exercise's unit tests. If the tests fail, the model sees the failure output and gets one retry; percent correct is the share of tasks whose tests pass on either the first attempt or that retry.

Every run also publishes its total USD cost (shown here divided by 225 as cost per task), so the board doubles as a clean score-versus-cost frontier. All runs are stored as YAML in the aider GitHub repository, and community result PRs are accepted.
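A minimal Python sketch of the two-attempt scoring described above, assuming each task's outcome can be reduced to three facts: whether its unit tests passed on the first attempt, whether they passed by the end of the retry, and what the model calls cost. `TaskResult`, `run_with_retry`, and `summarize` are hypothetical names, not aider's code; `solve` and `run_tests` are stand-in callables for one aider edit round and the exercise's test runner, and the outcome counts in the demo are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TaskResult:
    passed_first: bool   # unit tests passed on the first attempt
    passed_second: bool  # unit tests passed on the first attempt or the retry
    cost_usd: float      # model API spend for this task


def run_with_retry(
    solve: Callable[[Optional[str]], float],
    run_tests: Callable[[], Tuple[bool, str]],
) -> TaskResult:
    """Hypothetical two-attempt loop: edit, test, then retry once with the test output.

    `solve(feedback)` stands in for one model edit round and returns its cost;
    `run_tests()` stands in for the exercise's unit tests and returns (passed, output).
    """
    cost = solve(None)           # first attempt: no test feedback yet
    passed, output = run_tests()
    if passed:
        return TaskResult(True, True, cost)
    cost += solve(output)        # single retry, shown the failing test output
    passed, _ = run_tests()
    return TaskResult(False, passed, cost)


def summarize(results: List[TaskResult]) -> dict:
    """Aggregate per-task outcomes into board-style metrics (key names are illustrative)."""
    n = len(results)
    return {
        "pass_rate_1": 100 * sum(r.passed_first for r in results) / n,
        # Percent correct: tasks passing on either attempt.
        "pass_rate_2": 100 * sum(r.passed_second for r in results) / n,
        # Total run cost spread over the number of tasks.
        "cost_per_task": sum(r.cost_usd for r in results) / n,
    }


# Made-up outcomes for 225 tasks: 120 pass at once, 60 more pass on the retry.
results = (
    [TaskResult(True, True, 0.10)] * 120
    + [TaskResult(False, True, 0.20)] * 60
    + [TaskResult(False, False, 0.20)] * 45
)
summary = summarize(results)
print(f"percent correct: {summary['pass_rate_2']:.1f}%")  # percent correct: 80.0%
print(f"cost per task:   ${summary['cost_per_task']:.2f}")
```

Because the retry only runs when the first attempt fails, `pass_rate_2` can never be lower than `pass_rate_1`.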

Leaderboard

#   Provider    Model                         Config            Score   Cost/task
1   OpenAI      GPT-5                         High              88.0%   $0.13
2   OpenAI      GPT-5                         Medium            86.7%   $0.08
3   OpenAI      o3 Pro                        High              84.9%   $0.65
4   Google      Gemini 2.5 Pro Preview 06-05  32k thinking      83.1%   $0.22
5   OpenAI      GPT-5                         Low               81.3%   $0.05
6   OpenAI      o3                            High              81.3%   $0.09
7   Google      Gemini 2.5 Pro Preview 06-05  Default thinking  79.1%   $0.20
8   Google      Gemini 2.5 Pro Preview 05-06  -                 76.9%   $0.17
9   OpenAI      o3                            Default           76.9%   $0.06
10  DeepSeek    DeepSeek V3.2 Exp             Reasoner          74.2%   $0.01
11  OpenAI      o4 Mini High                  -                 72.0%   $0.09
12  Anthropic   Claude Opus 4                 32k thinking      72.0%   $0.29
13  DeepSeek    R1 0528                       -                 71.4%   $0.02
14  Anthropic   Claude Opus 4                 No thinking       70.7%   $0.30
15  DeepSeek    DeepSeek V3.2 Exp             Chat              70.2%   $0.00
16  Anthropic   Claude Sonnet 4               32k thinking      61.3%   $0.12
17  MoonshotAI  Kimi K2 0711                  -                 59.1%   $0.01
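The per-task costs make the score-vs-cost frontier easy to compute from the leaderboard above: a configuration is on the frontier if no other configuration is at least as cheap and at least as accurate. The sketch below hand-copies the rounded figures into a hypothetical `ROWS` list and uses a sort-and-sweep helper, `pareto_frontier` (also hypothetical): sort by cost, then keep each row that beats the best score seen so far.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (configuration, percent correct, USD cost per task)

# Rounded figures hand-copied from the leaderboard table.
ROWS: List[Row] = [
    ("GPT-5 (High)", 88.0, 0.13),
    ("GPT-5 (Medium)", 86.7, 0.08),
    ("o3 Pro (High)", 84.9, 0.65),
    ("Gemini 2.5 Pro Preview 06-05 (32k thinking)", 83.1, 0.22),
    ("GPT-5 (Low)", 81.3, 0.05),
    ("o3 (High)", 81.3, 0.09),
    ("Gemini 2.5 Pro Preview 06-05 (Default thinking)", 79.1, 0.20),
    ("Gemini 2.5 Pro Preview 05-06", 76.9, 0.17),
    ("o3 (Default)", 76.9, 0.06),
    ("DeepSeek V3.2 Exp (Reasoner)", 74.2, 0.01),
    ("o4 Mini High", 72.0, 0.09),
    ("Claude Opus 4 (32k thinking)", 72.0, 0.29),
    ("R1 0528", 71.4, 0.02),
    ("Claude Opus 4 (No thinking)", 70.7, 0.30),
    ("DeepSeek V3.2 Exp (Chat)", 70.2, 0.00),
    ("Claude Sonnet 4 (32k thinking)", 61.3, 0.12),
    ("Kimi K2 0711", 59.1, 0.01),
]


def pareto_frontier(rows: List[Row]) -> List[Row]:
    """Return rows not dominated on (lower cost, higher score), cheapest first."""
    frontier: List[Row] = []
    best_score = float("-inf")
    # Cheapest first; among equal costs, the higher score comes first.
    for name, score, cost in sorted(rows, key=lambda r: (r[2], -r[1])):
        # A row survives only if it scores strictly higher than every cheaper row.
        if score > best_score:
            frontier.append((name, score, cost))
            best_score = score
    return frontier


for name, score, cost in pareto_frontier(ROWS):
    print(f"{name:<30} {score:5.1f}%  ${cost:.2f}")
```

With these rounded figures the frontier is DeepSeek V3.2 Exp (Chat, then Reasoner) followed by GPT-5 Low, Medium, and High. Costs are rounded to the cent (the Chat run shows $0.00), so rows that tie on cost are ordered by score and membership near ties should be read as approximate.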
Sources:
  • Raw leaderboard YAML (polyglot_leaderboard.yml)
  • Aider LLM Leaderboards
  • Polyglot benchmark announcement
Details:
  • Category: Coding
  • Created by: Aider
  • Models tested: 11
  • Configs tested: 17
  • Leader: OpenAI GPT-5
  • Top score: 88%

Updated November 2025


© 2026 Agents Directory