How Claude Fable 5 ranks on benchmarks

Anthropic's new Mythos-class model tops CursorBench 3.1 and posts the strongest agentic-coding scores reported so far. The numbers, with the one caveat that matters.

Jun 10, 2026 • 3 min read • Written by Agents Directory (@agentsdir)

Anthropic released Claude Fable 5 on June 9. It's the company's first Mythos-class model, priced at $10 in / $50 out per million tokens, with a 1M-token context window, and built for long-running autonomous work. Here is where it lands, sourced from Anthropic's announcement and the independent CursorBench leaderboard.
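To put the list price in per-request terms, here is a minimal sketch of the arithmetic. The function name and the example token counts are our own illustration, not figures from Anthropic, and the calculation assumes plain list pricing with no caching or batch discounts.

```python
# List prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request.

    Assumption: no prompt caching, batching, or volume discounts apply.
    """
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical long-running session: 800k tokens read, 40k tokens written.
print(f"${request_cost_usd(800_000, 40_000):.2f}")  # prints $10.00
```

Output tokens cost five times as much as input tokens, so sessions that generate a lot of code or reasoning get expensive faster than sessions that mostly read context.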

CursorBench 3.1

CursorBench evaluates models on ambiguous, multi-file tasks taken from real Cursor sessions. It's the closest thing we have to a production agentic-coding benchmark.

  • Fable 5 high (default): 70.6% at $10.81 per task, more than 7 points clear of every other default configuration.
  • Fable 5 Max: 72.9%, the top score on the whole leaderboard.
  • Next-best defaults: Cursor's Composer 2.5 at 63.2% ($0.55 per task, the value outlier), GPT-5.5 high at 62.6%, Claude Opus 4.8 high at 58.4%.
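The margin and the cost gap in the list above can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted here; the helper names are ours, and per-task cost is included only for the two configurations where the article states it.

```python
# Default-configuration CursorBench 3.1 scores (percent) quoted in the article.
DEFAULT_SCORES = {
    "Fable 5 high": 70.6,
    "Composer 2.5": 63.2,
    "GPT-5.5 high": 62.6,
    "Claude Opus 4.8 high": 58.4,
}

# Per-task cost in USD; the article gives it only for these two entries.
COST_PER_TASK = {"Fable 5 high": 10.81, "Composer 2.5": 0.55}


def lead_over_runner_up(scores: dict[str, float]) -> tuple[str, str, float]:
    """Return (leader, runner-up, lead in percentage points)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top, top_score), (second, second_score) = ranked[0], ranked[1]
    return top, second, round(top_score - second_score, 1)


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more one configuration costs per task than another."""
    return COST_PER_TASK[expensive] / COST_PER_TASK[cheap]


leader, runner_up, lead = lead_over_runner_up(DEFAULT_SCORES)
print(f"{leader} leads {runner_up} by {lead} points")
print(f"Cost ratio: {cost_ratio('Fable 5 high', 'Composer 2.5'):.1f}x")
```

On these numbers the 7.4-point lead over Composer 2.5 comes at roughly 20 times the per-task cost, which is why Composer 2.5 reads as the value outlier.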

Full interactive leaderboard on our CursorBench page.

Artificial Analysis Intelligence Index

Artificial Analysis publishes a composite 0-100 intelligence score that blends knowledge, reasoning, math, coding, and agentic evaluations. It is the most widely cited all-up benchmark outside vendor tables.

  • Fable 5 (default): 64.9, the top score in our catalog, more than 7 points above Claude Opus 4.7 (57.3) and Gemini 3.1 Pro Preview (57.2).
  • Next in this cut: Qwen3.7 Max (56.6), Gemini 3.5 Flash (55.3), MiniMax-M3 (54.7), Grok 4.3 high (53.2).

Full interactive leaderboard on our Intelligence Index page.

Anthropic's reported numbers

From the announcement, against Claude Mythos Preview, Claude Opus 4.8, GPT-5.5 and Gemini 3.1 Pro:

Category | Benchmark | Claude Mythos 5 / Fable 5 | Claude Mythos Preview | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro
-------- | --------- | ------------------------- | --------------------- | --------------- | ------- | --------------
Agentic coding | SWE-Bench Pro | 80.3% | 77.8% | 69.2% | 58.6% | 54.2%
Agentic coding | FrontierCode (Diamond), xhigh | 29.3% | n/a | 13.4% | 5.7% | n/a
Agentic coding | Terminal-Bench 2.1 | 88.0%* | n/a | 82.7% | 83.4% (Codex CLI) | 70.7% (Gemini CLI)
Knowledge work | GDPval-AA | 1932 | n/a | 1890 | 1769 | 1314
Knowledge work vision | GDP.pdf, no tools | 29.8% | n/a | 22.5% | 24.9% | 16.7%
Spatial reasoning | Blueprint-Bench 2 | 38.6% | n/a | 14.5% | 36.2% | 26.5%
Tool use | AutomationBench | 17.4% | n/a | 15.5% | 12.9% | 9.6%
Computer use | OSWorld-Verified | 85.0% | 85.4% | 83.4% | 78.7% | 76.2%
Legal | Legal Agent Benchmark | 13.3% | n/a | 10.4% | 2.1% | 0.0%
Multidisciplinary reasoning | Humanity's Last Exam, no tools | 59.0%* | 56.8% | 49.8% | 41.4% | 44.4%
Multidisciplinary reasoning | Humanity's Last Exam, with tools | 64.5%* | 64.7% | 57.9% | 52.2% | 51.4%
Biology | BioMysteryBench, hard | 46.1%* | 29.6% | 40.0% | n/a | n/a
Biology | BioMysteryBench, human solved | 83.9%* | 82.6% | 80.4% | n/a | n/a
Cybersecurity | ExploitBench (Cap) | 78.0%* | 69.0% | 40.0% | 34.0% | n/a
Health | HealthBench Professional | 66.0%* | 64.7% | 56.9% | 51.8% | n/a

Anthropic reports the higher score of Claude Mythos 5 and Claude Fable 5; the two land within 1-3 points of each other except on starred (*) benchmarks. See the Mythos caveat below.

The Mythos caveat

Anthropic reports the higher score of two models: Claude Mythos 5 (the identical model with safeguards lifted, restricted to vetted researchers) and the generally available Fable 5. The two land within 1-3 points of each other except on starred (*) benchmarks, where Fable 5's safeguards redirect cybersecurity and biology queries to Opus 4.8 (under 5% of sessions). On those, Fable 5's effective score sits closer to Opus 4.8.
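To see how much the caveat could matter on each starred row, the sketch below computes the gap between the reported score and Opus 4.8's score from Anthropic's table. Reading that gap as the most the headline number could overstate Fable 5 is our assumption, not something Anthropic states; the announcement does not give Fable 5's own score on these rows.

```python
# Starred rows from Anthropic's table: (reported score, Claude Opus 4.8 score), in percent.
STARRED_ROWS = {
    "Terminal-Bench 2.1": (88.0, 82.7),
    "Humanity's Last Exam, no tools": (59.0, 49.8),
    "Humanity's Last Exam, with tools": (64.5, 57.9),
    "BioMysteryBench, hard": (46.1, 40.0),
    "BioMysteryBench, human solved": (83.9, 80.4),
    "ExploitBench (Cap)": (78.0, 40.0),
    "HealthBench Professional": (66.0, 56.9),
}


def gap_to_opus(rows: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Percentage-point gap between the reported score and Opus 4.8.

    Assumption: if Fable 5 lands somewhere between the two, this gap is
    the most the reported number could overstate it.
    """
    return {name: round(reported - opus, 1) for name, (reported, opus) in rows.items()}


gaps = gap_to_opus(STARRED_ROWS)
largest = max(gaps, key=gaps.get)

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: up to {gap} points")
```

ExploitBench (Cap) stands out with a 38-point gap, so that is the row where the reported number says the least about the generally available model.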

Bottom line

At twice Opus 4.8's price, Fable 5 is not the default for everything. But on long-horizon agentic coding it is currently the strongest model available, and the CursorBench cost curve shows the premium buying capability, not just tokens. Pricing, host availability, and sources on the Claude Fable 5 model page.

Sources

  • Anthropic: Introducing Claude Fable 5 and Claude Mythos 5 for the announcement, pricing, and the reported benchmark table
  • CursorBench leaderboard for independent agentic-coding scores and per-task cost
  • CursorBench 3.1 on Agents Directory for the live leaderboard we keep updated
  • Artificial Analysis Intelligence Index for the composite intelligence methodology
  • Intelligence Index on Agents Directory for the live leaderboard we keep updated
  • Claude Fable 5 model page for pricing, context window, and host availability

© 2026 Agents Directory