Agents Directory
OpenAI

BrowseComp

Web

OpenAI's hard web-browsing benchmark: 1,266 questions whose answers are hard to find but easy to verify, requiring persistent multi-step browsing. Higher is better.


BrowseComp questions are built with an inverted design: trainers start from a known fact and write a question whose answer does not appear in first-page search results and that another person cannot solve within ten minutes. Answers are short and are graded by a model that checks semantic equivalence against the reference answer. The benchmark is essentially unsolvable without tools and remains hard with them: at launch GPT-4o scored 1.9% with browsing, and human trainers solved only 29.2% of questions within a two-hour limit. All rows here therefore report scores with browsing and tools enabled, in single-agent configurations only. Multi-agent harnesses score higher and are excluded to keep rows comparable. There is no single official leaderboard, so scores come from vendor model cards compiled by aggregators.
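The grading step described above can be sketched roughly as follows. This is a minimal illustration, not the official grader: the names `normalize`, `exact_match_judge`, and `accuracy` are hypothetical, and the placeholder judge uses normalized string equality where the real benchmark uses a model to decide semantic equivalence. The sample answers are invented for the example.

```python
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a crude stand-in for semantic matching)."""
    return " ".join(text.lower().split())


def exact_match_judge(predicted: str, reference: str) -> bool:
    # Assumption: placeholder only. The benchmark itself uses a model grader,
    # which would also accept paraphrases that string equality rejects.
    return normalize(predicted) == normalize(reference)


def accuracy(
    pairs: Iterable[Tuple[str, str]],
    judge: Callable[[str, str], bool] = exact_match_judge,
) -> float:
    """Return the percentage of (predicted, reference) pairs the judge accepts."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no graded answers")
    correct = sum(1 for predicted, reference in pairs if judge(predicted, reference))
    return 100.0 * correct / len(pairs)


# Invented sample data: two matches after normalization, one miss.
sample = [("Paris", "paris"), ("  New   York ", "New York"), ("1987", "1978")]
score = accuracy(sample)
print(f"{score:.1f}%")  # prints "66.7%"
```

Passing a different `judge` callable is where a model-based equivalence check would plug in; the accuracy computation itself stays the same.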

Leaderboard
| #  | Model                  | Configuration                 | Score | Provider        |
|----|------------------------|-------------------------------|-------|-----------------|
| 1  | GPT-5.5 Pro            | Browsing, parallel compute    | 90.1% | OpenAI          |
| 2  | GPT-5.4 Pro            | Browsing                      | 89.3% | OpenAI          |
| 3  | Claude Fable 5         | Single agent, web search      | 86.9% | Anthropic       |
| 4  | Gemini 3.1 Pro Preview | Search + Python + Browse      | 85.9% | Google DeepMind |
| 5  | GPT-5.5                | Browsing                      | 84.4% | OpenAI          |
| 6  | Claude Opus 4.8        | Single agent, web search      | 84.3% | Anthropic       |
| 7  | Claude Opus 4.6        | Max thinking, tools           | 84%   | Anthropic       |
| 8  | MiniMax-M3             | Browsing                      | 83.5% | MiniMax         |
| 9  | DeepSeek V4            | Pro, max thinking, browsing   | 83.4% | DeepSeek        |
| 10 | Kimi K2.6              | Single agent, tools           | 83.2% | Moonshot AI     |
| 11 | GPT-5.4                | Browsing                      | 82.7% | OpenAI          |
| 12 | GLM 5.1                | Browsing                      | 79.3% | Z.AI            |
| 13 | Claude Opus 4.7        | Adaptive thinking, web search | 79.3% | Anthropic       |
| 14 | GPT-5.2 Pro            | Browsing                      | 77.9% | OpenAI          |
| 15 | GLM 5                  | Browsing                      | 75.9% | Z.AI            |
| 16 | Claude Sonnet 4.6      | Max thinking, tools           | 74.7% | Anthropic       |
| 17 | GPT-5.2                | xHigh, tools                  | 65.8% | OpenAI          |
| 18 | Kimi K2 Thinking       | Tools                         | 60.2% | Moonshot AI     |
| 19 | Gemini 3 Pro           | Search + Python + Browse      | 59.2% | Google DeepMind |
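
To get a feel for what the percentage gaps mean in practice, a score can be converted into an approximate number of questions answered correctly. The sketch below assumes each score was measured on the full 1,266-question set in a single run, which vendor model cards do not always guarantee; `approx_solved` is a hypothetical helper name.

```python
TOTAL_QUESTIONS = 1266  # question count given in the benchmark description


def approx_solved(score_percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Estimate how many questions a published percentage corresponds to.

    Assumption: the score covers the full question set in one run.
    """
    return round(score_percent * total / 100)


top = approx_solved(90.1)        # rank 1 score from the leaderboard
runner_up = approx_solved(89.3)  # rank 2 score from the leaderboard
gap = top - runner_up

print(top, runner_up, gap)  # prints "1141 1131 10"
```

Under that assumption, the 0.8-point gap between the top two rows corresponds to roughly ten questions.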
Sources:
  • OpenAI: BrowseComp announcement
  • BrowseComp paper (arXiv 2504.12516)
  • openai/simple-evals (dataset + grader)
Details:
  • Category: Web
  • Created by: OpenAI
  • Models tested: 19
  • Leader: GPT-5.5 Pro
  • Top score: 90.1%

Updated June 2026


© 2026 Agents Directory