Agents Directory
OpenAI

SWE-Lancer (IC Diamond)

Coding

OpenAI's freelance-work benchmark: real Upwork software tasks with real dollar payouts, scored as the share of the task pool's value a model earns. Higher is better.

Official source
Only OpenAI models have published SWE-Lancer scores so far; Anthropic, Google and open-weight vendors do not report it, so this board is narrow until third parties run it.

SWE-Lancer contains over 1,400 real Upwork freelance tasks worth $1 million in actual client payouts, ranging from $50 bug fixes to $32,000 feature builds, mostly from the Expensify open-source app. The public Diamond split is worth $500,800 and has two task types: individual-contributor (IC) tasks, graded by triple-verified end-to-end tests run with internet access disabled, and manager tasks, graded against the choices of the original hired engineering managers. A model earns a task's dollar value only if it fully passes that task. This leaderboard reports the dollars earned on the IC Diamond subset as a percentage of that subset's total value.
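The scoring rule described above (all-or-nothing payout per task, normalized by the pool's value) can be sketched as follows. This is a minimal illustration, not OpenAI's grading harness: the `TaskResult` type, the `percent_of_pool_earned` function, and the three example tasks are hypothetical names and values chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One graded task. All field names are hypothetical, for illustration only."""
    task_id: str
    payout_usd: int   # dollar value the client paid for the task
    passed: bool      # True only if the submission fully passes grading


def percent_of_pool_earned(results: list[TaskResult]) -> float:
    """Return dollars earned as a percentage of the pool's total value.

    A task contributes its full payout if it passed and nothing otherwise;
    there is no partial credit in this sketch.
    """
    pool = sum(r.payout_usd for r in results)
    if pool == 0:
        raise ValueError("task pool has no dollar value")
    earned = sum(r.payout_usd for r in results if r.passed)
    return 100.0 * earned / pool


# Hypothetical three-task pool: two small tasks pass, the large one fails.
example = [
    TaskResult("bug-fix", 50, True),
    TaskResult("small-feature", 1_000, True),
    TaskResult("large-feature", 32_000, False),
]

score = percent_of_pool_earned(example)
pass_rate = 100.0 * sum(r.passed for r in example) / len(example)
print(f"percent of pool earned: {score:.2f}%")
print(f"plain pass rate:        {pass_rate:.2f}%")
```

In this made-up pool the model passes two of three tasks but earns only about 3% of the dollar value, because the failed task carries most of the payout. That gap is why a payout-weighted score can sit far from a plain pass rate.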

Leaderboard
#   Model            Score    Provider
1   GPT-5.3-Codex    81.4%    OpenAI
2   GPT-5.2          74.6%    OpenAI
3   o3 Mini          7.4%     OpenAI
Sources:
  • OpenAI: Introducing SWE-Lancer
  • SWE-Lancer paper (arXiv 2502.12115)
  • openai/preparedness (SWELancer)
Details:
  • Category: Coding
  • Created by: OpenAI
  • Models tested: 3
  • Leader: GPT-5.3-Codex
  • Top score: 81.4%

Updated June 2026


© 2026 Agents Directory