o4-mini

Provider: OpenAI
Best for: coding
Benchmark scores (n/a = no score listed)

Benchmark               Score      Category
MATH / AIME             97.5%      reasoning
HumanEval / MBPP        97.3%      coding
MMLU                    90.0%      general
LiveCodeBench           80.2%      coding
GPQA (Diamond)          77.6%      reasoning
Chatbot Arena (LMSYS)   1391 Elo   general
SWE-bench               68.1%      agentic, coding
SimpleQA                20.2%      general
TAU-bench               n/a        agentic, multi-agent
GAIA                    n/a        agentic, multi-agent
WebArena                n/a        agentic
MT-Bench                n/a        general
AgentBench              n/a        multi-agent
IFEval                  n/a        general, agentic
