DeepSeek R1
radar chart — all benchmarks
family — deepseek-r
benchmark scores
Scores from Mar 2026
| Benchmark | Score | Categories |
|---|---|---|
| MATH / AIME | 97.3% | reasoning |
| HumanEval / MBPP | 92.0% | coding |
| MMLU | 90.8% | general |
| GPQA (Diamond) | 71.5% | reasoning |
| Chatbot Arena (LMSYS) | 1358 | general |
| LiveCodeBench | 65.9% | coding |
| SWE-bench | 49.2% | agentic, coding |
| TAU-bench | — | agentic, multiagent |
| GAIA | — | agentic, multiagent |
| WebArena | — | agentic |
| MT-Bench | — | general |
| AgentBench | — | multiagent |
| IFEval | — | general, agentic |
| SimpleQA | — | general |
pricing — per 1M tokens via openrouter
Data unavailable
latency percentiles — time to first token (ms)
Data unavailable
model specifications
Context window —
Max output tokens —
Input modalities —
Output modalities —
Supports reasoning —
Supports tool use —