| | ANTHROPIC | | | | | OPENAI | | GOOGLE | | MOONSHOT | Z.AI | ALIBABA | DEEPSEEK | | | xAI | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benchmark | Mythos Preview (Apr 2026, preview) | Opus 4.7 (16 Apr 2026) | Opus 4.6 (5 Feb 2026) | Opus 4.5 (24 Nov 2025) | Sonnet 4.6 (17 Feb 2026) | GPT 5.5 (23 Apr 2026) | GPT 5.4 (5 Mar 2026) | Gemini 3.1 Pro (19 Feb 2026) | Gemini 3 Pro (18 Nov 2025) | Kimi K2.6 (13 Apr 2026) | GLM 5.1 (27 Mar 2026) | Qwen 3.6 Plus (2 Apr 2026) | DeepSeek V4-Pro (24 Apr 2026) | DeepSeek V3.2 (1 Dec 2025) | DeepSeek V3 (26 Dec 2024) | Grok 4.20 (17 Feb 2026) | Grok 4 (Jul 2025) |
| SOFTWARE ENGINEERING / CODING | |||||||||||||||||
| SWE-bench Verified | 93.9% | 87.6% | 80.8% | 80.9% | 79.6% | — | 77.2% | 80.6% | 76.2% | — | — | 78.8% | 80.6% | ~74% | 42.0% | — | 72.0% |
| SWE-bench Pro | 77.8% | 64.3% | 53.4% | — | — | 58.6% | 57.7% | 54.2% | 43.3% | — | — | 56.6% | 55.4% | — | — | — | — |
| SWE-bench Multilingual | 87.3% | — | 77.5% | — | — | — | — | — | — | — | — | 73.8% | 76.2% | — | — | — | — |
| SWE-bench Multimodal | 59.0% | — | 27.1% | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Terminal-Bench 2.0 | 82.0% | 69.4% | 65.4% | 59.3% | 59.1% | 82.7% | 75.1% | 68.5% | 56.9% | — | — | 61.6% | 67.9% | 46.4% | — | — | — |
| LiveCodeBench | — | — | 88.8% | — | — | — | — | 91.7% | — | — | — | 87.1% | 93.5% | 49.2% | 40.5% | — | 79.0% |
| Codeforces (Elo) | — | — | — | — | — | — | 3168 | 3052 | — | — | — | — | 3206 | — | — | — | — |
| GENERAL KNOWLEDGE & REASONING | |||||||||||||||||
| MMLU-Pro | — | — | 89.1% | — | — | — | 87.5% | 91.0% | — | — | — | 88.5% | 87.5% | 81.2% | 75.9% | — | — |
| GPQA Diamond | 94.6% | 94.2% | 91.3% | 87.0% | 89.9% | 93.6% | 94.4% | 94.3% | 91.9% | — | — | 90.4% | 90.1% | 82.4% | 59.1% | — | 87.5% |
| Humanity's Last Exam | 56.8% | 46.9% | 40.0% | — | 33.2% | 41.4% | 42.7% | 44.4% | 37.5% | — | — | 28.8% | 37.7% | — | — | — | 25.0% |
| SimpleQA-Verified | — | — | 46.2% | — | — | — | 45.3% | 75.6% | — | — | — | — | 57.9% | — | 24.9% | — | — |
| Chinese-SimpleQA | — | — | 76.2% | — | — | — | 76.8% | 85.9% | — | — | — | — | 84.4% | — | 64.8% | — | — |
| ARC-AGI-1 | — | 92.0% | 94.0% | 80.0% | 86.5% | 95.0% | 93.7% | 98.0% | 75.0% | — | — | — | — | 57.0% | — | 89.5% | 66.6% |
| ARC-AGI-2 | — | 75.8% | 68.8% | 37.6% | 58.3% | 85.0% | 74.0% | 77.1% | 31.1% | — | — | — | — | 4.0% | — | 65.1% | 15.9% |
| ARC-AGI-3 | — | — | 0.5% | — | — | — | 0.2% | 0.4% | — | — | — | — | — | — | — | 0.1% | — |
| MMMLU (Multilingual) | — | 91.5% | 91.1% | 90.8% | 89.3% | — | — | 92.6% | 91.8% | — | — | 89.5% | — | — | — | — | — |
| MMMU-Pro (Multimodal) | — | — | 73.9% | — | 74.5% | 81.2% | 81.2% | 80.5% | 81.0% | — | — | — | — | — | — | — | — |
| SciCode | — | — | 52.0% | — | 47.0% | — | — | 59.0% | 56.0% | — | — | — | — | — | — | — | — |
| CharXiv Reasoning (without tools) | 86.1% | 82.1% | 69.1% | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| CharXiv Reasoning (with tools) | 93.2% | 91.0% | 84.7% | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| MATHEMATICS | |||||||||||||||||
| AIME 2025 | — | — | 99.8% | — | — | — | — | 98.1% | 95.0% | — | — | — | — | 96.0% | — | — | 91.7% |
| AIME 2026 | — | 95.8% | 96.7% | 95.1% | — | 97.5% | 99.2% | 98.3% | 91.7% | 95.8% | 95.8% | 95.3% | 95.8% | 94.2% | — | — | — |
| USAMO 2026 | — | — | 47.0% | — | — | — | 95.2% | 74.4% | — | — | — | — | — | — | — | — | — |
| HMMT 2026 Feb | — | — | 96.2% | — | — | — | 97.7% | 94.7% | — | — | — | 87.8% | 95.2% | — | — | — | — |
| IMOAnswerBench | — | — | 75.3% | — | — | — | 91.4% | 81.0% | — | — | — | 83.8% | 89.8% | — | — | — | — |
| Apex | — | — | 34.5% | — | — | — | 54.1% | 60.9% | 18.4% | — | — | — | 38.3% | — | — | — | — |
| Apex Shortlist | — | — | 85.9% | — | — | — | 78.1% | 89.1% | — | — | — | — | 90.2% | — | — | — | — |
| LONG CONTEXT (1M Tokens) | |||||||||||||||||
| MRCR 1M | — | — | 92.9% | — | — | — | — | 76.3% | — | — | — | — | 83.5% | — | — | — | — |
| CorpusQA 1M | — | — | 71.7% | — | — | — | — | 53.8% | — | — | — | — | 62.0% | — | — | — | — |
| AGENTIC / COMPUTER USE | |||||||||||||||||
| BrowseComp | 86.9% | 79.3% | 83.7% | — | 74.7% | 84.4% | 82.7% | 85.9% | 59.2% | — | — | — | 83.4% | — | — | — | — |
| OSWorld-Verified | 79.6% | 78.0% | 72.7% | 66.3% | 72.5% | 78.7% | 75.0% | — | — | — | — | — | — | — | — | — | — |
| HLE (with tools) | 64.7% | 54.7% | 53.3% | — | 49.0% | 52.2% | 58.7% | 51.4% | — | — | — | 50.6% | 48.2% | — | — | — | 44.4% |
| GDPval-AA (Elo) | — | — | 1619 | — | 1633 | — | 1674 | 1317 | 1195 | — | — | — | 1554 | — | — | — | — |
| MCP-Atlas | — | 77.3% | 75.8% | 62.3% | 61.3% | 75.3% | 68.1% | 69.2% | 54.1% | — | — | 74.1% | 73.6% | — | — | — | — |
| Toolathlon | — | — | 47.2% | — | — | 55.6% | 54.6% | 48.8% | — | — | — | — | 51.8% | — | — | — | — |
| Finance Agent v1.1 | — | 64.4% | 60.1% | — | — | 60.0% | 61.5% | 59.7% | — | — | — | — | — | — | — | — | — |
| τ2-bench (Retail) | — | — | 91.9% | 88.9% | 91.7% | — | — | 90.8% | 85.3% | — | — | — | — | — | — | — | — |
| τ2-bench (Telecom) | — | — | 99.3% | 98.2% | 97.9% | 98.0% | 98.9% | 99.3% | 98.0% | — | — | — | — | — | — | — | — |
| Vending-Bench 2 ($) | — | $10,937 | $8,018 | $4,967 | $7,204 | $7,524 | $6,144 | $911 | $5,478 | $6,205 | $5,634 | $5,115 | $3,285 | $1,034 | — | $4,663 | — |
| CYBERSECURITY | |||||||||||||||||
| CyberGym | 83.1% | 73.1% | 73.8% | — | — | 81.8% | 66.3% | — | — | — | — | — | — | — | — | — | — |