| Vendor | ANTHROPIC | ANTHROPIC | ANTHROPIC | ANTHROPIC | ANTHROPIC | ANTHROPIC | ANTHROPIC | OPENAI | OPENAI | GOOGLE | GOOGLE | GOOGLE | MOONSHOT | Z.AI | Z.AI | ALIBABA | ALIBABA | DEEPSEEK | DEEPSEEK | DEEPSEEK | MINIMAX | xAI | xAI | xAI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model (release date) | Fable 5 (Jun 26) | Mythos Preview (Apr 26) | Opus 4.8 (May 26) | Opus 4.7 (Apr 26) | Opus 4.6 (Feb 26) | Opus 4.5 (Nov 25) | Sonnet 4.6 (Feb 26) | GPT 5.5 (Apr 26) | GPT 5.4 (Mar 26) | Gemini 3.5 Flash (May 26) | Gemini 3.1 Pro (Feb 26) | Gemini 3 Pro (Nov 25) | Kimi K2.6 (Apr 26) | GLM 5.2 (Jun 26) | GLM 5.1 (Mar 26) | Qwen 3.7 Max (May 26) | Qwen 3.6 Plus (Apr 26) | V4-Pro (Apr 26) | V3.2 (Dec 25) | V3 (Dec 24) | M3 (Jun 26) | Grok 4.3 (Apr 26) | Grok 4.20 (Feb 26) | Grok 4 (Jul 25) |
| SOFTWARE ENGINEERING / CODING | ||||||||||||||||||||||||
| SWE-bench Verified | 95.5% | 93.9% | 88.6% | 87.6% | 80.8% | 80.9% | 79.6% | — | 77.2% | — | 80.6% | 76.2% | — | — | — | — | 78.8% | 80.6% | — | 42.0% | 80.5% | — | — | 72.0% |
| SWE-bench Pro | 80.3% | 77.8% | 69.2% | 64.3% | 53.4% | — | — | 58.6% | 57.7% | 55.1% | 54.2% | 43.3% | — | 62.1% | — | — | 56.6% | 55.4% | — | — | 59.0% | — | — | — |
| SWE-bench Multilingual | — | 87.3% | 84.4% | 80.5% | 77.5% | — | — | — | — | — | — | — | — | — | — | — | 73.8% | 76.2% | — | — | — | — | — | — |
| SWE-bench Multimodal | — | 59.0% | 38.4% | 34.5% | 27.1% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Terminal-Bench 2.0 | — | 82.0% | — | 69.4% | 65.4% | 59.3% | 59.1% | 82.7% | 75.1% | — | 68.5% | 56.9% | — | — | — | — | 61.6% | 67.9% | 46.4% | — | — | — | — | — |
| Terminal-Bench 2.1 | 88.0% | — | 82.7% | 66.1% | — | — | — | 83.4% | — | 76.2% | 70.7% | — | — | — | — | — | — | — | — | — | 66.0% | — | — | — |
| Terminal-Bench Hard | 62.9% | — | 58.3% | 51.5% | 46.2% | 47.0% | 53.0% | 60.6% | 57.6% | 40.9% | 53.8% | 41.7% | 43.9% | 50.8% | 43.2% | 50.8% | 43.9% | 46.2% | 35.6% | 6.8% | 42.4% | 37.9% | 37.9% | 37.9% |
| FrontierCode (Diamond) | 29.3% | — | 13.4% | — | — | — | — | 5.7% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| LiveCodeBench | — | — | — | — | 88.8% | — | — | — | — | — | 91.7% | — | — | — | — | — | 87.1% | 93.5% | 49.2% | 40.5% | — | — | — | 79.0% |
| Codeforces (Elo) | — | — | — | — | — | — | — | — | 3168 | — | 3052 | — | — | — | — | — | — | 3206 | — | — | — | — | — | — |
| GENERAL KNOWLEDGE & REASONING | ||||||||||||||||||||||||
| ▸ Science | ||||||||||||||||||||||||
| GPQA Diamond | 92.6% | — | 92.0% | 91.4% | 89.6% | 86.6% | 87.5% | 93.5% | 92.0% | 92.2% | 94.1% | 90.8% | 91.1% | 89.5% | 86.8% | 92.3% | 88.2% | 88.8% | 84.0% | 55.7% | 92.9% | 90.1% | 91.1% | 87.7% |
| CritPt | 28.6% | — | 20.9% | 12.0% | 12.6% | 4.6% | 3.1% | 27.1% | 23.4% | 13.1% | 17.7% | 9.1% | 8.0% | 20.9% | 4.6% | 13.4% | 2.9% | 12.9% | 2.9% | 0.0% | 3.7% | 8.0% | 6.6% | 2.0% |
| SciCode | 60.2% | — | 53.5% | 54.5% | 51.9% | 49.5% | 46.8% | 56.1% | 56.6% | 53.1% | 58.9% | 56.1% | 53.5% | 50.5% | 43.8% | 48.8% | 40.7% | 50.0% | 38.9% | 35.4% | 45.4% | 47.3% | 45.6% | 45.7% |
| Humanity's Last Exam | 53.3% | — | 45.7% | 39.6% | 36.7% | 28.4% | 30.0% | 44.3% | 41.6% | 41.0% | 44.7% | 37.2% | 35.9% | 40.1% | 28.0% | 38.1% | 25.7% | 35.9% | 22.2% | 3.6% | 37.1% | 35.0% | 32.2% | 23.9% |
| Blueprint-Bench 2 | 38.6% | — | 14.5% | 24.5% | — | — | 6.7% | 36.2% | — | 33.6% | 26.5% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ▸ Health & Biomedical | ||||||||||||||||||||||||
| HealthBench | 62.7% | 61.1% | 59.3% | — | — | — | — | 56.5% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| HealthBench Professional | 66.0% | 64.7% | 56.9% | — | — | — | — | 51.8% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| BioMysteryBench (human) | 83.9% | 82.6% | 80.4% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| BioMysteryBench (hard) | 46.1% | 29.6% | 40.0% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ▸ Knowledge & Factuality | ||||||||||||||||||||||||
| MMLU-Pro | — | — | — | — | 89.1% | — | — | — | 87.5% | — | 91.0% | — | — | — | — | — | 88.5% | 87.5% | 81.2% | 75.9% | — | — | — | — |
| MMMLU (Multilingual) | — | 92.7% | — | 91.5% | 91.1% | 90.8% | 89.3% | — | — | — | 92.6% | 91.8% | — | — | — | — | 89.5% | — | — | — | — | — | — | — |
| SimpleQA-Verified | — | — | — | — | 46.2% | — | — | — | 45.3% | — | 75.6% | — | — | — | — | — | — | 57.9% | — | 24.9% | — | — | — | — |
| Chinese-SimpleQA | — | — | — | — | 76.2% | — | — | — | 76.8% | — | 85.9% | — | — | — | — | — | — | 84.4% | — | 64.8% | — | — | — | — |
| IFBench | 63.5% | — | 62.2% | 58.6% | 53.1% | 58.0% | 56.6% | 75.9% | 73.9% | 76.3% | 77.1% | 70.4% | 76.0% | 73.3% | 76.3% | 80.5% | 75.2% | 76.5% | 60.7% | 34.8% | 82.9% | 81.3% | 81.2% | 53.7% |
| ▸ Abstract Reasoning (ARC-AGI) | ||||||||||||||||||||||||
| ARC-AGI-1 | — | — | 92.0% | 92.0% | 93.0% | 80.0% | 86.0% | 95.0% | 93.7% | 92.5% | 98.0% | 75.0% | — | — | — | — | — | — | 57.0% | — | — | — | 89.5% | 66.6% |
| ARC-AGI-2 | — | — | 72.1% | 75.8% | 68.8% | 37.6% | 58.3% | 85.0% | 74.0% | 72.1% | 77.1% | 31.1% | — | — | — | — | — | — | 4.0% | — | — | — | 65.1% | 15.9% |
| ARC-AGI-3 | — | — | 1.5% | 0.2% | 0.5% | — | — | 0.4% | 0.2% | — | 0.4% | — | — | — | — | — | — | — | — | — | — | — | 0.1% | — |
| ▸ Multimodal & Visual | ||||||||||||||||||||||||
| MMMU-Pro (Multimodal) | — | — | — | — | 73.9% | — | 74.5% | 81.2% | 81.2% | 83.6% | 80.5% | 81.0% | — | — | — | — | — | — | — | — | 78.1% | — | — | — |
| CharXiv Reasoning (no tools) | 88.9% | 86.1% | — | 82.1% | 69.1% | — | — | — | — | 84.2% | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| CharXiv Reasoning (with tools) | 93.5% | 93.2% | — | 91.0% | 84.7% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ChartQAPro (no tools) | — | — | 69.4% | 67.6% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ChartQAPro (with tools) | — | — | 72.3% | 69.8% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| MATHEMATICS | ||||||||||||||||||||||||
| AIME 2025 | — | — | — | — | 99.8% | — | — | — | — | — | 98.1% | 95.0% | — | — | — | — | — | — | 96.0% | — | — | — | — | 91.7% |
| AIME 2026 | — | — | — | 95.8% | 96.7% | 95.1% | — | 97.5% | 99.2% | — | 98.3% | 91.7% | 95.8% | 99.2% | 95.8% | — | 95.3% | 95.8% | 94.2% | — | — | — | — | — |
| USAMO 2026 | — | 97.6% | — | — | 66.2% | — | — | — | 95.2% | — | 74.4% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| HMMT 2026 Feb | — | — | — | — | 96.2% | — | — | — | 97.7% | — | 94.7% | — | — | 92.5% | — | — | 87.8% | 95.2% | — | — | — | — | — | — |
| IMOAnswerBench | — | — | — | — | 75.3% | — | — | — | 91.4% | — | 81.0% | — | — | 91.0% | — | — | 83.8% | 89.8% | — | — | — | — | — | — |
| Apex | — | — | — | — | 34.5% | — | — | — | 54.1% | — | 60.9% | 18.4% | — | — | — | — | — | 38.3% | — | — | — | — | — | — |
| Apex Shortlist | — | — | — | — | 85.9% | — | — | — | 78.1% | — | 89.1% | — | — | — | — | — | — | 90.2% | — | — | — | — | — | — |
| ArxivMath | 78.5% | 68.7% | 71.8% | — | — | — | — | 71.5% | — | — | 64.8% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| RiemannBench | 55.0% | 43.0% | 34.0% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| LONG CONTEXT (1M Tokens) | ||||||||||||||||||||||||
| MRCR 1M | — | — | — | — | 92.9% | — | — | — | — | 26.6% | 76.3% | — | — | — | — | — | — | 83.5% | — | — | — | — | — | — |
| CorpusQA 1M | — | — | — | — | 71.7% | — | — | — | — | — | 53.8% | — | — | — | — | — | — | 62.0% | — | — | — | — | — | — |
| GraphWalks BFS 256K | 91.1% | 85.7% | 85.9% | 76.9% | 38.7% | — | — | 73.7% | 21.4% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| GraphWalks Parents 256K | 99.96% | 99.9% | 99.3% | 93.6% | — | — | — | 90.1% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| AGENTIC / COMPUTER USE | ||||||||||||||||||||||||
| ▸ Browser & Computer Use | ||||||||||||||||||||||||
| BrowseComp | 88.0% | 87.9% | 84.3% | 79.8% | 83.7% | — | 74.7% | 84.4% | 82.7% | — | 85.9% | 59.2% | — | — | — | — | — | 83.4% | — | — | 83.5% | — | — | — |
| OSWorld-Verified | 85.0% | 85.4% | 83.4% | 82.8% | 72.7% | 66.3% | 72.5% | 78.7% | 75.0% | 78.4% | 76.2% | — | — | — | — | — | — | — | — | — | 70.1% | — | — | — |
| ScreenSpot-Pro (no tools) | — | — | 82.3% | 79.5% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ScreenSpot-Pro (with tools) | — | — | 87.9% | 87.6% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Automation Bench | 17.4% | — | 15.5% | 9.9% | — | — | — | 12.9% | — | 14.5% | 9.6% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ▸ Tool & Protocol Use | ||||||||||||||||||||||||
| MCP-Atlas | — | — | 82.2% | 79.1% | 75.8% | 62.3% | 61.3% | 75.3% | 68.1% | 83.6% | 78.2% | 54.1% | — | 76.8% | — | — | 74.1% | 73.6% | — | — | 74.2% | — | — | — |
| Toolathlon | — | — | — | — | 47.2% | — | — | 55.6% | 54.6% | 56.5% | 48.8% | — | — | — | — | — | — | 51.8% | — | — | — | — | — | — |
| τ2-bench (Retail) | — | — | — | — | 91.9% | 88.9% | 91.7% | — | — | — | 90.8% | 85.3% | — | — | — | — | — | — | — | — | — | — | — | — |
| τ2-bench (Telecom) | — | — | — | — | 99.3% | 98.2% | 97.9% | 98.0% | 98.9% | — | 99.3% | 98.0% | — | — | — | — | — | — | — | — | — | — | — | — |
| ▸ Domain Agents (Finance/Legal/Office) | ||||||||||||||||||||||||
| Finance Agent v1.1 | — | — | — | 64.4% | 60.1% | — | — | 60.0% | 61.5% | — | 59.7% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Finance Agent v2 | — | — | 53.9% | 51.5% | — | — | — | 51.8% | — | 57.9% | 43.0% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Legal Agent (Harvey set) | 13.3% | — | 10.4% | — | — | — | — | 2.1% | — | 0.8% | 0.0% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Legal Agent (open set) | 16.9% | 13.4% | 9.6% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| OfficeQA Pro | 57.9% | — | 48.1% | — | — | — | — | 52.6% | — | — | 18.1% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Vending-Bench 2 ($) | $5,680 | — | $5,787 | $10,937 | $8,018 | $4,967 | $7,204 | $7,524 | $6,144 | $5,396 | $911 | $5,478 | $6,205 | $8,314 | $5,634 | — | $5,115 | $3,285 | $1,034 | — | — | — | $4,663 | — |
| ▸ Document & Economic Tasks | ||||||||||||||||||||||||
| HLE (with tools) | 64.5% | 64.7% | 57.9% | 54.7% | 53.3% | — | 49.0% | 52.2% | 58.7% | — | 51.4% | — | — | — | — | — | 50.6% | 48.2% | — | — | — | — | — | 44.4% |
| GDPval-AA (Elo) | 1932 | — | 1890 | 1753 | 1619 | 1450 | 1676 | 1769 | 1674 | 1656 | 1314 | 1184 | 1481 | 1524 | 1535 | 1300 | 1352 | 1554 | 1197 | 409 | 1670 | 1098 | 1168 | 990 |
| GDP.pdf (visual) | 29.8% | — | 22.5% | — | — | — | — | 24.9% | — | — | 16.7% | — | — | — | — | — | — | — | — | — | — | — | — | — |
| CYBERSECURITY | ||||||||||||||||||||||||
| CyberGym | — | 83.1% | — | 73.1% | 73.8% | — | — | 81.8% | 66.3% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| ExploitBench | 78.0% | 69.0% | 40.0% | — | — | — | — | 34.0% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |