FuryMartin committed on
Commit 7d2962b · verified · 1 parent: f5ed4a8

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -12,7 +12,6 @@ library_name: transformers
 
  <div align="center" style="line-height:1">
  <a href="https://www.kimi.com" target="_blank"><img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2-ff6b6b?color=1783ff&logoColor=white"/></a>
- <a href="https://github.com/moonshotai/Kimi-K2"><img alt="github" src="https://img.shields.io/badge/🤖%20Github-Kimi%20K2-ff6b6b?color=1783ff&logoColor=white"/></a>
  <a href="https://www.moonshot.ai" target="_blank"><img alt="Homepage" src="https://img.shields.io/badge/Homepage-Moonshot%20AI-white?logo=Kimi&logoColor=white"/></a>
  </div>
 
@@ -68,7 +67,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 
  **Reasoning Tasks**
  | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 | Grok-4 |
- |:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|
+ |:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|
  | **HLE (Text-only)** | no tools | 23.9 | 26.3 | 19.8* | 7.9 | 19.8 | 25.4 |
  | | w/ tools | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
  | | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
@@ -83,7 +82,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 
  **General Tasks**
  | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 |
- |:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
+ |:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
  | **MMLU-Pro** | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
  | **MMLU-Redux** | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
  | **Longform Writing** | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
@@ -91,7 +90,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 
  **Agentic Search Tasks**
  | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 |
- |:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
+ |:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
  | **BrowseComp** | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
  | **BrowseComp-ZH** | w/ tools | 62.3 | 63.0* | 42.4* | 22.2 | 47.9 |
  | **Seal-0** | w/ tools | 56.3 | 51.4* | 53.4* | 25.2 | 38.5* |
@@ -100,7 +99,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 
  **Coding Tasks**
  | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 |
- |:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
+ |:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
  | **SWE-bench Verified** | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
  | **SWE-bench Multilingual** | w/ tools | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
  | **Multi-SWE-bench** | w/ tools | 41.9 | 39.3* | 44.3 | 33.5 | 30.6 |
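Apart from dropping the GitHub badge, this commit changes only the first cell of each table's delimiter row, turning `:----------` into `:----------:`. In GitHub Flavored Markdown a colon on both ends of the dashes centers a column, and a leading colon alone left-aligns it. As a result, the Benchmark column goes from left-aligned to centered, and the other columns stay centered. The minimal Python sketch below maps a delimiter row to per-column alignments. The helper `column_alignments` is hypothetical and written only for illustration. It does not validate the row, and actual rendering on a given site may differ from the spec.

```python
def column_alignments(delimiter_row: str) -> list[str]:
    """Map each cell of a GFM table delimiter row to an alignment name.

    Hypothetical helper for illustration; it does not check that the row
    is a valid delimiter row (e.g. that each cell contains dashes).
    """
    row = delimiter_row.strip()
    # Leading and trailing pipes are optional in GFM, so drop them first.
    if row.startswith("|"):
        row = row[1:]
    if row.endswith("|"):
        row = row[:-1]
    alignments = []
    for cell in row.split("|"):
        cell = cell.strip()
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            alignments.append("center")   # :---:
        elif left:
            alignments.append("left")     # :---
        elif right:
            alignments.append("right")    # ---:
        else:
            alignments.append("default")  # --- (renderer decides)
    return alignments


# Delimiter rows of the Reasoning Tasks table before and after this commit.
OLD_DELIMITER = "|:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|"
NEW_DELIMITER = "|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|"

if __name__ == "__main__":
    # Only the first (Benchmark) column changes alignment.
    print(column_alignments(OLD_DELIMITER)[0], column_alignments(NEW_DELIMITER)[0])  # left center
```

The same reasoning applies to the General, Agentic Search, and Coding tables. Their delimiter rows have seven cells instead of eight, and the same first-cell change makes the Benchmark column centered there too.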