moonshotai/Kimi-K2-Thinking

@@ -12,7 +12,6 @@ library_name: transformers
 <div align="center" style="line-height:1">
   <a href="https://www.kimi.com" target="_blank"><img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2-ff6b6b?color=1783ff&logoColor=white"/></a>
-  <a href="https://github.com/moonshotai/Kimi-K2"><img alt="github" src="https://img.shields.io/badge/🤖%20Github-Kimi%20K2-ff6b6b?color=1783ff&logoColor=white"/></a>
   <a href="https://www.moonshot.ai" target="_blank"><img alt="Homepage" src="https://img.shields.io/badge/Homepage-Moonshot%20AI-white?logo=Kimi&logoColor=white"/></a>
 </div>
@@ -68,7 +67,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 **Reasoning Tasks**
 | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 | Grok-4 |
-|:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|
+|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|:-------:|
 | **HLE (Text-only)** | no tools | 23.9 | 26.3 | 19.8* | 7.9 | 19.8 | 25.4 |
 | | w/ tools | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
 | | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
@@ -83,7 +82,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 **General Tasks**
 | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 |
-|:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
+|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
 | **MMLU-Pro** | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
 | **MMLU-Redux** | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
 | **Longform Writing** | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
@@ -91,7 +90,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 **Agentic Search Tasks**
 | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 |
-|:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
+|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
 | **BrowseComp** | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
 | **BrowseComp-ZH** | w/ tools | 62.3 | 63.0* | 42.4* | 22.2 | 47.9 |
 | **Seal-0** | w/ tools | 56.3 | 51.4* | 53.4* | 25.2 | 38.5* |
@@ -100,7 +99,7 @@ Kimi K2 Thinking is the latest, most capable version of open-source thinking mod
 **Coding Tasks**
 | Benchmark | Setting | K2 Thinking | GPT-5 | Claude Sonnet 4.5<br> (Thinking) | K2 0905 | DeepSeek-V3.2 |
-|:----------|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
+|:----------:|:--------:|:------------:|:------:|:----------------------------:|:--------:|:--------------:|
 | **SWE-bench Verified** | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
 | **SWE-bench Multilingual** | w/ tools | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
 | **Multi-SWE-bench** | w/ tools | 41.9 | 39.3* | 44.3 | 33.5 | 30.6 |