---
title: Cache-to-Cache Communication Demo
emoji: π
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - llm
  - cache-to-cache
  - model-communication
  - kv-cache
short_description: Compare Single, Text-to-Text, and Cache-to-Cache inference
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6445fd9ba56444c355dcbcba/R5YOyw0aoBENYJs8Ugnbi.png
---
# Cache-to-Cache Communication Demo

This Space demonstrates **Cache-to-Cache (C2C)** communication between Large Language Models, comparing three inference approaches side-by-side:

1. **Single Model**: Standard inference with one model
2. **Text-to-Text (T2T)**: Two-stage communication where the Sharer model generates text → the Receiver model processes that text (sketched below)
3. **Cache-to-Cache (C2C)**: Direct KV-Cache communication between Sharer and Receiver
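
As a point of reference, the sketch below shows the T2T relay pattern in plain Hugging Face `transformers` code. The model names, prompts, and generation settings are illustrative assumptions and are not taken from this Space's `app.py`.

```python
# Hedged sketch of the Text-to-Text (T2T) relay: the Sharer writes an
# intermediate text message, which is re-tokenized and read by the Receiver.
# Model names and prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

sharer_name = "Qwen/Qwen2.5-0.5B-Instruct"    # assumed Sharer model
receiver_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed Receiver (could be a different model)

sharer_tok = AutoTokenizer.from_pretrained(sharer_name)
sharer = AutoModelForCausalLM.from_pretrained(sharer_name)
receiver_tok = AutoTokenizer.from_pretrained(receiver_name)
receiver = AutoModelForCausalLM.from_pretrained(receiver_name)

question = "Why is the sky blue?"

# Stage 1: the Sharer decodes an intermediate message token by token.
sharer_ids = sharer_tok(question, return_tensors="pt").input_ids
message_ids = sharer.generate(sharer_ids, max_new_tokens=64)
message = sharer_tok.decode(message_ids[0, sharer_ids.shape[1]:], skip_special_tokens=True)

# Stage 2: the Receiver re-reads that message as plain text and answers.
receiver_prompt = f"Context: {message}\nQuestion: {question}\nAnswer:"
receiver_ids = receiver_tok(receiver_prompt, return_tensors="pt").input_ids
answer_ids = receiver.generate(receiver_ids, max_new_tokens=64)
print(receiver_tok.decode(answer_ids[0, receiver_ids.shape[1]:], skip_special_tokens=True))
```

The cost of this relay is the Stage 1 decoding loop plus the Receiver's re-prefill of the message, which is exactly the overhead C2C aims to remove.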
## What is Cache-to-Cache?

It makes language models talk without words.

Cache-to-Cache (C2C) lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation of an intermediate message.

The payoff reported in the paper: up to 10% higher accuracy than a single model, 3–5% gains over text-based (T2T) communication, and roughly 2× faster responses than T2T.
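
For intuition only, the snippet below shows plain single-model KV-cache reuse: the context is prefilled once and generation continues directly from the cache, with no intermediate text to decode or re-read. The actual C2C method additionally maps the Sharer's cache into the Receiver's representation space with a learned fusion step, which this sketch does not implement; the model name and prompt are assumptions.

```python
# Intuition only: prefill a context into the KV-cache once, then continue
# decoding straight from that cache. This is single-model cache reuse,
# NOT the trained Sharer-to-Receiver cache fusion used by C2C.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed model for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

context = ("Context: shorter wavelengths of sunlight scatter more strongly "
           "in the atmosphere. Question: Why is the sky blue? Answer:")
context_ids = tok(context, return_tensors="pt").input_ids

# Prefill: one forward pass loads the context's semantics into the KV-cache.
with torch.no_grad():
    prefill = model(context_ids, use_cache=True)
past = prefill.past_key_values

# Greedy continuation straight from the cache: each step feeds only the newest token.
next_id = prefill.logits[:, -1:, :].argmax(dim=-1)
generated = [next_id]
with torch.no_grad():
    for _ in range(32):
        step = model(next_id, past_key_values=past, use_cache=True)
        past = step.past_key_values
        next_id = step.logits[:, -1:, :].argmax(dim=-1)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0], skip_special_tokens=True))
```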
## Citation

```bibtex
@article{fu2025c2c,
  title={Cache-to-Cache: Direct Semantic Communication Between Large Language Models},
  author={Tianyu Fu and Zihan Min and Hanling Zhang and Jichao Yan and Guohao Dai and Wanli Ouyang and Yu Wang},
  journal={arXiv preprint arXiv:2510.03215},
  year={2025},
}
```