---
title: Cache-to-Cache Communication Demo
emoji: π
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - llm
  - cache-to-cache
  - model-communication
  - kv-cache
short_description: Compare Single, Text-to-Text, and Cache-to-Cache inference
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6445fd9ba56444c355dcbcba/R5YOyw0aoBENYJs8Ugnbi.png
---
# Cache-to-Cache Communication Demo

This Space demonstrates **Cache-to-Cache (C2C)** communication between Large Language Models, comparing three inference approaches side-by-side:

1. **Single Model**: Standard inference with one model
2. **Text-to-Text (T2T)**: Two-stage communication where the Sharer model generates text → the Receiver model processes that text (sketched below)
3. **Cache-to-Cache (C2C)**: Direct KV-Cache communication between Sharer and Receiver
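
As a point of reference, the sketch below shows the T2T relay pattern in plain Hugging Face `transformers` code. The model names, prompts, and generation settings are illustrative assumptions and are not taken from this Space's `app.py`.

```python
# Hedged sketch of the Text-to-Text (T2T) relay: the Sharer writes an
# intermediate text message, which is re-tokenized and read by the Receiver.
# Model names and prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

sharer_name = "Qwen/Qwen2.5-0.5B-Instruct"    # assumed Sharer model
receiver_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed Receiver (could be a different model)

sharer_tok = AutoTokenizer.from_pretrained(sharer_name)
sharer = AutoModelForCausalLM.from_pretrained(sharer_name)
receiver_tok = AutoTokenizer.from_pretrained(receiver_name)
receiver = AutoModelForCausalLM.from_pretrained(receiver_name)

question = "Why is the sky blue?"

# Stage 1: the Sharer decodes an intermediate message token by token.
sharer_ids = sharer_tok(question, return_tensors="pt").input_ids
message_ids = sharer.generate(sharer_ids, max_new_tokens=64)
message = sharer_tok.decode(message_ids[0, sharer_ids.shape[1]:], skip_special_tokens=True)

# Stage 2: the Receiver re-reads that message as plain text and answers.
receiver_prompt = f"Context: {message}\nQuestion: {question}\nAnswer:"
receiver_ids = receiver_tok(receiver_prompt, return_tensors="pt").input_ids
answer_ids = receiver.generate(receiver_ids, max_new_tokens=64)
print(receiver_tok.decode(answer_ids[0, receiver_ids.shape[1]:], skip_special_tokens=True))
```

The cost of this relay is the Stage 1 decoding loop plus the Receiver's re-prefill of the message, which is exactly the overhead C2C aims to remove.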
## What is Cache-to-Cache?

It makes language models talk without words.

Cache-to-Cache (C2C) lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation of an intermediate message.

The payoff reported in the paper: up to 10% higher accuracy than a single model, 3–5% gains over text-based (T2T) communication, and roughly 2× faster responses than T2T.
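
For intuition only, the snippet below shows plain single-model KV-cache reuse: the context is prefilled once and generation continues directly from the cache, with no intermediate text to decode or re-read. The actual C2C method additionally maps the Sharer's cache into the Receiver's representation space with a learned fusion step, which this sketch does not implement; the model name and prompt are assumptions.

```python
# Intuition only: prefill a context into the KV-cache once, then continue
# decoding straight from that cache. This is single-model cache reuse,
# NOT the trained Sharer-to-Receiver cache fusion used by C2C.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed model for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

context = ("Context: shorter wavelengths of sunlight scatter more strongly "
           "in the atmosphere. Question: Why is the sky blue? Answer:")
context_ids = tok(context, return_tensors="pt").input_ids

# Prefill: one forward pass loads the context's semantics into the KV-cache.
with torch.no_grad():
    prefill = model(context_ids, use_cache=True)
past = prefill.past_key_values

# Greedy continuation straight from the cache: each step feeds only the newest token.
next_id = prefill.logits[:, -1:, :].argmax(dim=-1)
generated = [next_id]
with torch.no_grad():
    for _ in range(32):
        step = model(next_id, past_key_values=past, use_cache=True)
        past = step.past_key_values
        next_id = step.logits[:, -1:, :].argmax(dim=-1)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0], skip_special_tokens=True))
```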
## Citation

```bibtex
@article{fu2025c2c,
  title={Cache-to-Cache: Direct Semantic Communication Between Large Language Models},
  author={Tianyu Fu and Zihan Min and Hanling Zhang and Jichao Yan and Guohao Dai and Wanli Ouyang and Yu Wang},
  journal={arXiv preprint arXiv:2510.03215},
  year={2025},
}
```