KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs
Abstract
KV-Embedding enables training-free representation learning from frozen LLMs by re-routing key-value states for sequence-level context access and by selecting embedding layers automatically.
While LLMs are powerful embedding backbones, their application in training-free settings faces two structural challenges: causal attention prevents early tokens from attending to subsequent context, and the next-token prediction objective biases representations toward generation rather than semantic compression. To address these limitations, we propose KV-Embedding, a framework that activates the latent representation power of frozen LLMs. Our method leverages the observation that the key-value (KV) states of the final token at each layer encode a compressed view of the sequence. By re-routing these states as a prepended prefix, we enable all tokens to access sequence-level context within a single forward pass. To ensure model-agnostic applicability, we introduce an automated layer selection strategy based on intrinsic dimensionality. Evaluations on MTEB across Qwen, Mistral, and Llama backbones show that KV-Embedding outperforms existing training-free baselines by up to 10%, while maintaining robust performance on sequences up to 4,096 tokens. These results demonstrate that internal state manipulation offers an efficient alternative to input modification, and we hope this work encourages further exploration of LLM internals for representation learning.
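As a rough illustration of the re-routing idea, the sketch below approximates it externally with two forward passes through a frozen Hugging Face model: the first pass collects the final token's per-layer key-value states, and the second pass prepends them as a one-token cache prefix so every position can attend to a compressed view of the whole sequence. The checkpoint name, the two-pass approximation (the paper performs the re-routing internally in a single pass), and the mean-pooling readout are all assumptions for illustration, not the authors' implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

# Any decoder-only backbone works in principle; this checkpoint is an assumption.
model_name = "Qwen/Qwen2-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "KV-Embedding turns frozen decoder-only LLMs into text encoders."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Pass 1: run the sequence once and keep only the final token's
    # key-value slice from every layer: shape (batch, heads, 1, head_dim).
    first = model(**inputs, use_cache=True)
    prefix_kv = DynamicCache.from_legacy_cache(
        tuple((k[:, :, -1:, :], v[:, :, -1:, :]) for k, v in first.past_key_values)
    )

    # Pass 2: re-run the same tokens with that slice prepended as a one-token
    # cache prefix, so every position (including early tokens) can attend to it.
    # Positions shift by one because of the cache prefix; ignored here for simplicity.
    attention_mask = torch.cat(
        [torch.ones_like(inputs.attention_mask[:, :1]), inputs.attention_mask], dim=1
    )
    second = model(
        input_ids=inputs.input_ids,
        attention_mask=attention_mask,
        past_key_values=prefix_kv,
        output_hidden_states=True,
    )

    # Readout: mean-pool the last hidden layer. The pooling and layer choices
    # are placeholders; the paper selects the layer automatically.
    hidden = second.hidden_states[-1]
    mask = inputs.attention_mask.unsqueeze(-1).to(hidden.dtype)
    embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(embedding.shape)  # (1, hidden_size)
```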
Community
✨ Turn any decoder-only LLM into a powerful embedding model—zero training needed!
✨ The Trick: Re-route the final token's key-value states as an internal prefix, giving all tokens access to global context in one forward pass. No input modification, no mask removal, just smart internal state manipulation.
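Beyond the re-routing prefix, the abstract also mentions automated layer selection based on intrinsic dimensionality. The sketch below shows one standard estimator, TwoNN (Facco et al., 2017), applied to per-layer pooled sentence vectors; both the estimator and the "pick the lowest-ID layer" rule are illustrative assumptions, not the paper's specified procedure.

```python
import numpy as np

def twonn_id(x: np.ndarray) -> float:
    """Estimate the intrinsic dimensionality of points x with shape (n_samples, n_features)."""
    # Pairwise Euclidean distances; exclude self-distances on the diagonal.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    sorted_d = np.sort(d, axis=1)
    r1, r2 = sorted_d[:, 0], sorted_d[:, 1]   # first and second nearest-neighbour radii
    mu = r2 / np.clip(r1, 1e-12, None)
    mu = mu[mu > 1.0]                          # guard against degenerate duplicates
    return len(mu) / np.sum(np.log(mu))        # maximum-likelihood TwoNN estimate

def select_layer(layer_embeddings: list[np.ndarray]) -> int:
    """Pick a layer index from per-layer sentence embeddings (assumed rule: lowest ID)."""
    ids = [twonn_id(e) for e in layer_embeddings]
    return int(np.argmin(ids))

# Usage with random stand-in data; real inputs would be pooled hidden states
# from each transformer layer over a small calibration corpus.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(256, 64)) for _ in range(12)]
print(select_layer(layers))
```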
The following similar papers were recommended by the Semantic Scholar API (via Librarian Bot):
- Hierarchical Token Prepending: Enhancing Information Flow in Decoder-based LLM Embeddings (2025)
- C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling (2025)
- SR-KI: Scalable and Real-Time Knowledge Integration into LLMs via Supervised Attention (2025)
- Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches (2025)
- Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders (2025)
- Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation (2025)
- ReMatch: Boosting Representation through Matching for Multimodal Retrieval (2025)