CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion Paper • 2512.19535 • Published 3 days ago • 7
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery Paper • 2407.14499 • Published Jul 19, 2024
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation Paper • 2402.03119 • Published Feb 5, 2024
Better Understanding Differences in Attribution Methods via Systematic Evaluations Paper • 2303.11884 • Published Mar 21, 2023
MoshiVis v0.1 Collection • MoshiVis is a Vision Speech Model built as a perceptually augmented version of Moshi v0.1 for conversing about image inputs • 9 items • Updated 2 days ago • 22