Canvas-to-Image: Compositional Image Generation with Multimodal Controls Paper • 2511.21691 • Published Nov 26
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science Paper • 2501.10768 • Published Jan 18
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20