Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis • Paper • 2508.04699 • Published Aug 6, 2025
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text • Paper • 2411.16077 • Published Nov 25, 2024
VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data • Paper • 2410.00296 • Published Oct 1, 2024