SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models Paper • 2510.09541 • Published Oct 10 • 14
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1 • 58
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19 • 13
liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5 Text Generation • Updated Oct 5, 2023 • 76 • 22
LLM-Rec: Personalized Recommendation via Prompting Large Language Models Paper • 2307.15780 • Published Jul 24, 2023 • 27