Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
- THU-KEG/LLaDA-8B-BGPO-math
  Reinforcement Learning • 8B • Updated • 11 • 1
- THU-KEG/LLaDA-8B-BGPO-code
  Reinforcement Learning • 8B • Updated • 15 • 1
- THU-KEG/LLaDA-8B-BGPO-countdown
  Reinforcement Learning • 8B • Updated • 6 • 1
- THU-KEG/LLaDA-8B-BGPO-sudoku
  Reinforcement Learning • 8B • Updated • 8 • 1