Liger Kernel: Efficient Triton Kernels for LLM Training Paper • 2410.10989 • Published Oct 14, 2024 • 1
AlphaPO -- Reward shape matters for LLM alignment Paper • 2501.03884 • Published Jan 7, 2025 • 2
LLaDA-MedV: Exploring Large Language Diffusion Models for Biomedical Image Understanding Paper • 2508.01617 • Published Aug 3, 2025
Local2Global query Alignment for Video Instance Segmentation Paper • 2507.20120 • Published Jul 27, 2025
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction Paper • 2509.12464 • Published Sep 15, 2025
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs Paper • 2509.25779 • Published Sep 30, 2025 • 18
Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning Paper • 2504.03380 • Published Apr 4, 2025
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research Paper • 2505.11855 • Published May 17, 2025 • 10
Stable Language Model Pre-training by Reducing Embedding Variability Paper • 2409.07787 • Published Sep 12, 2024
Cross-lingual Transfer of Reward Models in Multilingual Alignment Paper • 2410.18027 • Published Oct 23, 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference Paper • 2406.06424 • Published Jun 10, 2024 • 15