Collections
Discover the best community collections!
Collections including paper arxiv:2509.02547
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 301 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 315 -
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Paper • 2509.03867 • Published • 210
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
PRewrite: Prompt Rewriting with Reinforcement Learning
Paper • 2401.08189 • Published -
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Paper • 2509.11543 • Published • 47
-
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 13 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 39 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 140
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper • 2508.16279 • Published • 53 -
Scaling Agents via Continual Pre-training
Paper • 2509.13310 • Published • 117 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 121
-
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper • 2509.02522 • Published • 25 -
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
Towards a Unified View of Large Language Model Post-Training
Paper • 2509.04419 • Published • 75
-
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 13 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 39 -
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 140
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
Paper • 2508.16279 • Published • 53 -
Scaling Agents via Continual Pre-training
Paper • 2509.13310 • Published • 117 -
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 121
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 301 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 315 -
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Paper • 2509.03867 • Published • 210
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
PRewrite: Prompt Rewriting with Reinforcement Learning
Paper • 2401.08189 • Published -
UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning
Paper • 2509.11543 • Published • 47
-
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper • 2509.02522 • Published • 25 -
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Paper • 2508.20751 • Published • 89 -
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper • 2509.02547 • Published • 225 -
Towards a Unified View of Large Language Model Post-Training
Paper • 2509.04419 • Published • 75