- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 31
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 124
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4
Collections
Collections including paper arxiv:2507.19849
- GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
  Paper • 2507.19457 • Published • 30
- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 158
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 318
- Cache-to-Cache: Direct Semantic Communication Between Large Language Models
  Paper • 2510.03215 • Published • 98

- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
  Paper • 2508.13167 • Published • 129
- Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
  Paper • 2512.01374 • Published • 105
- Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
  Paper • 2511.16043 • Published • 109
- Agentic Entropy-Balanced Policy Optimization
  Paper • 2510.14545 • Published • 106

- MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
  Paper • 2507.21183 • Published • 15
- MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
  Paper • 2507.21802 • Published • 19
- EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
  Paper • 2507.21848 • Published • 9
- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 158

- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 158
- Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
  Paper • 2507.22448 • Published • 70
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 214
- R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
  Paper • 2508.21113 • Published • 110