- Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
  Paper • 2507.17512 • Published • 36
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
  Paper • 2511.04962 • Published • 54
- 10 Open Challenges Steering the Future of Vision-Language-Action Models
  Paper • 2511.05936 • Published • 5
Collections including paper arxiv:2507.17512
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 44
- Competitive Programming with Large Reasoning Models
  Paper • 2502.06807 • Published • 68
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
  Paper • 2503.16874 • Published • 44
- System Prompt Optimization with Meta-Learning
  Paper • 2505.09666 • Published • 71
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
  Paper • 2505.23754 • Published • 15