VoladorLuYu's Collections
LLM+Self-Play RL
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • 140
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Paper • 2407.18219 • 3
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • 27
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
Paper • 2409.04787 • 1
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Paper • 2401.02009 • 1
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper • 2503.07572 • 47
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Paper • 2504.19162 • 18
General-Reasoner: Advancing LLM Reasoning Across All Domains
Paper • 2505.14652 • 24
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper • 2512.19673 • 64