LLM+Self-Play RL - a VoladorLuYu Collection

VoladorLuYu 's Collections

Research on LLM

Generative Multiple Modality

Super Alignment

Foundation Machine Learning

Graph Foundation Multimodal Models

Symbolic LLM Reasoning

Data-efficient LLMs

Understanding LLM

synthetic code generation

Diffusion Models

LLM+Architecture

LLM+Self-Play RL

LLM+Self-Play RL

updated Dec 29, 2025

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 140
Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper • 2407.18219 • Published Jul 25, 2024 • 3
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Paper • 2408.16293 • Published Aug 29, 2024 • 27
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Paper • 2409.04787 • Published Sep 7, 2024 • 1
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

Paper • 2401.02009 • Published Jan 4, 2024 • 1
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published Mar 10, 2025 • 47
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Paper • 2504.19162 • Published Apr 27, 2025 • 18
General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20, 2025 • 24
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published Dec 22, 2025 • 64