Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617

melon (jellyisadog)
AI & ML interests: None yet

Recent Activity
liked a dataset 3 days ago: AI-MO/NuminaMath-CoT
liked a dataset 4 days ago: dltdojo/ecommerce-faq-chatbot-dataset
liked a dataset 4 days ago: bitext/Bitext-customer-support-llm-chatbot-training-dataset