Efficient Reasoning via Decoupled Reward Policy Optimization
Gang Li
ganglii
AI & ML interests: None yet
Recent Activity
updated a dataset 4 days ago: ganglii/Reasoning_math_sft_len8k
published a dataset 4 days ago: ganglii/Reasoning_math_sft_len8k
updated a dataset 4 days ago: ganglii/Reasoning_math_sft_len16k
Organizations: None yet