Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
Paper • 2509.10515
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation