-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 345 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Collections
Discover the best community collections!
Collections including paper arxiv:2512.19673
-
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 114 -
KlingAvatar 2.0 Technical Report
Paper • 2512.13313 • Published • 40 -
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 88 -
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 199
-
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Paper • 2407.18219 • Published • 3 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27 -
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
Paper • 2409.04787 • Published • 1
-
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 199 -
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper • 2512.19673 • Published • 60 -
Region-Constraint In-Context Generation for Instructional Video Editing
Paper • 2512.17650 • Published • 49 -
SpatialTree: How Spatial Abilities Branch Out in MLLMs
Paper • 2512.20617 • Published • 42
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 35 -
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33 -
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 23 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 345 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 199 -
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Paper • 2512.19673 • Published • 60 -
Region-Constraint In-Context Generation for Instructional Video Editing
Paper • 2512.17650 • Published • 49 -
SpatialTree: How Spatial Abilities Branch Out in MLLMs
Paper • 2512.20617 • Published • 42
-
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 114 -
KlingAvatar 2.0 Technical Report
Paper • 2512.13313 • Published • 40 -
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 88 -
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 199
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 35 -
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33 -
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 23 -
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27
-
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
Paper • 2407.18219 • Published • 3 -
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Paper • 2408.16293 • Published • 27 -
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
Paper • 2409.04787 • Published • 1
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75