Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model Paper • 2510.27607 • Published Oct 31 • 8
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published May 29 • 29
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction Paper • 2411.14762 • Published Nov 22, 2024 • 11
Meta-Transformer: A Unified Framework for Multimodal Learning Paper • 2307.10802 • Published Jul 20, 2023 • 44
Collaborative Score Distillation for Consistent Visual Synthesis Paper • 2307.04787 • Published Jul 4, 2023 • 29