dreamdifferent/mimic-video-so101-hetero-ee-2cam-hstack-5fps-action-decoder Updated 3 days ago • 5 • 1
OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data Paper • 2606.13432 • Published 12 days ago • 106
STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media Paper • 2605.25162 • Published about 1 month ago • 4
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 27 days ago • 430
AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models Paper • 2603.10126 • Published May 11 • 2
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published May 14 • 147
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening Paper • 2605.19597 • Published May 19 • 21
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 274
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents Paper • 2605.12481 • Published May 12 • 28
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published May 7 • 236
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation Paper • 2604.28196 • Published Apr 30 • 74
Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers Paper • 2604.17632 • Published Apr 19 • 12