Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion Paper • 2606.15236 • Published 4 days ago • 18
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 24 days ago • 73 • 3
NEO1_5 Collection From Pixels to Words -- Towards Native One-Vision Models at Scale • 3 items • Updated 23 days ago • 6
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 26 days ago • 27
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published 25 days ago • 72
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects Paper • 2605.21572 • Published about 1 month ago • 53
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 75
VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text? Paper • 2602.04802 • Published Feb 4 • 2