ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models • Paper • 2603.13033 • Published 3 days ago • 11
Cosmos-Tokenizer • Collection • A suite of image and video tokenizers • 12 items • Updated 5 days ago • 43