LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published 11 days ago • 148
CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning Paper • 2511.18659 • Published 13 days ago • 11
MDGA Collection Make Diffusion Great Again. The resource list for Super Data Learners, Quokka, and OpenMoE 2. • 16 items • Updated Nov 4 • 8
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published 16 days ago • 91
Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published 19 days ago • 44
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
SenseNova-SI Collection Scaling Spatial Intelligence with Multimodal Foundation Models • 8 items • Updated about 11 hours ago • 10
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image Paper • 2511.13648 • Published 19 days ago • 52
VST Collection A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities. • 5 items • Updated 24 days ago • 6
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4 • 57
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation Paper • 2510.26794 • Published Oct 30 • 26
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28 • 46
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 155
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8 • 43
view article Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve May 20 • 51