Encoder-Free Human Motion Understanding via Structured Motion Descriptions Paper • 2604.21668 • Published 6 days ago • 1
Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment Paper • 2604.00913 • Published 27 days ago • 4
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published about 1 month ago • 145
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval Paper • 2603.12824 • Published Mar 13 • 5
NanoVDR: A 70M Text-Only Model That Retrieves Visual Documents as Well as a 2B VLM Article • Mar 16 • 3
LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family Article • Jan 19 • 92
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Paper • 2510.18701 • Published Oct 21, 2025 • 68