DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios Paper • 2604.25914 • Published 6 days ago • 40
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published 14 days ago • 22
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published 18 days ago • 36
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction Paper • 2603.00610 • Published Feb 28 • 35
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations Paper • 2602.03828 • Published Feb 3 • 20
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration Paper • 2602.04575 • Published Feb 4 • 17
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11, 2025 • 73
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation Paper • 2512.21094 • Published Dec 24, 2025 • 25
AutoMV: An Automatic Multi-Agent System for Music Video Generation Paper • 2512.12196 • Published Dec 13, 2025 • 7
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 52
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 159
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published Dec 1, 2025 • 58
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published Nov 23, 2025 • 304