T2V-CompBench Leaderboard

๐Ÿ† Welcome to the leaderboard of the T2V-CompBench! ๐ŸŽฆ A Comprehensive Benchmark for Compositional Text-to-video Generation

  • 1,400 Prompts: We analyze 1.67 million real-user prompts to extract high-frequency nouns, verbs, and adjectives, resulting in a suite of 1,400 prompts.
  • 7 Compositional Categories: We evaluate multiple-object compositionality across 7 categories: consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy.
  • Evaluation Metrics: We design MLLM-based, detection-based, and tracking-based evaluation metrics for compositional T2V generation, all validated by human evaluations.
  • Valuable Insights: We provide insightful analyses of current models' abilities, highlighting the significant challenges of compositional T2V generation.

Join the Leaderboard: Follow the steps in our GitHub repository to prepare the videos and run the evaluation scripts. Before uploading the generated .csv files here, please perform a final check by carefully reading these instructions. After clicking the Submit Eval! button, click the Refresh button; your model's performance will then appear on our leaderboard!
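As part of a final check before uploading, a results .csv can be validated locally. The sketch below is a hypothetical example, not the official checker: the column names are assumed from the leaderboard's category headers, and the authoritative file format is defined by the evaluation scripts in the GitHub repository.

```python
import csv
import io

# Assumed column names, taken from the leaderboard's seven category headers.
# Consult the T2V-CompBench GitHub repository for the authoritative format.
EXPECTED_COLUMNS = [
    "Consistent Attribute Binding",
    "Dynamic Attribute Binding",
    "Spatial Relationships",
    "Motion Binding",
    "Action Binding",
    "Object Interactions",
    "Generative Numeracy",
]

def check_results_csv(text: str) -> dict:
    """Parse the first data row of a results CSV and confirm each expected
    category column holds a score in [0, 1]. Returns the parsed scores."""
    reader = csv.DictReader(io.StringIO(text))
    row = next(reader)
    scores = {}
    for col in EXPECTED_COLUMNS:
        if col not in row:
            raise ValueError(f"missing column: {col}")
        value = float(row[col])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score out of range for {col}: {value}")
        scores[col] = value
    return scores
```

For example, running `check_results_csv` on a header line plus one row of seven scores returns a dict mapping each category name to its float score, and raises `ValueError` on a missing column or an out-of-range value.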

Model Information: For details of these video generation models, see Appendix B of our paper. We will provide more details soon.

Evaluation Category scores are listed in the last seven columns.

| Model Name (clickable) | Evaluated by | Date | Total Avg. Score | Selected Avg. Score | Consistent Attribute Binding | Dynamic Attribute Binding | Spatial Relationships | Motion Binding | Action Binding | Object Interactions | Generative Numeracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | T2V-CompBench Team | 2024-12-01 | 0.5661 | 0.5661 | 0.6931 | 0.0624 | 0.5979 | 0.2867 | 0.8722 | 0.8309 | 0.6066 |