T2V-CompBench Leaderboard
Welcome to the leaderboard of T2V-CompBench! A Comprehensive Benchmark for Compositional Text-to-Video Generation
- 1,400 Prompts: We analyze 1.67 million real-user prompts to extract high-frequency nouns, verbs, and adjectives, resulting in a suite of 1,400 prompts.
- 7 Compositional Categories: We evaluate multiple-object compositionality on attributes, actions, interactions, quantities, and spatio-temporal dynamics, covering 7 categories.
- Evaluation metrics: We design MLLM-based, Detection-based, and Tracking-based evaluation metrics for compositional T2V generation, all validated by human evaluations.
- Valuable Insights: We provide insightful analysis of current models' abilities, highlighting the significant challenge of compositional T2V generation.
Join the Leaderboard: Please follow the steps in our GitHub repository to prepare the videos and run the evaluation scripts. Before uploading the generated `.csv` files here, please conduct a final check by carefully reading these instructions. After clicking the Submit Eval! button, click the Refresh button. Your model's performance will then appear on our leaderboard!
Model Information: For details of the evaluated video generation models, see Appendix B of our paper. We will provide more details soon.
| Model Name (clickable) | Evaluated by | Date | Total Avg. Score | Selected Avg. Score | Consistent Attribute Binding | Dynamic Attribute Binding | Spatial Relationships | Motion Binding | Action Binding | Object Interactions | Generative Numeracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | T2V-CompBench Team | 2024-12-01 | 0.5661 | 0.5661 | 0.6931 | 0.0624 | 0.5979 | 0.2867 | 0.8722 | 0.8309 | 0.6066 |
| Model Name (clickable) | Evaluated by | Date | Consistent Attribute Binding-Color | Consistent Attribute Binding-Shape | Consistent Attribute Binding-Texture | 2D Spatial Relationships-Coexist | 2D Spatial Relationships-Acc. | 2D Spatial Relationships-Acc.Score | Motion Binding-Motion Level | Motion Binding-Acc. | Action Binding-Common | Action Binding-Uncommon | Object Interactions-Physical | Object Interactions-Social | Total Avg. Score | Selected Avg. Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | T2V-CompBench Team | 2024-12-01 | 0.7898 | 0.5768 | 0.6381 | 89% | 71% | 0.8231 | 28.09 | 38% | 0.9016 | 0.7546 | 0.8219 | 0.7726 | 0.5661 | 0.5661 |
- T2V-CompBench, a comprehensive benchmark for compositional text-to-video generation, consists of seven categories: consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy.
- For each category, we carefully design 200 prompts, resulting in 1,400 in total, and sample videos generated by a set of T2V models.
- We propose three types of evaluation metrics: MLLM-based, Detection-based, and Tracking-based metrics, all specifically designed for compositional T2V generation and validated by human evaluations.
- We benchmark various T2V models, reveal their strengths and weaknesses by examining the results across the 7 categories and 12 sub-dimensions, and provide insightful analysis on compositional T2V generation.
Submit on T2V-CompBench Introduction
- Please note that you need to obtain a list of `.csv` files by running the evaluation scripts of T2V-CompBench in our GitHub repository. You may conduct an Offline Check before uploading.
- Then, pack these CSV files into a `ZIP` archive, ensuring that the top-level directory of the ZIP contains the individual CSV files.
- Finally, upload the ZIP archive below.
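As a convenience, the packing step above can be sketched in Python with the standard `zipfile` module. The folder name `eval_results` and the archive name `t2v_compbench_submission.zip` are assumptions for illustration; use whatever paths your evaluation run produced. The key point is passing `arcname=csv_file.name` so the CSVs land at the top level of the ZIP rather than inside a subdirectory.

```python
import zipfile
from pathlib import Path

# Hypothetical folder holding the CSV files produced by the evaluation scripts.
csv_dir = Path("eval_results")
archive = Path("t2v_compbench_submission.zip")

with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    for csv_file in sorted(csv_dir.glob("*.csv")):
        # arcname drops the directory prefix, so each CSV sits at the ZIP's top level.
        zf.write(csv_file, arcname=csv_file.name)

# Offline check: every entry should be a bare CSV name with no directory prefix.
with zipfile.ZipFile(archive) as zf:
    assert all("/" not in name and name.endswith(".csv") for name in zf.namelist())
```

Running this check locally before uploading catches the common mistake of zipping the parent folder itself, which nests the CSVs one level deep.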
⚠️ Uploading the model's generated videos is not a valid submission!
⚠️ Submissions that do not correctly fill in the model name and model link may be deleted by the T2V-CompBench team. The contact information you provide will not be made public.
Submit your model evaluation CSV files here!
This is a required field.
Submission successful! Please press Refresh and return to the Leaderboard!
Please ensure that the Model Name, Project Page, and Email are filled in correctly.