MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

We present MMIE, a Massive Multimodal Interleaved understanding Evaluation benchmark, designed for Large Vision-Language Models (LVLMs). MMIE offers a robust framework for evaluating the interleaved comprehension and generation capabilities of LVLMs across diverse fields, supported by reliable automated metrics.

Website | Code | Dataset | Results | Evaluation Model | Paper

| Rank | Model | Model Type | Situational analysis | Project-based learning | Multi-step reasoning | AVG |
|---|---|---|---|---|---|---|
| 10 | Qwen-VL-70b \| Openjourney | Interleaved LVLM | 47.63 | 55.12 | 42.17 | 50.92 |