---
license: apache-2.0
datasets:
- weizhiwang/mlm_filter_instructions
language:
- en
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
---
# Model Card for mlm-filter-qwen2.5-VL-3B

<!-- Provide a quick summary of what the model is/does. -->

This model is trained on a scoring dataset and can be used to score English image-text pairs. It supports four dimensions: image_text_matching, object_detail_fulfillment, caption_text_quality, and semantic_understanding. The base model is Qwen2.5 VL-instruct-3B. Since most of the publicly released models by the original authors are based on custom architectures, it is inconvenient to perform inference with vLLM. Therefore, we trained Qwen2.5-VL on the same data to fully support vLLM inference and accelerate inference speed.

该模型基于数据集进行训练，可用于对英语图文对进行评分。它支持四个维度：图文匹配（image_text_matching）、细节符合度（object_detail_fulfillment）、文本质量（caption_text_quality）以及语义理解（semantic_understanding）。基础模型为 Qwen2.5 VL-instruct-3B。由于原作者公开的模型大多基于自定义架构，使用vllm推理不方便，故我们使用相同数据训练了Qwen2.5 VL，以全面的支持vllm推理，以加快推理速度。

---
The dataset used is [weizhiwang/mlm_filter_instructions](https://huggingface.co/datasets/weizhiwang/mlm_filter_instructions), and the inference prompt can be referenced from the github link from the original authors [MLM-Filter](https://github.com/Victorwz/MLM_Filter).

使用的数据集为[weizhiwang/mlm_filter_instructions](https://huggingface.co/datasets/weizhiwang/mlm_filter_instructions)，推理prompt可参考原作者的代码[MLM-Filter](https://github.com/Victorwz/MLM_Filter)，