--- license: apache-2.0 datasets: - weizhiwang/mlm_filter_instructions language: - en base_model: - Qwen/Qwen2.5-VL-3B-Instruct --- # Model Card for mlm-filter-qwen2.5-VL-3B This model is trained on a scoring dataset and can be used to score English image-text pairs. It supports four dimensions: image_text_matching, object_detail_fulfillment, caption_text_quality, and semantic_understanding. The base model is Qwen2.5 VL-instruct-3B. Since most of the publicly released models by the original authors are based on custom architectures, it is inconvenient to perform inference with vLLM. Therefore, we trained Qwen2.5-VL on the same data to fully support vLLM inference and accelerate inference speed. 该模型基于数据集进行训练,可用于对英语图文对进行评分。它支持四个维度:图文匹配(image_text_matching)、细节符合度(object_detail_fulfillment)、文本质量(caption_text_quality)以及语义理解(semantic_understanding)。基础模型为 Qwen2.5 VL-instruct-3B。由于原作者公开的模型大多基于自定义架构,使用vllm推理不方便,故我们使用相同数据训练了Qwen2.5 VL,以全面的支持vllm推理,以加快推理速度。 --- The dataset used is [weizhiwang/mlm_filter_instructions](https://huggingface.co/datasets/weizhiwang/mlm_filter_instructions), and the inference prompt can be referenced from the github link from the original authors [MLM-Filter](https://github.com/Victorwz/MLM_Filter). 使用的数据集为[weizhiwang/mlm_filter_instructions](https://huggingface.co/datasets/weizhiwang/mlm_filter_instructions),推理prompt可参考原作者的代码[MLM-Filter](https://github.com/Victorwz/MLM_Filter),