Model Summary

UnifiedReward-2.0-qwen35-4b is the first unified reward model based on Qwen/Qwen3.5-4B for multimodal understanding and generation assessment, enabling both pairwise ranking and pointwise scoring, which can be employed for vision model preference alignment.

For further details, please refer to the following resources:

📰 Paper: https://arxiv.org/pdf/2503.05236
🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
👋 Point of Contact: Yibin Wang

vLLM Server Deployment

export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
export TOKENIZERS_PARALLELISM=false
vllm serve CodeGoat24/UnifiedReward-2.0-qwen35-4b \
 --host localhost \
 --port 8080 \
 --trust-remote-code \
 --served-model-name UnifiedReward \
 --gpu-memory-utilization 0.95 \
 --mm-encoder-tp-mode data \
 --mm-processor-cache-type shm \
 --enable-prefix-caching \
 --tensor-parallel-size 8 \
 --default-chat-template-kwargs '{"enable_thinking": false}'

The inference code is provided here.

🏁 Compared with Current Reward Models

Reward Model	Method	Image Generation	Image Understanding	Video Generation	Video Understanding
PickScore	Point	√
HPS	Point	√
ImageReward	Point	√
LLaVA-Critic	Pair/Point		√
IXC-2.5-Reward	Pair/Point		√		√
VideoScore	Point			√
LiFT	Point			√
VisionReward	Point	√		√
VideoReward	Point			√
UnifiedReward (Ours)	Pair/Point	√	√	√	√

Citation

@article{unifiedreward,
  title={Unified reward model for multimodal understanding and generation},
  author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2503.05236},
  year={2025}
}