Model Summary

UnifiedReward-2.0-qwen35-4b is the first unified reward model based on Qwen/Qwen3.5-4B for multimodal understanding and generation assessment, enabling both pairwise ranking and pointwise scoring, which can be employed for vision model preference alignment.

For further details, please refer to the following resources:

vLLM Server Deployment

export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
export TOKENIZERS_PARALLELISM=false
vllm serve CodeGoat24/UnifiedReward-2.0-qwen35-4b \
 --host localhost \
 --port 8080 \
 --trust-remote-code \
 --served-model-name UnifiedReward \
 --gpu-memory-utilization 0.95 \
 --mm-encoder-tp-mode data \
 --mm-processor-cache-type shm \
 --enable-prefix-caching \
 --tensor-parallel-size 8 \
 --default-chat-template-kwargs '{"enable_thinking": false}'

The inference code is provided here.

🏁 Compared with Current Reward Models

Reward Model Method Image Generation Image Understanding Video Generation Video Understanding
PickScore Point √
HPS Point √
ImageReward Point √
LLaVA-Critic Pair/Point √
IXC-2.5-Reward Pair/Point √ √
VideoScore Point √
LiFT Point √
VisionReward Point √ √
VideoReward Point √
UnifiedReward (Ours) Pair/Point √ √ √ √

Citation

@article{unifiedreward,
  title={Unified reward model for multimodal understanding and generation},
  author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2503.05236},
  year={2025}
}
Downloads last month
47
Safetensors
Model size
5B params
Tensor type
BF16
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for CodeGoat24/UnifiedReward-2.0-qwen35-4b

Finetuned
Qwen/Qwen3.5-4B
Finetuned
(44)
this model
Finetunes
2 models
Quantizations
2 models

Collection including CodeGoat24/UnifiedReward-2.0-qwen35-4b

Paper for CodeGoat24/UnifiedReward-2.0-qwen35-4b