SII-Yibin Wang
CodeGoat24
AI & ML interests
I'm part of the Shanghai Innovation Institute, focusing on Multimodal RL and Generation.
Recent Activity
Liked a model about 23 hours ago: CodeGoat24/UnifiedReward-2.0-qwen-7b
Liked a model about 24 hours ago: CodeGoat24/UnifiedReward-2.0-qwen35-9b
Updated a collection about 24 hours ago: UnifiedReward Edit Models
Pref-GRPO & UniGenBench
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
  Paper • 2510.18701 • Published • 67
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
  Paper • 2508.20751 • Published • 89
- CodeGoat24/UniGenBench-Eval-Images
  Preview • Updated • 5.03k • 4
- CodeGoat24/UniGenBench-EvalModel-qwen3vl-32b-v1
  Image-Text-to-Text • 1.14M • Updated • 157
UnifiedReward 2.0 Qwen3VL Models
UnifiedReward 1.0 Qwen2.5VL Models
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 92
- CodeGoat24/UnifiedReward-Think-qwen-7b
  8B • Updated • 313 • 3
- CodeGoat24/UnifiedReward-qwen-32b
  33B • Updated • 2 • 1
UnifiedReward 1.0 LLaVA Model
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 92
- CodeGoat24/UnifiedReward-Think-7b
  8B • Updated • 53 • 10
- CodeGoat24/UnifiedReward-7b-v1.5
  8B • Updated • 5.51k • 7
UnifiedReward Flex
- Unified Personalized Reward Model for Vision Generation
  Paper • 2602.02380 • Published • 20
- CodeGoat24/FLUX.2-klein-base-9B-UnifiedReward-Flex-lora
  Text-to-Image • Updated • 561 • 19
- CodeGoat24/Wan2.2-T2V-A14B-UnifiedReward-Flex-lora
  Text-to-Video • Updated • 306 • 11
- CodeGoat24/Wan2.1-T2V-14B-UnifiedReward-Flex-lora
  Text-to-Video • Updated • 200 • 6
UnifiedReward Edit Models
UnifiedReward 2.0 Qwen2.5VL Models
UnifiedReward 1.0 Qwen2.5 Models GGUF
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 92
- mradermacher/UnifiedReward-qwen-32b-i1-GGUF
  33B • Updated • 514 • 1
- mradermacher/UnifiedReward-Think-qwen-7b-i1-GGUF
  8B • Updated • 871
UnifiedReward Training Data
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 92
- CodeGoat24/UnifiedReward-2.0-T2X-score-data
  Viewer • Updated • 337k • 334
- CodeGoat24/ImageGen-CoT-Reward-5K
  Viewer • Updated • 5.54k • 74 • 1
UnifiedReward 2.0 Qwen3.5 Models