Article We’re open-sourcing our text-to-image model and the process behind it 24 days ago • 73
CoVT: Chain-of-Visual-Thought Collection Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated 11 days ago • 5
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7 • 253
Article CinePile 2.0 - making stronger datasets with adversarial refinement Oct 23, 2024 • 18
Article PaliGemma – Google's Cutting-Edge Open Vision Language Model May 14, 2024 • 277
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 173
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3 • 58
D-FINE Collection State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 56
Article 🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face Spaces and Docker! Jan 29 • 21