AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs Paper • 2511.20515 • Published 13 days ago • 3
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published 6 days ago • 29
Thinking with Programming Vision: Towards a Unified View for Thinking with Images Paper • 2512.03746 • Published 6 days ago • 15
Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation Paper • 2512.03534 • Published 6 days ago • 18
CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation Paper • 2512.03540 • Published 6 days ago • 11
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 11 days ago • 164
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models Paper • 2511.22625 • Published 11 days ago • 45
Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections Paper • 2304.08706 • Published Apr 18, 2023
Towards Natural Image Matting in the Wild via Real-Scenario Prior Paper • 2410.06593 • Published Oct 9, 2024 • 4
ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution Paper • 2410.13807 • Published Oct 17, 2024
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity Paper • 2410.10105 • Published Oct 14, 2024 • 3
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning Paper • 2505.12370 • Published May 18
MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on Paper • 2505.21325 • Published May 27 • 4
HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions Paper • 2505.22977 • Published May 29 • 1
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published Aug 3 • 13
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published 25 days ago • 15
MagicWorld: Interactive Geometry-driven Video World Exploration Paper • 2511.18886 • Published 15 days ago • 17
One4D: Unified 4D Generation and Reconstruction via Decoupled LoRA Control Paper • 2511.18922 • Published 15 days ago • 10