JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion • Paper • 2601.22143 • Published 1 day ago • 2
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders • Paper • 2601.17950 • Published 5 days ago • 3
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer • Paper • 2601.16515 • Published 8 days ago • 15
VideoMaMa: Mask-Guided Video Matting via Generative Prior • Paper • 2601.14255 • Published 10 days ago • 13
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding • Paper • 2601.14724 • Published 10 days ago • 73
Transition Matching Distillation for Fast Video Generation • Paper • 2601.09881 • Published 16 days ago • 32
Alterbute: Editing Intrinsic Attributes of Objects in Images • Paper • 2601.10714 • Published 15 days ago • 30
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices • Paper • 2601.08303 • Published 18 days ago • 16
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head • Paper • 2601.07832 • Published 18 days ago • 51
GenCtrl -- A Formal Controllability Toolkit for Generative Models • Paper • 2601.05637 • Published 22 days ago • 4
Klear: Unified Multi-Task Audio-Video Joint Generation • Paper • 2601.04151 • Published 23 days ago • 16
DreamStyle: A Unified Framework for Video Stylization • Paper • 2601.02785 • Published 25 days ago • 24
LTX-2: Efficient Joint Audio-Visual Foundation Model • Paper • 2601.03233 • Published 24 days ago • 141
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation • Paper • 2512.23576 • Published Dec 29, 2025 • 65
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation • Paper • 2601.00664 • Published 29 days ago • 56