Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 4 days ago • 141
Article SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution 7 days ago • 3
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions Paper • 2510.23772 • Published Oct 27 • 1
Prompt-to-Prompt Image Editing with Cross Attention Control Paper • 2208.01626 • Published Aug 2, 2022 • 3
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7, 2024 • 16
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Paper • 2106.06103 • Published Jun 11, 2021 • 4
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published Oct 23 • 45
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Paper • 2510.13344 • Published Oct 15 • 62
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec Paper • 2410.15764 • Published Oct 21, 2024 • 1
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer Paper • 2409.00750 • Published Sep 1, 2024 • 5