-
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Paper • 2509.24695 • Published • 45 -
Efficient-Large-Model/SANA-Video_2B_480p
Text-to-Video • Updated • 168 • 12 -
Efficient-Large-Model/SANA-Video_2B_480p_diffusers
Text-to-Video • Updated • 5 -
Efficient-Large-Model/video_toy_data
Updated • 105
Collections
Discover the best community collections!
Collections including paper arxiv:2509.24695
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 104 -
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Paper • 2506.08009 • Published • 30 -
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Paper • 2506.08279 • Published • 27 -
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Paper • 2506.07848 • Published • 4
-
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Paper • 2504.02821 • Published • 9 -
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Paper • 2504.17343 • Published • 13 -
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Paper • 2504.15921 • Published • 7 -
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper • 2504.13263 • Published • 7
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Paper • 2505.18875 • Published • 42 -
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Paper • 2506.16054 • Published • 60 -
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Paper • 2410.02367 • Published • 49 -
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation
Paper • 2506.19852 • Published • 41
-
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Paper • 2509.24695 • Published • 45 -
Efficient-Large-Model/SANA-Video_2B_480p
Text-to-Video • Updated • 168 • 12 -
Efficient-Large-Model/SANA-Video_2B_480p_diffusers
Text-to-Video • Updated • 5 -
Efficient-Large-Model/video_toy_data
Updated • 105
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Paper • 2506.09113 • Published • 104 -
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Paper • 2506.08009 • Published • 30 -
Seeing Voices: Generating A-Roll Video from Audio with Mirage
Paper • 2506.08279 • Published • 27 -
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Paper • 2506.07848 • Published • 4
-
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Paper • 2505.18875 • Published • 42 -
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
Paper • 2506.16054 • Published • 60 -
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Paper • 2410.02367 • Published • 49 -
Radial Attention: O(nlog n) Sparse Attention with Energy Decay for Long Video Generation
Paper • 2506.19852 • Published • 41
-
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Paper • 2504.02821 • Published • 9 -
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Paper • 2504.17343 • Published • 13 -
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Paper • 2504.15921 • Published • 7 -
Causal-Copilot: An Autonomous Causal Analysis Agent
Paper • 2504.13263 • Published • 7