- HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
  Paper • 2504.21650 • Published • 16
- Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
  Paper • 2505.02836 • Published • 8
- ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
  Paper • 2506.14315 • Published • 10
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
  Paper • 2507.21809 • Published • 135
Collections including paper arxiv:2505.02836
- Describe Anything: Detailed Localized Image and Video Captioning
  Paper • 2504.16072 • Published • 63
- EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
  Paper • 2410.09604 • Published
- Geospatial Mechanistic Interpretability of Large Language Models
  Paper • 2505.03368 • Published • 11
- Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
  Paper • 2505.02836 • Published • 8

- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
  Paper • 2401.09416 • Published • 11
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
  Paper • 2401.10171 • Published • 14
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
  Paper • 2311.09217 • Published • 22
- GALA: Generating Animatable Layered Assets from a Single Scan
  Paper • 2401.12979 • Published • 9

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- FlashWorld: High-quality 3D Scene Generation within Seconds
  Paper • 2510.13678 • Published • 71
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
  Paper • 2510.15019 • Published • 63
- GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
  Paper • 2509.18090 • Published • 4
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
  Paper • 2509.19296 • Published • 23