Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2604.04746

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

Paper • 2604.04746 • Published Apr 8 • 72
ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

Paper • 2602.00744 • Published Jan 31 • 12
Language-Guided Music Recommendation for Video via Prompt Analogies

Paper • 2306.09327 • Published Jun 15, 2023 • 10

Papers I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 180
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 53
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

CoLLM: A Large Language Model for Composed Image Retrieval

Paper • 2503.19910 • Published Mar 25, 2025 • 15
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Paper • 2503.21541 • Published Mar 27, 2025 • 1
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration

Paper • 2504.03536 • Published Apr 4, 2025 • 13
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Paper • 2504.04842 • Published Apr 7, 2025 • 35

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 40
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

about 16 hours ago

AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Paper • 2602.17100 • Published Feb 19 • 4
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Paper • 2603.01059 • Published Mar 1 • 1
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models

Paper • 2603.00618 • Published Feb 28
Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 197

about 17 hours ago

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Paper • 2506.22434 • Published Jun 27, 2025 • 10
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17, 2025 • 79
RewardDance: Reward Scaling in Visual Generation

Paper • 2509.08826 • Published Sep 10, 2025 • 73
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21, 2025 • 37

about 18 hours ago

BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 37
ColorFlow: Retrieval-Augmented Image Sequence Colorization

Paper • 2412.11815 • Published Dec 16, 2024 • 27
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

Paper • 2412.09611 • Published Dec 12, 2024 • 11
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Paper • 2412.07517 • Published Dec 10, 2024 • 11

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

Paper • 2604.04746 • Published Apr 8 • 72
ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

Paper • 2602.00744 • Published Jan 31 • 12
Language-Guided Music Recommendation for Video via Prompt Analogies

Paper • 2306.09327 • Published Jun 15, 2023 • 10

about 16 hours ago

AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Paper • 2602.17100 • Published Feb 19 • 4
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Paper • 2603.01059 • Published Mar 1 • 1
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models

Paper • 2603.00618 • Published Feb 28
Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 197

Papers I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 180
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 53
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

about 17 hours ago

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Paper • 2506.22434 • Published Jun 27, 2025 • 10
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17, 2025 • 79
RewardDance: Reward Scaling in Visual Generation

Paper • 2509.08826 • Published Sep 10, 2025 • 73
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21, 2025 • 37

CoLLM: A Large Language Model for Composed Image Retrieval

Paper • 2503.19910 • Published Mar 25, 2025 • 15
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing

Paper • 2503.21541 • Published Mar 27, 2025 • 1
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration

Paper • 2504.03536 • Published Apr 4, 2025 • 13
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Paper • 2504.04842 • Published Apr 7, 2025 • 35

about 18 hours ago

BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 37
ColorFlow: Retrieval-Augmented Image Sequence Colorization

Paper • 2412.11815 • Published Dec 16, 2024 • 27
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

Paper • 2412.09611 • Published Dec 12, 2024 • 11
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Paper • 2412.07517 • Published Dec 10, 2024 • 11

Contrastive Decoding Improves Reasoning in Large Language Models

Paper • 2309.09117 • Published Sep 17, 2023 • 40
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

Paper • 2310.08491 • Published Oct 12, 2023 • 57
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding

Paper • 2411.04282 • Published Nov 6, 2024 • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs