Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.04570

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

Paper • 2511.12609 • Published Nov 16, 2025 • 103
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 58

One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

Paper • 2511.10629 • Published Nov 13, 2025 • 124
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

Paper • 2511.09057 • Published Nov 12, 2025 • 76
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 58
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Paper • 2511.04962 • Published Nov 7, 2025 • 54
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 97
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Paper • 2511.01833 • Published Nov 3, 2025 • 15
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30, 2025 • 82

Video Understanding

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 55
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data

Paper • 2511.12609 • Published Nov 16, 2025 • 103
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 58

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

Paper • 2511.10629 • Published Nov 13, 2025 • 124
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

Paper • 2511.09057 • Published Nov 12, 2025 • 76
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Paper • 2511.02779 • Published Nov 4, 2025 • 58
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Paper • 2511.04962 • Published Nov 7, 2025 • 54
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211
V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 97
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Paper • 2511.01833 • Published Nov 3, 2025 • 15
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

Paper • 2510.27492 • Published Oct 30, 2025 • 82

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

Video Understanding

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 55
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 211

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs