Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.05860

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Paper • 2411.18499 • Published Nov 27, 2024 • 18
VLSBench: Unveiling Visual Leakage in Multimodal Safety

Paper • 2411.19939 • Published Nov 29, 2024 • 10
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Paper • 2412.02611 • Published Dec 3, 2024 • 26
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Paper • 2412.03205 • Published Dec 4, 2024 • 18

Code Generation

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

Paper • 2404.03543 • Published Apr 4, 2024 • 18
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 67
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Paper • 2408.07060 • Published Aug 13, 2024 • 42

RuCCoD: Towards Automated ICD Coding in Russian

Paper • 2502.21263 • Published Feb 28 • 133
Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7 • 122
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7 • 27

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 34
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 27
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 241
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 37
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 39

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48

RuCCoD: Towards Automated ICD Coding in Russian

Paper • 2502.21263 • Published Feb 28 • 133
Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published Mar 7 • 122
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Paper • 2503.05592 • Published Mar 7 • 27

GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

Paper • 2411.18499 • Published Nov 27, 2024 • 18
VLSBench: Unveiling Visual Leakage in Multimodal Safety

Paper • 2411.19939 • Published Nov 29, 2024 • 10
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

Paper • 2412.02611 • Published Dec 3, 2024 • 26
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs

Paper • 2412.03205 • Published Dec 4, 2024 • 18

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 34
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 27
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

Code Generation

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

Paper • 2404.03543 • Published Apr 4, 2024 • 18
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 67
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Paper • 2408.07060 • Published Aug 13, 2024 • 42

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 241
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 37
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 39

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs