Universal Image Restoration Pre-training via Degradation Classification Paper • 2501.15510 • Published Jan 26 • 1
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration Paper • 2510.10395 • Published Oct 12 • 29
Universal Image Restoration Pre-training via Masked Degradation Classification Paper • 2510.13282 • Published Oct 15 • 10
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing Paper • 2507.23278 • Published Jul 31 • 1
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark Paper • 2509.24897 • Published Sep 29 • 46
VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits Paper • 2505.10202 • Published May 15
Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective Paper • 2505.16900 • Published May 22
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention Paper • 2505.10222 • Published May 15
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective Paper • 2505.17997 • Published May 23
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems Paper • 2505.18943 • Published May 25 • 24
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction Paper • 2502.17239 • Published Feb 24 • 3
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Paper • 2503.14324 • Published Mar 18 • 2
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning Paper • 2503.19470 • Published Mar 25 • 19
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer Paper • 2304.11818 • Published Apr 24, 2023
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 40
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 40
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3 • 8
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3 • 8
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning Paper • 2410.12952 • Published Oct 16, 2024