- MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
  Paper • 2501.07556 • Published • 7
- MINIMA: Modality Invariant Image Matching
  Paper • 2412.19412 • Published • 4
- OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
  Paper • 2405.12979 • Published • 12
Collections including paper arxiv:2501.07556
- DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
  Paper • 2501.02576 • Published • 15
- FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
  Paper • 2412.09626 • Published • 21
- Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
  Paper • 2412.13389 • Published • 7
- CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
  Paper • 2412.16112 • Published • 23

- SuperPoint: Self-Supervised Interest Point Detection and Description
  Paper • 1712.07629 • Published
- D2-Net: A Trainable CNN for Joint Detection and Description of Local Features
  Paper • 1905.03561 • Published
- R2D2: Repeatable and Reliable Detector and Descriptor
  Paper • 1906.06195 • Published
- SuperGlue: Learning Feature Matching with Graph Neural Networks
  Paper • 1911.11763 • Published

- Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
  Paper • 2501.04001 • Published • 47
- LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
  Paper • 2501.03895 • Published • 52
- An Empirical Study of Autoregressive Pre-training from Videos
  Paper • 2501.05453 • Published • 41
- MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training
  Paper • 2501.07556 • Published • 7

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23