If you like our project, please give us a star ⭐ on GitHub to get the latest updates.

🌏 Model Zoo

Model Name	Visual Encoder	Language Decoder	# Training Frames
VideoRefer-7B	siglip-so400m-patch14-384	Qwen2-7B-Instruct	16
VideoRefer-7B-stage2	siglip-so400m-patch14-384	Qwen2-7B-Instruct	16
VideoRefer-7B-stage2.5	siglip-so400m-patch14-384	Qwen2-7B-Instruct	16

📑 Citation

If you find VideoRefer Suite useful for your research and applications, please cite using this BibTeX:

@article{yuan2024videorefersuite,
  title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
  author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
  journal={arXiv},
  year={2024},
  url = {}
}

Safetensors: model size 8B params, tensor type F16

Paper for DAMO-NLP-SG/VideoRefer-7B: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (arXiv:2406.07476, published Jun 11, 2024)