How to use DAMO-NLP-SG/VideoRefer-7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("visual-question-answering", model="DAMO-NLP-SG/VideoRefer-7B")

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-SG/VideoRefer-7B", dtype="auto")
| Model Name | Visual Encoder | Language Decoder | # Training Frames |
|---|---|---|---|
| VideoRefer-7B | siglip-so400m-patch14-384 | Qwen2-7B-Instruct | 16 |
| VideoRefer-7B-stage2 | siglip-so400m-patch14-384 | Qwen2-7B-Instruct | 16 |
| VideoRefer-7B-stage2.5 | siglip-so400m-patch14-384 | Qwen2-7B-Instruct | 16 |
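Each checkpoint in the table was trained with 16 frames per video. As a rough illustration of what picking 16 evenly spaced frames from a clip could look like, here is a minimal sketch. The helper name `sample_frame_indices` is hypothetical, and uniform sampling is an assumption made for illustration; it is not necessarily the sampling strategy used by VideoRefer.

```python
def sample_frame_indices(total_frames: int, num_frames: int = 16) -> list[int]:
    """Return up to `num_frames` evenly spaced frame indices for a clip.

    Hypothetical helper: uniform sampling is assumed here for illustration
    and may differ from the preprocessing VideoRefer actually applies.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")

    # Short clips: keep every frame instead of repeating indices.
    if total_frames <= num_frames:
        return list(range(total_frames))

    # Split the clip into `num_frames` equal segments and take the
    # frame at the center of each segment.
    step = total_frames / num_frames
    return [int(step * i + step / 2) for i in range(num_frames)]


if __name__ == "__main__":
    indices = sample_frame_indices(160)
    print(len(indices), indices[0], indices[-1])
```

The default of 16 mirrors the "# Training Frames" column above. Whether a different frame count works well at inference time is not stated in this card.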
If you find VideoRefer Suite useful for your research and applications, please cite using this BibTeX:
@article{yuan2024videorefersuite,
  title = {VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM},
  author = {Yuqian Yuan and Hang Zhang and Wentong Li and Zesen Cheng and Boqiang Zhang and Long Li and Xin Li and Deli Zhao and Wenqiao Zhang and Yueting Zhuang and Jianke Zhu and Lidong Bing},
  journal = {arXiv},
  year = {2024},
  url = {}
}