TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Baichuan Zhou
bczhou
AI & ML interests
Computer Vision
Models 8
bczhou/tiny-llava-v1-hf
Image-Text-to-Text • 1B • Updated • 861 • 57
bczhou/TinyLLaVA-2.0B
Image-Text-to-Text • 2B • Updated • 143 • 6
bczhou/TinyLLaVA-1.5B
Image-Text-to-Text • 2B • Updated • 254 • 19
bczhou/TinyLLaVA-3.1B-Pretrain
Text Generation • 3B • Updated • 23
bczhou/TinyLLaVA-3.1B
Text Generation • 3B • Updated • 154 • 27
bczhou/TinyLLaVA-2.0B-SigLIP
0.4B • Updated • 57 • 1
bczhou/TinyLLaVA-1.5B-SigLIP
0.4B • Updated • 1.84k • 1
bczhou/TinyLLaVA-3.1B-SigLIP
0.4B • Updated • 91 • 4
Datasets 7
bczhou/UrBench
Viewer • Updated • 11.6k • 420 • 4
bczhou/LOKI
Preview • Updated • 95 • 5
bczhou/CityBench-SubTasks
Viewer • Updated • 12.8k • 6
bczhou/SyntheticBench-Videos
Viewer • Updated • 264 • 6
bczhou/CityBench-v0.3
Viewer • Updated • 9.71k • 6
bczhou/CityBench-v0.2
Viewer • Updated • 9.71k • 4
bczhou/CityVQA-v0.2
Viewer • Updated • 2.5k • 5 • 1