Vintern-1B-v2-ViTable-docvqa

Report Link👁️

Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese document visual question answering (DocVQA) on table data.

Benchmarks

Model	ANLS	Semantic Similarity	MLLM-as-judge (Gemini)
Gemini 1.5 Flash	0.35	0.56	0.40
Vintern-1B-v2	0.04	0.45	0.50
Vintern-1B-v2-ViTable-docvqa	0.50	0.71	0.59
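The ANLS column refers to Average Normalized Levenshtein Similarity, the standard DocVQA metric. For each question it takes the best score over the reference answers. A prediction scores 1 − NL if NL < τ and 0 otherwise, where NL is the edit distance divided by the length of the longer string. The per-question scores are then averaged. The sketch below assumes the common threshold τ = 0.5 and a case-insensitive, whitespace-stripped comparison. The card does not say which settings produced the numbers above, so treat this as an illustration of the metric, not the authors' evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def anls(predictions, references, tau=0.5):
    """ANLS over a dataset.

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of acceptable gold answers, one list per question.
    tau:         NL threshold (0.5 is an assumption, the usual DocVQA value).
    """
    scores = []
    for pred, golds in zip(predictions, references):
        best = 0.0
        for gold in golds:
            # Assumed normalization: strip whitespace and lowercase both sides.
            p, g = pred.strip().lower(), gold.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `anls(["abcd", "xyz"], [["abce"], ["abc"]])` averages 0.75 (one substitution out of four characters) and 0 (NL = 1, above the threshold) to give 0.375.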

Usage

Check out the 🤗 HF Demo, or open the Colab notebook.
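This is a minimal local-inference sketch. It assumes the checkpoint keeps the InternVL2-style remote code of its base models, so it is loaded with `trust_remote_code=True` and queried through `model.chat(tokenizer, pixel_values, question, generation_config)`. The helper names (`load_model`, `load_image`, `ask`) are hypothetical, and so are the generation settings. For brevity it feeds each image as a single 448×448 tile. The official InternVL2 and Vintern-1B-v2 examples split large images into several tiles, which likely matters for dense tables. Heavy imports sit inside the functions so the module can be imported without a GPU.

```python
# Hypothetical quick-start helpers; assumes InternVL2-style remote code (model.chat).
MODEL_ID = "YuukiAsuna/Vintern-1B-v2-ViTable-docvqa"
IMAGE_SIZE = 448  # InternVL2 tile size (assumed unchanged by fine-tuning)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def load_model(model_id: str = MODEL_ID):
    """Load model and tokenizer (needs torch, transformers and a CUDA GPU)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the card lists BF16 weights
        low_cpu_mem_usage=True,
        trust_remote_code=True,      # InternVL2-based models ship custom modeling code
    ).eval().cuda()
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True, use_fast=False
    )
    return model, tokenizer


def load_image(path: str):
    """Single-tile preprocessing: resize to 448x448, then ImageNet-normalize."""
    import torch
    import torchvision.transforms as T
    from PIL import Image
    from torchvision.transforms.functional import InterpolationMode

    transform = T.Compose([
        T.Resize((IMAGE_SIZE, IMAGE_SIZE), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    ])
    image = Image.open(path).convert("RGB")
    # Add a batch dimension: shape (1, 3, 448, 448), cast to match the model dtype.
    return transform(image).unsqueeze(0).to(torch.bfloat16).cuda()


def ask(model, tokenizer, image_path: str, question: str) -> str:
    """Ask one question about one table image (hypothetical helper)."""
    pixel_values = load_image(image_path)
    generation_config = dict(max_new_tokens=512, do_sample=False)  # greedy; assumed settings
    # "<image>\n" marks where the image tokens go in InternVL2-style prompts.
    return model.chat(tokenizer, pixel_values, "<image>\n" + question, generation_config)
```

Usage, with a placeholder image path and question: `model, tokenizer = load_model()`, then `ask(model, tokenizer, "table.png", "Tổng doanh thu năm 2023 là bao nhiêu?")` ("What was the total revenue in 2023?").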

Citation:

@misc{doan2024vintern1befficientmultimodallarge,
      title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese}, 
      author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
      year={2024},
      eprint={2408.12480},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.12480}, 
}


Format: Safetensors · Model size: 0.9B params · Tensor type: BF16

Model tree for YuukiAsuna/Vintern-1B-v2-ViTable-docvqa

OpenGVLab/InternVL2-1B (base model) → 5CD-AI/Vintern-1B-v2 (fine-tuned) → YuukiAsuna/Vintern-1B-v2-ViTable-docvqa (this model, fine-tuned)


Paper for YuukiAsuna/Vintern-1B-v2-ViTable-docvqa

Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese (arXiv:2408.12480, published Aug 22, 2024)