YuukiAsuna/VietnameseTableVQA
How to use YuukiAsuna/Vintern-1B-v2-ViTable-docvqa with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("document-question-answering", model="YuukiAsuna/Vintern-1B-v2-ViTable-docvqa", trust_remote_code=True)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("YuukiAsuna/Vintern-1B-v2-ViTable-docvqa", trust_remote_code=True, dtype="auto")

Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese DocVQA on table data.
| Model | ANLS | Semantic Similarity | MLLM-as-judge (Gemini) |
|---|---|---|---|
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
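
For readers unfamiliar with the ANLS column, the following is a minimal sketch of how Average Normalized Levenshtein Similarity is commonly computed for DocVQA-style benchmarks. It is not the evaluation script behind the table above: the threshold of 0.5, the lowercasing and whitespace normalization, and the function names `levenshtein` and `anls` are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def anls(predictions, references, threshold=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answers, one list per question.
    threshold:   assumed cutoff; answers whose normalized distance is at or
                 above it score 0.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        pred_norm = normalize(pred)
        best = 0.0
        for answer in answers:
            ans_norm = normalize(answer)
            longest = max(len(pred_norm), len(ans_norm))
            distance = levenshtein(pred_norm, ans_norm) / longest if longest else 0.0
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)  # keep the best match among accepted answers
        total += best
    return total / len(predictions)


if __name__ == "__main__":
    preds = ["Hà Nội", "12,5"]
    refs = [["hà nội"], ["12.5"]]
    print(anls(preds, refs))
```

Because the score is character-based, a prediction that is correct but phrased differently from the reference can score low on ANLS, which is one reason the table also reports semantic similarity and an MLLM-as-judge score.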
Check out this 🤗 HF Demo, or you can open it in Colab:
Citation:
@misc{doan2024vintern1befficientmultimodallarge,
title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese},
author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
year={2024},
eprint={2408.12480},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2408.12480},
}