- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 43
Collections including paper arxiv:2401.00908
- Language models are weak learners
  Paper • 2306.14101 • Published • 10
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
  Paper • 2306.07075 • Published • 10
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
  Paper • 2307.08674 • Published • 48
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 41

- mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
  Paper • 2403.12895 • Published • 32
- microsoft/layoutlm-base-uncased
  0.1B • Updated • 114k • 61
- microsoft/layoutlmv3-base
  0.1B • Updated • 927k • 473
- naver-clova-ix/donut-base-finetuned-docvqa
  Document Question Answering • Updated • 11.7k • 271

- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- DocFormerv2: Local Features for Document Understanding
  Paper • 2306.01733 • Published • 1

- PDFTriage: Question Answering over Long, Structured Documents
  Paper • 2309.08872 • Published • 54
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 81
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 40
- Context-Aware Meta-Learning
  Paper • 2310.10971 • Published • 17