Hugging Face Forums
In Donut, where is the output of Swin fused with the text? 1. At the start of the BART encoder, 2. in cross-attention (K, V from Swin, Q from self-attention) in the second attention block of the BART encoder, or 3. directly in the decoder part of BART?
🤗Transformers
shubham05
August 2, 2023, 8:28am
1
Is it the same architecture as follows?
[Image: architecture diagram, 1280×654]
Is it trained and tested in the same manner as follows?
[Image: training/inference screenshot, 903×492]
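For context on the options above: Donut pairs a Swin vision encoder with a BART-style text decoder, and the fusion point in question is the decoder's cross-attention, where the queries come from the text states and the keys/values come from the Swin patch features. Below is a minimal NumPy sketch of that fusion step; the projection matrices are replaced by identities and the tensor sizes are hypothetical, so this only illustrates the data flow, not the actual Donut implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(text_states, image_states):
    """Decoder cross-attention: Q from text states, K and V from image features."""
    d_k = text_states.shape[-1]
    # Learned Q/K/V projections omitted (identity) to keep the fusion visible.
    q, k, v = text_states, image_states, image_states
    weights = softmax(q @ k.T / np.sqrt(d_k))  # (num_text_tokens, num_patches)
    return weights @ v                         # (num_text_tokens, d_model)

text = rng.normal(size=(5, 64))     # 5 decoder tokens (hypothetical size)
image = rng.normal(size=(100, 64))  # 100 Swin patch features (hypothetical size)
fused = cross_attention(text, image)
print(fused.shape)  # (5, 64)
```

Each output row is a weighted mix of image-patch features, one per text token, which is how the decoder conditions its text generation on the image.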