Encoder Decoder Models¶
This class can wrap an encoder model, such as BertModel, and a decoder model with a language modeling head, such as BertForMaskedLM, into an encoder-decoder model.
The EncoderDecoderModel class allows you to instantiate an encoder-decoder model using the from_encoder_decoder_pretrained class method, which takes a pretrained encoder and a pretrained decoder model as input.
An EncoderDecoderModel is saved using the standard save_pretrained() method and can be loaded again using the standard from_pretrained() method.
One application of this architecture is summarization using two pretrained BERT models, as shown in the paper Text Summarization with Pretrained Encoders by Yang Liu and Mirella Lapata.
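A minimal sketch of that workflow, assuming the bert-base-uncased checkpoint and a hypothetical ./bert2bert output directory:

>>> from transformers import EncoderDecoderModel

>>> # wrap two pretrained BERT checkpoints into a single encoder-decoder model
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

>>> # save and reload with the standard methods (./bert2bert is a hypothetical directory)
>>> model.save_pretrained('./bert2bert')
>>> model = EncoderDecoderModel.from_pretrained('./bert2bert')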
EncoderDecoderConfig¶
class transformers.EncoderDecoderConfig(**kwargs)[source]¶

EncoderDecoderConfig is the configuration class to store the configuration of an EncoderDecoderModel. It is used to instantiate an encoder-decoder model according to the specified arguments, defining the encoder and decoder configs. Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. See the documentation for PretrainedConfig for more information.

- Parameters
  - kwargs (optional) – Remaining dictionary of keyword arguments. Notably:
    - encoder (PretrainedConfig, optional, defaults to None): An instance of a configuration object that defines the encoder config.
    - decoder (PretrainedConfig, optional, defaults to None): An instance of a configuration object that defines the decoder config.
Example:
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

>>> # Initializing a BERT bert-base-uncased style configuration
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()

>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

>>> # Initializing a Bert2Bert model from the bert-base-uncased style configurations
>>> model = EncoderDecoderModel(config=config)

>>> # Accessing the model configuration
>>> config_encoder = model.config.encoder
>>> config_decoder = model.config.decoder
classmethod from_encoder_decoder_configs(encoder_config: transformers.configuration_utils.PretrainedConfig, decoder_config: transformers.configuration_utils.PretrainedConfig) → transformers.configuration_utils.PretrainedConfig[source]¶

Instantiate an EncoderDecoderConfig (or a derived class) from a pretrained encoder model configuration and a pretrained decoder model configuration.

- Returns
  An instance of a configuration object
- Return type
  EncoderDecoderConfig
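Since the class inherits from PretrainedConfig, the composite configuration should round-trip through the standard config save/load methods. A minimal sketch, assuming a hypothetical ./bert2bert-config output directory:

>>> from transformers import BertConfig, EncoderDecoderConfig

>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(BertConfig(), BertConfig())

>>> # save the composite config and load it back (./bert2bert-config is a hypothetical directory)
>>> config.save_pretrained('./bert2bert-config')
>>> config = EncoderDecoderConfig.from_pretrained('./bert2bert-config')
>>> config.decoder  # the nested decoder config is restored as well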
EncoderDecoderModel¶
class transformers.EncoderDecoderModel(config: Optional[transformers.configuration_utils.PretrainedConfig] = None, encoder: Optional[transformers.modeling_utils.PreTrainedModel] = None, decoder: Optional[transformers.modeling_utils.PreTrainedModel] = None)[source]¶

EncoderDecoderModel is a generic model class that will be instantiated as a transformer architecture with one of the base model classes of the library as encoder and another one as decoder, when created with the AutoModel.from_pretrained(pretrained_model_name_or_path) class method for the encoder and the AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path) class method for the decoder.
config_class¶ alias of transformers.configuration_encoder_decoder.EncoderDecoderConfig
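The constructor shown above also accepts already-instantiated encoder and decoder modules instead of a config. A minimal sketch, assuming two bert-base-uncased checkpoints and assuming the decoder config's is_decoder flag must be set so that cross-attention layers are created:

>>> from transformers import BertModel, BertForMaskedLM, BertConfig, EncoderDecoderModel

>>> encoder = BertModel.from_pretrained('bert-base-uncased')

>>> # assumption: the decoder needs is_decoder=True to act as a decoder with cross-attention
>>> decoder_config = BertConfig.from_pretrained('bert-base-uncased', is_decoder=True)
>>> decoder = BertForMaskedLM.from_pretrained('bert-base-uncased', config=decoder_config)

>>> model = EncoderDecoderModel(encoder=encoder, decoder=decoder)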
forward(input_ids=None, inputs_embeds=None, attention_mask=None, head_mask=None, encoder_outputs=None, decoder_input_ids=None, decoder_attention_mask=None, decoder_head_mask=None, decoder_inputs_embeds=None, labels=None, **kwargs)[source]¶

- Parameters
  - input_ids (torch.LongTensor of shape (batch_size, sequence_length)) – Indices of input sequence tokens in the vocabulary for the encoder. Indices can be obtained using transformers.PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.convert_tokens_to_ids() for details.
  - inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.
  - attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional, defaults to None) – Mask to avoid performing attention on padding token indices for the encoder. Mask values selected in [0, 1]: 1 for tokens that are NOT MASKED, 0 for MASKED tokens.
  - head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules for the encoder. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
  - encoder_outputs (tuple(tuple(torch.FloatTensor)), optional, defaults to None) – Tuple consisting of (last_hidden_state, optional: hidden_states, optional: attentions). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden states at the output of the last layer of the encoder, used in the cross-attention of the decoder.
  - decoder_input_ids (torch.LongTensor of shape (batch_size, target_sequence_length), optional, defaults to None) – Provided to the decoder for sequence-to-sequence training. Indices can be obtained using transformers.PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.convert_tokens_to_ids() for details.
  - decoder_attention_mask (torch.BoolTensor of shape (batch_size, tgt_seq_len), optional, defaults to None) – Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. A causal mask will also be used by default.
  - decoder_head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional, defaults to None) – Mask to nullify selected heads of the self-attention modules for the decoder. Mask values selected in [0, 1]: 1 indicates the head is not masked, 0 indicates the head is masked.
  - decoder_inputs_embeds (torch.FloatTensor of shape (batch_size, target_sequence_length, hidden_size), optional, defaults to None) – Optionally, instead of passing decoder_input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert decoder_input_ids indices into associated vectors than the model's internal embedding lookup matrix.
  - labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None) – Labels for computing the masked language modeling loss for the decoder. Indices should be in [-100, 0, ..., config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
  - kwargs (optional) – Remaining dictionary of keyword arguments. Keyword arguments come in two flavors:
    - without a prefix, which will be input as **encoder_kwargs to the encoder forward function;
    - with a decoder_ prefix, which will be input as **decoder_kwargs to the decoder forward function.
Examples:
>>> from transformers import EncoderDecoderModel, BertTokenizer
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert

>>> # forward
>>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
>>> outputs = model(input_ids=input_ids, decoder_input_ids=input_ids)

>>> # training
>>> loss, outputs = model(input_ids=input_ids, decoder_input_ids=input_ids, labels=input_ids)[:2]

>>> # generation
>>> generated = model.generate(input_ids, decoder_start_token_id=model.config.decoder.pad_token_id)
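The attention masks described above can be passed alongside labels. A minimal training-step sketch, assuming the model and input_ids from the example above (the batch contains no padding, so the masks are all ones):

>>> attention_mask = torch.ones_like(input_ids)          # encoder mask: 1 = not masked
>>> decoder_attention_mask = torch.ones_like(input_ids)  # decoder mask over the target tokens
>>> loss = model(input_ids=input_ids,
...              attention_mask=attention_mask,
...              decoder_input_ids=input_ids,
...              decoder_attention_mask=decoder_attention_mask,
...              labels=input_ids)[0]                     # the loss is the first element of the output tuple
>>> loss.backward()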
classmethod from_encoder_decoder_pretrained(encoder_pretrained_model_name_or_path: str = None, decoder_pretrained_model_name_or_path: str = None, *model_args, **kwargs) → transformers.modeling_utils.PreTrainedModel[source]¶

Instantiates an encoder and a decoder from one or two base classes of the library from pretrained model checkpoints.
The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, you need to first set it back in training mode with model.train().
- Params:
  - encoder_pretrained_model_name_or_path (str, optional, defaults to None): Information necessary to initiate the encoder. Either:
    - a string with the shortcut name of a pre-trained model to load from cache or download, e.g.: bert-base-uncased.
    - a string with the identifier name of a pre-trained model that was user-uploaded to our S3, e.g.: dbmdz/bert-base-german-cased.
    - a path to a directory containing model weights saved using save_pretrained(), e.g.: ./my_model_directory/encoder.
    - a path or url to a TensorFlow index checkpoint file (e.g. ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  - decoder_pretrained_model_name_or_path (str, optional, defaults to None): Information necessary to initiate the decoder. Either:
    - a string with the shortcut name of a pre-trained model to load from cache or download, e.g.: bert-base-uncased.
    - a string with the identifier name of a pre-trained model that was user-uploaded to our S3, e.g.: dbmdz/bert-base-german-cased.
    - a path to a directory containing model weights saved using save_pretrained(), e.g.: ./my_model_directory/decoder.
    - a path or url to a TensorFlow index checkpoint file (e.g. ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
  - model_args (optional): Sequence of positional arguments. All remaining positional arguments will be passed to the underlying model's __init__ method.
  - kwargs (optional): Remaining dictionary of keyword arguments. Can be used to update the configuration object (after it has been loaded) and to initiate the model (e.g. output_attentions=True). Behaves differently depending on whether a config is provided or automatically loaded.
Examples:
>>> from transformers import EncoderDecoderModel

>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert
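As noted above, the returned model is set in evaluation mode. A minimal sketch of toggling between modes, assuming the Bert2Bert model from the example above:

>>> model.training     # False: loaded in evaluation mode (dropout deactivated)
>>> model.train()      # switch to training mode before fine-tuning
>>> model.eval()       # switch back to evaluation mode for inference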
get_input_embeddings()[source]¶

Returns the model's input embeddings.
- Returns
A torch module mapping vocabulary to hidden states.
- Return type
nn.Module
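A quick way to inspect the returned module; a minimal sketch, assuming the Bert2Bert model from the examples above (for a BERT encoder the module is typically an nn.Embedding):

>>> embeddings = model.get_input_embeddings()
>>> embeddings.weight.shape  # (vocab_size, hidden_size) for a BERT encoder's nn.Embedding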