Utilities for Generation

This page lists all the utility classes and functions used by generate(), greedy_search(), sample(), beam_search(), and beam_sample().

Most of those are only useful if you are studying the code of the generate methods in the library.

LogitsProcessor

A LogitsProcessor can be used to modify the prediction scores of a language model head for generation.

class transformers.LogitsProcessor

Abstract base class for all logit processors that can be applied during generation.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Returns

The processed prediction scores.

Return type

torch.FloatTensor of shape (batch_size, config.vocab_size)

Torch method for processing logits.
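To make the call contract concrete, here is a minimal, framework-free sketch in plain Python. Nested lists stand in for the torch tensors of shape (batch_size, sequence_length) and (batch_size, config.vocab_size); BanTokenProcessor is a hypothetical class written only for illustration, not part of the library. A real processor would subclass transformers.LogitsProcessor and operate on tensors.

```python
from typing import List

NEG_INF = float("-inf")


class BanTokenProcessor:
    """Hypothetical processor that forbids one token id at every step.

    Follows the LogitsProcessor call contract: __call__(input_ids, scores) returns
    scores of the same shape. Plain lists stand in for torch tensors.
    """

    def __init__(self, banned_token_id: int) -> None:
        self.banned_token_id = banned_token_id

    def __call__(self, input_ids: List[List[int]], scores: List[List[float]]) -> List[List[float]]:
        processed = []
        for row in scores:
            row = list(row)  # copy so the caller's scores are left untouched
            row[self.banned_token_id] = NEG_INF  # -inf becomes probability 0 after SoftMax
            processed.append(row)
        return processed


processor = BanTokenProcessor(banned_token_id=2)
print(processor([[5, 7]], [[0.1, 0.5, 0.9]]))  # [[0.1, 0.5, -inf]]
```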

class transformers.LogitsProcessorList

This class can be used to create a list of LogitsProcessor or LogitsWarper instances that subsequently process a scores input tensor. This class inherits from list and adds a specific __call__ method to apply each LogitsProcessor or LogitsWarper to the inputs.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Returns

The processed prediction scores.

Return type

torch.FloatTensor of shape (batch_size, config.vocab_size)
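The chaining behaviour of this list can be sketched in plain Python as a list subclass whose __call__ threads the scores through each element in order. ProcessorChain, add_one and double are hypothetical names used only for this illustration, and plain lists replace torch tensors.

```python
from typing import List

Scores = List[List[float]]


class ProcessorChain(list):
    """Hypothetical stand-in for the LogitsProcessorList idea: a list whose
    __call__ feeds the scores through every element, in list order."""

    def __call__(self, input_ids: List[List[int]], scores: Scores) -> Scores:
        for processor in self:
            scores = processor(input_ids, scores)
        return scores


def add_one(input_ids: List[List[int]], scores: Scores) -> Scores:
    # Hypothetical processor: shift every score by +1.
    return [[s + 1.0 for s in row] for row in scores]


def double(input_ids: List[List[int]], scores: Scores) -> Scores:
    # Hypothetical processor: scale every score by 2.
    return [[s * 2.0 for s in row] for row in scores]


chain = ProcessorChain([add_one, double])
print(chain([[0]], [[1.0, 2.0]]))  # [[4.0, 6.0]], i.e. (x + 1) * 2
```

Order matters: reversing the list to [double, add_one] computes x * 2 + 1 instead.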

class transformers.MinLengthLogitsProcessor(min_length: int, eos_token_id: int)

transformers.LogitsProcessor enforcing a minimum length by setting the EOS probability to 0.

Parameters

min_length (int) – The minimum length below which the score of eos_token_id is set to -float("Inf").
eos_token_id (int) – The id of the end-of-sequence token.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for processing logits.
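A plain-Python sketch of the minimum-length rule, assuming the current length is the second dimension of input_ids and that the EOS score is masked while that length is strictly below min_length. min_length_process is a hypothetical helper, not a library function, and lists stand in for tensors.

```python
from typing import List

NEG_INF = float("-inf")


def min_length_process(
    input_ids: List[List[int]],
    scores: List[List[float]],
    min_length: int,
    eos_token_id: int,
) -> List[List[float]]:
    """Hypothetical helper: mask the EOS score while the sequence is too short."""
    cur_len = len(input_ids[0])  # every row of a batch has the same length
    processed = [list(row) for row in scores]
    if cur_len < min_length:
        for row in processed:
            row[eos_token_id] = NEG_INF  # EOS can no longer be selected
    return processed


# EOS has id 0 and sequences must reach length 3 before they may end.
print(min_length_process([[4, 5]], [[0.3, 0.7]], min_length=3, eos_token_id=0))     # [[-inf, 0.7]]
print(min_length_process([[4, 5, 6]], [[0.3, 0.7]], min_length=3, eos_token_id=0))  # [[0.3, 0.7]]
```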

class transformers.TemperatureLogitsWarper(temperature: float)

transformers.LogitsWarper for temperature scaling (exponential scaling of the output probability distribution).

Parameters: temperature (float) – The value used to modulate the logits distribution.

__call__(input_ids: torch.Tensor, scores: torch.Tensor) → torch.Tensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for warping logits.
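Temperature warping amounts to dividing every logit by temperature; after SoftMax this is equivalent to raising each probability to the power 1 / temperature and renormalizing, which is the "exponential scaling" mentioned above. The plain-Python sketch below (hypothetical helper names, lists instead of tensors) shows that a temperature below 1.0 sharpens the distribution and one above 1.0 flattens it.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_warp(scores: List[List[float]], temperature: float) -> List[List[float]]:
    """Hypothetical helper: divide every logit by a strictly positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    return [[s / temperature for s in row] for row in scores]


logits = [[2.0, 1.0, 0.0]]
for t in (0.5, 1.0, 2.0):
    top_prob = softmax(temperature_warp(logits, t)[0])[0]
    print(f"temperature={t}: probability of the best token = {top_prob:.3f}")
```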

class transformers.RepetitionPenaltyLogitsProcessor(penalty: float)

transformers.LogitsProcessor enforcing an exponential penalty on repeated sequences.

Parameters: penalty (float) – The parameter for repetition penalty. 1.0 means no penalty. See the CTRL paper (Keskar et al., 2019) for more details.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for processing logits.
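A plain-Python sketch of the penalty, assuming the sign-aware rule used by the library at the time of writing: a token id that already occurs in input_ids has its score divided by penalty when the score is positive and multiplied by penalty when it is negative, so any penalty above 1.0 makes that token less likely. repetition_penalty_process is a hypothetical helper, not the library's code.

```python
from typing import List


def repetition_penalty_process(
    input_ids: List[List[int]],
    scores: List[List[float]],
    penalty: float,
) -> List[List[float]]:
    """Hypothetical helper: penalize every token id already present in input_ids."""
    processed = [list(row) for row in scores]
    for ids, row in zip(input_ids, processed):
        for token_id in set(ids):  # a repeated token is penalized once, not once per occurrence
            s = row[token_id]
            # Negative scores are pushed further down, positive scores are shrunk.
            row[token_id] = s * penalty if s < 0 else s / penalty
    return processed


# Tokens 0 and 1 were already generated; token 2 was not.
print(repetition_penalty_process([[0, 1, 0]], [[2.0, -1.0, 3.0]], penalty=2.0))  # [[1.0, -2.0, 3.0]]
```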

class transformers.TopPLogitsWarper(top_p: float, filter_value: float = -inf, min_tokens_to_keep: int = 1)

transformers.LogitsWarper that performs top-p (nucleus) filtering, i.e. restricting to the smallest set of most probable tokens whose cumulative probability is at least top_p.

Parameters

top_p (float) – If set to < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
filter_value (float, optional, defaults to -float("Inf")) – All filtered values will be set to this float value.
min_tokens_to_keep (int, optional, defaults to 1) – Minimum number of tokens that cannot be filtered.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for warping logits.
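Nucleus filtering can be sketched in plain Python as follows: sort the tokens by probability, keep adding them until the running total reaches top_p, and always keep at least min_tokens_to_keep. This is an illustrative sketch with a hypothetical helper name; the library performs the same idea with tensor operations, and its treatment of a running total exactly equal to top_p may differ.

```python
import math
from typing import List

NEG_INF = float("-inf")


def top_p_warp(
    scores: List[List[float]],
    top_p: float,
    filter_value: float = NEG_INF,
    min_tokens_to_keep: int = 1,
) -> List[List[float]]:
    """Hypothetical helper: nucleus (top-p) filtering on plain lists of logits."""
    out = []
    for row in scores:
        # SoftMax with the usual max-subtraction for stability.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Visit token ids from most to least probable.
        order = sorted(range(len(row)), key=lambda i: probs[i], reverse=True)
        keep = set()
        cumulative = 0.0
        for rank, token_id in enumerate(order):
            # Stop once top_p is reached, but never keep fewer than min_tokens_to_keep.
            if cumulative >= top_p and rank >= min_tokens_to_keep:
                break
            keep.add(token_id)
            cumulative += probs[token_id]
        out.append([s if i in keep else filter_value for i, s in enumerate(row)])
    return out


# Probabilities 0.5, 0.3, 0.2: with top_p=0.7 the first two tokens (0.8 in total) survive.
logits = [math.log(0.5), math.log(0.3), math.log(0.2)]
print(top_p_warp([logits], top_p=0.7))
```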

class transformers.TopKLogitsWarper(top_k: int, filter_value: float = -inf, min_tokens_to_keep: int = 1)

transformers.LogitsWarper that performs top-k, i.e. restricting to the k highest probability elements.

Parameters

top_k (int) – The number of highest probability vocabulary tokens to keep for top-k-filtering.
filter_value (float, optional, defaults to -float("Inf")) – All filtered values will be set to this float value.
min_tokens_to_keep (int, optional, defaults to 1) – Minimum number of tokens that cannot be filtered.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for warping logits.
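A plain-Python sketch of top-k filtering. It assumes, as the parameter descriptions suggest, that k is raised to min_tokens_to_keep, and it additionally caps k at the vocabulary size; every score strictly below the k-th largest one is replaced by filter_value, so ties with the k-th score are kept. top_k_warp is a hypothetical helper.

```python
from typing import List

NEG_INF = float("-inf")


def top_k_warp(
    scores: List[List[float]],
    top_k: int,
    filter_value: float = NEG_INF,
    min_tokens_to_keep: int = 1,
) -> List[List[float]]:
    """Hypothetical helper: keep the k highest scores of each row."""
    out = []
    for row in scores:
        k = min(max(top_k, min_tokens_to_keep), len(row))
        # The k-th largest score is the cut-off; anything strictly below it is filtered.
        threshold = sorted(row, reverse=True)[k - 1]
        out.append([s if s >= threshold else filter_value for s in row])
    return out


print(top_k_warp([[1.0, 3.0, 2.0, 0.5]], top_k=2))  # [[-inf, 3.0, 2.0, -inf]]
```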

class transformers.NoRepeatNGramLogitsProcessor(ngram_size: int)

transformers.LogitsProcessor that enforces no repetition of n-grams. See Fairseq.

Parameters: ngram_size (int) – All n-grams of size ngram_size can only occur once.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for processing logits.
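The n-gram ban can be sketched in plain Python: index every n-gram already generated by its first ngram_size - 1 tokens, then ban any token that would complete an n-gram whose prefix matches the last ngram_size - 1 generated tokens. The helper names are hypothetical and lists stand in for tensors.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def banned_ngram_tokens(tokens: List[int], ngram_size: int) -> List[int]:
    """Hypothetical helper: token ids that would complete an n-gram already in tokens."""
    if len(tokens) + 1 < ngram_size:
        return []  # too few tokens for the next one to complete any n-gram
    # Map each (n-1)-token prefix to the tokens that have followed it.
    followers: Dict[Tuple[int, ...], List[int]] = {}
    for start in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[start:start + ngram_size])
        followers.setdefault(ngram[:-1], []).append(ngram[-1])
    # The next token would complete the n-gram that starts with the last n-1 tokens.
    prefix = tuple(tokens[len(tokens) - ngram_size + 1:])
    return followers.get(prefix, [])


def no_repeat_ngram_process(
    input_ids: List[List[int]], scores: List[List[float]], ngram_size: int
) -> List[List[float]]:
    """Hypothetical helper: set the score of every banned token to -inf, row by row."""
    processed = [list(row) for row in scores]
    for tokens, row in zip(input_ids, processed):
        for token_id in banned_ngram_tokens(tokens, ngram_size):
            row[token_id] = NEG_INF
    return processed


# The trigram (1, 2, 3) already occurred, so after "... 1, 2" token 3 is banned.
print(banned_ngram_tokens([1, 2, 3, 1, 2], ngram_size=3))  # [3]
```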

class transformers.NoBadWordsLogitsProcessor(bad_words_ids: Iterable[Iterable[int]], eos_token_id: int)

transformers.LogitsProcessor that enforces that specified sequences will never be sampled.

Parameters

bad_words_ids (List[List[int]]) – List of lists of token ids that are not allowed to be generated. To get the token ids of the words that should not appear in the generated text, use tokenizer(bad_word, add_prefix_space=True).input_ids.
eos_token_id (int) – The id of the end-of-sequence token.

__call__(input_ids: torch.LongTensor, scores: torch.FloatTensor) → torch.FloatTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using BertTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
scores (torch.FloatTensor of shape (batch_size, config.vocab_size)) – Prediction scores of a language modeling head. These can be scores for each vocabulary token before SoftMax or scores for each vocabulary token after SoftMax.

Torch method for processing logits.
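A plain-Python sketch of the ban, assuming each entry of bad_words_ids is a non-empty list of token ids: a single-token entry is banned at every step, while a longer entry only has its last token banned when the sequence generated so far ends with the rest of it. The helper names are hypothetical, and eos_token_id is not modelled in this sketch.

```python
from typing import List

NEG_INF = float("-inf")


def banned_bad_word_tokens(tokens: List[int], bad_words_ids: List[List[int]]) -> List[int]:
    """Hypothetical helper: token ids that would complete one of the banned sequences."""
    banned = []
    for bad_seq in bad_words_ids:
        prefix, last = bad_seq[:-1], bad_seq[-1]
        # Single-token entries are always banned; longer ones only when the tokens
        # generated so far end with every token of the entry except its last one.
        if not prefix or tokens[-len(prefix):] == prefix:
            banned.append(last)
    return banned


def no_bad_words_process(
    input_ids: List[List[int]], scores: List[List[float]], bad_words_ids: List[List[int]]
) -> List[List[float]]:
    """Hypothetical helper: set the score of every banned token to -inf, row by row."""
    processed = [list(row) for row in scores]
    for tokens, row in zip(input_ids, processed):
        for token_id in banned_bad_word_tokens(tokens, bad_words_ids):
            row[token_id] = NEG_INF
    return processed


# Token 4 is banned outright; token 2 is banned only right after token 1.
print(banned_bad_word_tokens([0, 1], [[4], [1, 2]]))  # [4, 2]
print(banned_bad_word_tokens([0, 3], [[4], [1, 2]]))  # [4]
```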

BeamSearch

class transformers.BeamScorer

Abstract base class for all beam scorers that are used for beam_search() and beam_sample().

abstract finalize(input_ids: torch.LongTensor, next_scores: torch.FloatTensor, next_tokens: torch.LongTensor, next_indices: torch.LongTensor, **kwargs) → torch.LongTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
final_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – The final scores of all non-finished beams.
final_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – The last tokens to be added to the non-finished beam_hypotheses.
final_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – The beam indices indicating to which beam the final_beam_tokens shall be added.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

Return type

torch.LongTensor of shape (batch_size * num_return_sequences, sequence_length)

abstract process(input_ids: torch.LongTensor, next_scores: torch.FloatTensor, next_tokens: torch.LongTensor, next_indices: torch.LongTensor, **kwargs) → Tuple[torch.Tensor]

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
next_scores (torch.FloatTensor of shape (batch_size, 2 * num_beams)) – Current scores of the top 2 * num_beams non-finished beam hypotheses.
next_tokens (torch.LongTensor of shape (batch_size, 2 * num_beams)) – input_ids of the tokens corresponding to the top 2 * num_beams non-finished beam hypotheses.
next_indices (torch.LongTensor of shape (batch_size, 2 * num_beams)) – Beam indices indicating to which beam hypothesis the next_tokens correspond.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

A dictionary composed of the following fields:

next_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – Updated scores of all non-finished beams.

next_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – Next tokens to be added to the non-finished beam_hypotheses.

next_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – Beam indices indicating to which beam the next tokens shall be added.

Return type

UserDict

class transformers.BeamSearchScorer(batch_size: int, max_length: int, num_beams: int, device: torch.device, length_penalty: Optional[float] = 1.0, do_early_stopping: Optional[bool] = False, num_beam_hyps_to_keep: Optional[int] = 1)

transformers.BeamScorer implementing standard beam search decoding.

Adapted in part from Facebook’s XLM beam search code.

Parameters

batch_size (int) – Batch size of input_ids for which beam search decoding is run in parallel.
max_length (int) – The maximum length of the sequence to be generated.
num_beams (int) – Number of beams for beam search.
device (torch.device) – Defines the device type (e.g., "cpu" or "cuda") on which this instance of BeamSearchScorer will be allocated.
length_penalty (float, optional, defaults to 1.0) – Exponential penalty to the length. 1.0 means no penalty. Set it to a value < 1.0 to encourage the model to generate shorter sequences, or to a value > 1.0 to encourage the model to produce longer sequences.
do_early_stopping (bool, optional, defaults to False) – Whether to stop the beam search when at least num_beams sentences are finished per batch or not.
num_beam_hyps_to_keep (int, optional, defaults to 1) – The number of beam hypotheses that shall be returned upon calling finalize().
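The effect of length_penalty can be illustrated with a sketch of hypothesis scoring. It assumes that a finished hypothesis is ranked by its summed token log-probabilities divided by length ** length_penalty; hypothesis_score is a hypothetical helper. Because log-probabilities are negative, a larger exponent shrinks the magnitude of long hypotheses' scores more, so raising length_penalty makes longer hypotheses relatively more competitive.

```python
def hypothesis_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Hypothetical helper: length-normalized score of a finished hypothesis."""
    return sum_logprobs / (length ** length_penalty)


# A short hypothesis (4 tokens, total log-probability -4.0) against a long one (8 tokens, -6.0).
for penalty in (0.0, 1.0, 2.0):
    short_score = hypothesis_score(-4.0, 4, penalty)
    long_score = hypothesis_score(-6.0, 8, penalty)
    print(f"length_penalty={penalty}: short={short_score:.3f}, long={long_score:.3f}")
```

With length_penalty=0.0 the raw sums are compared and the short hypothesis wins; at 1.0 and above the long one ranks higher.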

finalize(input_ids: torch.LongTensor, final_beam_scores: torch.FloatTensor, final_beam_tokens: torch.LongTensor, final_beam_indices: torch.LongTensor, pad_token_id: Optional[int] = None, eos_token_id: Optional[int] = None) → torch.LongTensor

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
final_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – The final scores of all non-finished beams.
final_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – The last tokens to be added to the non-finished beam_hypotheses.
final_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – The beam indices indicating to which beam the final_beam_tokens shall be added.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

The generated sequences. The second dimension (sequence_length) is either equal to max_length or shorter if all batches finished early due to the eos_token_id.

Return type

torch.LongTensor of shape (batch_size * num_return_sequences, sequence_length)

process(input_ids: torch.LongTensor, next_scores: torch.FloatTensor, next_tokens: torch.LongTensor, next_indices: torch.LongTensor, pad_token_id: Optional[int] = None, eos_token_id: Optional[int] = None) → Tuple[torch.Tensor]

Parameters

input_ids (torch.LongTensor of shape (batch_size * num_beams, sequence_length)) –
Indices of input sequence tokens in the vocabulary.

Indices can be obtained using any class inheriting from PreTrainedTokenizer. See transformers.PreTrainedTokenizer.encode() and transformers.PreTrainedTokenizer.__call__() for details.

What are input IDs?
next_scores (torch.FloatTensor of shape (batch_size, 2 * num_beams)) – Current scores of the top 2 * num_beams non-finished beam hypotheses.
next_tokens (torch.LongTensor of shape (batch_size, 2 * num_beams)) – input_ids of the tokens corresponding to the top 2 * num_beams non-finished beam hypotheses.
next_indices (torch.LongTensor of shape (batch_size, 2 * num_beams)) – Beam indices indicating to which beam hypothesis the next_tokens correspond.
pad_token_id (int, optional) – The id of the padding token.
eos_token_id (int, optional) – The id of the end-of-sequence token.

Returns

A dictionary composed of the following fields:

next_beam_scores (torch.FloatTensor of shape (batch_size * num_beams)) – Updated scores of all non-finished beams.

next_beam_tokens (torch.LongTensor of shape (batch_size * num_beams)) – Next tokens to be added to the non-finished beam_hypotheses.

next_beam_indices (torch.LongTensor of shape (batch_size * num_beams)) – Beam indices indicating to which beam the next tokens shall be added.

Return type

UserDict
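The selection performed by process for a single batch item can be sketched in plain Python. Candidates arrive sorted best first; an EOS candidate ranked within the top num_beams closes a hypothesis (stored without the EOS token), and the first num_beams non-EOS candidates become the next beams. Drawing 2 * num_beams candidates guarantees enough non-EOS continuations, because each beam contributes at most one EOS candidate. select_next_beams and its plain-list arguments are hypothetical; the real method works on the whole batch with tensors and returns the UserDict described above.

```python
from typing import List, Tuple


def select_next_beams(
    beam_sequences: List[List[int]],
    next_scores: List[float],
    next_tokens: List[int],
    next_indices: List[int],
    num_beams: int,
    eos_token_id: int,
    finished: List[Tuple[float, List[int]]],
) -> List[Tuple[float, int, int]]:
    """Hypothetical helper: one beam-search step for a single batch item.

    beam_sequences holds the num_beams current hypotheses; next_scores, next_tokens
    and next_indices describe the top 2 * num_beams candidates, best first.
    Hypotheses ended by EOS are appended to finished as (score, sequence).
    Returns up to num_beams (score, token, beam_index) triples to continue with.
    """
    next_beams: List[Tuple[float, int, int]] = []
    for rank, (score, token, beam) in enumerate(zip(next_scores, next_tokens, next_indices)):
        if token == eos_token_id:
            # Only an EOS candidate ranked within the top num_beams may finish a hypothesis.
            if rank < num_beams:
                finished.append((score, list(beam_sequences[beam])))
        else:
            next_beams.append((score, token, beam))
        if len(next_beams) == num_beams:
            break
    return next_beams


finished: List[Tuple[float, List[int]]] = []
beams = select_next_beams(
    beam_sequences=[[5], [6]],
    next_scores=[-0.1, -0.2, -0.3, -0.4],
    next_tokens=[0, 7, 8, 9],
    next_indices=[1, 0, 0, 1],
    num_beams=2,
    eos_token_id=0,
    finished=finished,
)
print(beams)     # [(-0.2, 7, 0), (-0.3, 8, 0)]
print(finished)  # [(-0.1, [6])]
```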