Hello,
I had a question when going through the implementation of BERT for question answering.
The loss is calculated at L1876 of the BertForQuestionAnswering implementation.
As far as I can tell, the variable ignored_index is the sequence length, since it is used to clamp start_positions and end_positions in the preceding lines.
I was wondering why ignore_index is set to ignored_index (i.e. the sequence length) instead of being left at its default (-100)?
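
To make the question concrete, here is a minimal, dependency-free sketch of the behaviour as I understand it. It is not the library code: `clamp` and `cross_entropy` are hypothetical stand-ins I wrote to mimic clamping followed by a cross-entropy loss with an `ignore_index`, and the toy logits and positions are made up.

```python
import math

def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))

def cross_entropy(logits, targets, ignore_index=-100):
    """Mean negative log-likelihood over targets that differ from ignore_index."""
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # this example contributes nothing to the loss
        log_norm = math.log(sum(math.exp(x) for x in row))
        losses.append(log_norm - row[target])
    return sum(losses) / len(losses) if losses else float("nan")

# Hypothetical toy batch: 2 examples, sequence length 4.
start_logits = [[2.0, 0.5, 0.1, -1.0], [0.3, 0.2, 1.5, 0.0]]
start_positions = [1, 7]  # 7 lies outside the sequence

ignored_index = len(start_logits[0])  # sequence length, here 4
clamped = [clamp(p, 0, ignored_index) for p in start_positions]  # [1, 4]

# Position 4 is not a valid class (valid classes are 0..3), so it has to be ignored.
loss = cross_entropy(start_logits, clamped, ignore_index=ignored_index)
```

In this sketch, every out-of-range position ends up equal to the sequence length after clamping, so that is the only value that needs to be skipped. A label of -100 would be clamped to 0 and would therefore never reach the loss as -100, which may be part of the answer to my question.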