Models accompanying a research paper on training multi-token prediction language models using self-distillation.