EleutherAI org

Tokenizer important details :
Bytelevel() pretokenizer
BPE algorithm with vocabsize=102400 including added tokens and special tokens
training corpus : korean, english, code

hac541309 changed pull request status to merged

Sign up or log in to comment