add first tokenizer
#1 by hac541309 - opened
Important tokenizer details:
ByteLevel() pre-tokenizer
BPE algorithm with a vocabulary size of 102400, including added tokens and special tokens
Training corpus: Korean, English, and code
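
For reviewers unfamiliar with how a byte-level BPE vocabulary budget is spent, here is a rough pure-Python sketch. It is not the training script used for this tokenizer. The function name `train_byte_level_bpe`, the special token strings, and the tiny corpus are hypothetical, and it assumes the target size counts the 256 base bytes, the learned merges, and the special tokens together.

```python
from collections import Counter


def train_byte_level_bpe(corpus, vocab_size, special_tokens=()):
    """Toy byte-level BPE trainer (illustrative only, not the PR's script).

    Assumption: vocab_size covers the 256 base bytes, the learned merges,
    and the special tokens together.
    """
    # Base vocabulary: every byte value, so any UTF-8 text can be encoded.
    vocab = [bytes([b]) for b in range(256)]
    seen = set(vocab)
    num_merges = vocab_size - len(vocab) - len(special_tokens)
    if num_merges < 0:
        raise ValueError("vocab_size is too small for 256 bytes plus special tokens")

    # Each text starts as a sequence of single-byte symbols.
    sequences = [[bytes([b]) for b in text.encode("utf-8")] for text in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break  # nothing left to merge
        (left, right), _count = pair_counts.most_common(1)[0]
        merged = left + right
        merges.append((left, right))
        if merged not in seen:  # different pairs can spell the same bytes
            seen.add(merged)
            vocab.append(merged)

        # Rewrite every sequence with the chosen pair fused into one symbol.
        rewritten = []
        for seq in sequences:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            rewritten.append(out)
        sequences = rewritten
    return vocab, merges, list(special_tokens)


# Hypothetical mixed corpus: Korean, English, and code.
corpus = ["한국어 hello hello", "def f(): return 1"]
vocab, merges, specials = train_byte_level_bpe(
    corpus, vocab_size=260, special_tokens=["<pad>", "<eos>"]
)
```

Under the same accounting, a budget of 102400 with N special tokens would leave at most 102400 - 256 - N slots for learned merges.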
hac541309 changed pull request status to merged