Commit dd65546 (verified) · committed by codelion · 1 parent: a61f958

Update README.md
Files changed (1): README.md (+10 −6)
@@ -1,7 +1,7 @@
 ---
 language:
 - en
-license: mit
+license: apache-2.0
 tags:
 - text-generation
 - gpt2
@@ -34,6 +34,10 @@ model-index:
   - name: Average
     type: accuracy
     value: 38.15
+datasets:
+- codelion/finepdfs-1B
+- codelion/dclm-baseline-1B
+- codelion/fineweb-edu-1B
 ---
 
 # GPT-2 70M - Optimal Dataset Mixing
@@ -126,9 +130,9 @@ print(tokenizer.decode(outputs[0]))
 If you use this model, please cite:
 
 ```bibtex
-@model{gpt2-70m-optimal-mixing,
-  title={GPT-2 70M: Optimal Dataset Mixing for Efficient Pretraining},
-  author={CodeLion},
+@article{gpt2-70m-optimal-mixing,
+  title={Optimal Pre-training Dataset Composition for Language Models: A Systematic Study of Dataset Mixing Strategies},
+  author={codelion},
   year={2025},
   url={https://huggingface.co/codelion/gpt-2-70m}
 }
@@ -136,8 +140,8 @@ If you use this model, please cite:
 
 ## Model Card Authors
 
-CodeLion
+codelion
 
 ## Model Card Contact
 
-For questions or issues, please open an issue on the model repository.
+For questions or issues, please open an issue on the model repository.
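The new `datasets:` front-matter names the three 1B-token corpora mixed during pretraining. As a rough illustration of probability-weighted dataset mixing (the 0.5/0.3/0.2 weights and the `mix` helper below are invented placeholders for this sketch, not the ratios or code from the study):

```python
import random

# Toy stand-ins for the three corpora listed in the updated front-matter:
# codelion/finepdfs-1B, codelion/dclm-baseline-1B, codelion/fineweb-edu-1B.
sources = {
    "finepdfs-1B": ["pdf_doc"] * 1000,
    "dclm-baseline-1B": ["dclm_doc"] * 1000,
    "fineweb-edu-1B": ["edu_doc"] * 1000,
}
# Illustrative weights only; the actual mixing ratios are not given in this diff.
weights = {"finepdfs-1B": 0.5, "dclm-baseline-1B": 0.3, "fineweb-edu-1B": 0.2}

def mix(sources, weights, n, seed=0):
    """Draw n documents, picking each draw's source by its mixing weight."""
    rng = random.Random(seed)
    names = list(sources)
    iters = {name: iter(sources[name]) for name in names}
    out = []
    for _ in range(n):
        name = rng.choices(names, weights=[weights[m] for m in names])[0]
        out.append(next(iters[name]))
    return out

stream = mix(sources, weights, 100)
print(len(stream))  # 100
```

With a fixed seed the interleaving is reproducible, which matters when comparing mixing strategies across runs.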