codelion/gpt-2-70m


codelion committed on Nov 1, 2025

Commit dd65546 · verified · 1 Parent(s): a61f958

Update README.md

Files changed (1)

README.md +10 -6


@@ -1,7 +1,7 @@
 ---
 language:
 - en
-license: mit
+license: apache-2.0
 tags:
 - text-generation
 - gpt2
@@ -34,6 +34,10 @@ model-index:
     - name: Average
       type: accuracy
       value: 38.15
+datasets:
+- codelion/finepdfs-1B
+- codelion/dclm-baseline-1B
+- codelion/fineweb-edu-1B
 ---
 # GPT-2 70M - Optimal Dataset Mixing
@@ -126,9 +130,9 @@ print(tokenizer.decode(outputs[0]))
 If you use this model, please cite:
 ```bibtex
-@model{gpt2-70m-optimal-mixing,
-  title={GPT-2 70M: Optimal Dataset Mixing for Efficient Pretraining},
-  author={CodeLion},
+@article{gpt2-70m-optimal-mixing,
+  title={Optimal Pre-training Dataset Composition for Language Models: A Systematic Study of Dataset Mixing Strategies},
+  author={codelion},
   year={2025},
   url={https://huggingface.co/codelion/gpt-2-70m}
 }
@@ -136,8 +140,8 @@ If you use this model, please cite:
 ## Model Card Authors
-CodeLion
+codelion
 ## Model Card Contact
-For questions or issues, please open an issue on the model repository.
+For questions or issues, please open an issue on the model repository.