Update README.md
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 language:
 - en
-license:
+license: apache-2.0
 tags:
 - text-generation
 - gpt2
@@ -34,6 +34,10 @@ model-index:
 - name: Average
 type: accuracy
 value: 38.15
+datasets:
+- codelion/finepdfs-1B
+- codelion/dclm-baseline-1B
+- codelion/fineweb-edu-1B
 ---
 
 # GPT-2 70M - Optimal Dataset Mixing
@@ -126,9 +130,9 @@ print(tokenizer.decode(outputs[0]))
 If you use this model, please cite:
 
 ```bibtex
-@
-title={
-author={
+@article{gpt2-70m-optimal-mixing,
+title={Optimal Pre-training Dataset Composition for Language Models: A Systematic Study of Dataset Mixing Strategies},
+author={codelion},
 year={2025},
 url={https://huggingface.co/codelion/gpt-2-70m}
 }
@@ -136,8 +140,8 @@ If you use this model, please cite:
 
 ## Model Card Authors
 
-
+codelion
 
 ## Model Card Contact
 
-For questions or issues, please open an issue on the model repository.
+For questions or issues, please open an issue on the model repository.