gpt-rope-swiglu

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0001

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 15
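
For reference, the listed hyperparameters could be expressed with the Transformers `TrainingArguments` API roughly as in the sketch below. Only the values from the list above come from this card; the output directory and the evaluation cadence are assumptions.

```python
# Minimal sketch, assuming a standard Trainer setup. Values not listed in the
# card (output_dir, eval settings) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt-rope-swiglu",        # assumed output directory
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,       # effective train batch size: 4 * 2 = 8
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=15,
    eval_strategy="steps",               # assumed: results below are logged every 500 steps
    eval_steps=500,
)
```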

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.1967        | 0.1870  | 500   | 0.0657          |
| 0.0081        | 0.3740  | 1000  | 0.0042          |
| 0.0038        | 0.5610  | 1500  | 0.0016          |
| 0.0018        | 0.7479  | 2000  | 0.0011          |
| 0.0011        | 0.9349  | 2500  | 0.0006          |
| 0.0007        | 1.1219  | 3000  | 0.0005          |
| 0.0006        | 1.3089  | 3500  | 0.0005          |
| 0.0004        | 1.4959  | 4000  | 0.0004          |
| 0.0023        | 1.6829  | 4500  | 0.0011          |
| 0.0005        | 1.8699  | 5000  | 0.0005          |
| 0.0002        | 2.0568  | 5500  | 0.0006          |
| 0.0069        | 2.2438  | 6000  | 0.0019          |
| 0.0004        | 2.4308  | 6500  | 0.0004          |
| 0.0003        | 2.6178  | 7000  | 0.0002          |
| 0.0001        | 2.8048  | 7500  | 0.0003          |
| 0.0003        | 2.9918  | 8000  | 0.0004          |
| 0.0001        | 3.1788  | 8500  | 0.0001          |
| 0.0001        | 3.3657  | 9000  | 0.0003          |
| 0.0001        | 3.5527  | 9500  | 0.0003          |
| 0.0           | 3.7397  | 10000 | 0.0001          |
| 0.0051        | 3.9267  | 10500 | 0.0007          |
| 0.0008        | 4.1137  | 11000 | 0.0003          |
| 0.0001        | 4.3007  | 11500 | 0.0003          |
| 0.0           | 4.4877  | 12000 | 0.0002          |
| 0.0           | 4.6746  | 12500 | 0.0002          |
| 0.0           | 4.8616  | 13000 | 0.0002          |
| 0.0           | 5.0486  | 13500 | 0.0002          |
| 0.0           | 5.2356  | 14000 | 0.0002          |
| 0.0           | 5.4226  | 14500 | 0.0002          |
| 0.0           | 5.6096  | 15000 | 0.0002          |
| 0.0           | 5.7966  | 15500 | 0.0002          |
| 0.0           | 5.9835  | 16000 | 0.0001          |
| 0.0           | 6.1705  | 16500 | 0.0001          |
| 0.0           | 6.3575  | 17000 | 0.0001          |
| 0.0           | 6.5445  | 17500 | 0.0001          |
| 0.0           | 6.7315  | 18000 | 0.0001          |
| 0.0           | 6.9185  | 18500 | 0.0001          |
| 0.0           | 7.1055  | 19000 | 0.0001          |
| 0.0           | 7.2924  | 19500 | 0.0001          |
| 0.0           | 7.4794  | 20000 | 0.0001          |
| 0.0           | 7.6664  | 20500 | 0.0001          |
| 0.0           | 7.8534  | 21000 | 0.0001          |
| 0.0           | 8.0404  | 21500 | 0.0001          |
| 0.0           | 8.2274  | 22000 | 0.0001          |
| 0.0           | 8.4144  | 22500 | 0.0001          |
| 0.0           | 8.6013  | 23000 | 0.0001          |
| 0.0           | 8.7883  | 23500 | 0.0001          |
| 0.0           | 8.9753  | 24000 | 0.0001          |
| 0.0           | 9.1623  | 24500 | 0.0001          |
| 0.0           | 9.3493  | 25000 | 0.0001          |
| 0.0           | 9.5363  | 25500 | 0.0001          |
| 0.0           | 9.7233  | 26000 | 0.0001          |
| 0.0           | 9.9102  | 26500 | 0.0001          |
| 0.0           | 10.0972 | 27000 | 0.0001          |
| 0.0           | 10.2842 | 27500 | 0.0001          |
| 0.0           | 10.4712 | 28000 | 0.0001          |
| 0.0           | 10.6582 | 28500 | 0.0001          |
| 0.0           | 10.8452 | 29000 | 0.0001          |
| 0.0           | 11.0322 | 29500 | 0.0001          |
| 0.0           | 11.2191 | 30000 | 0.0001          |
| 0.0           | 11.4061 | 30500 | 0.0001          |
| 0.0           | 11.5931 | 31000 | 0.0001          |
| 0.0           | 11.7801 | 31500 | 0.0001          |
| 0.0           | 11.9671 | 32000 | 0.0001          |
| 0.0           | 12.1541 | 32500 | 0.0001          |
| 0.0           | 12.3411 | 33000 | 0.0001          |
| 0.0           | 12.5280 | 33500 | 0.0001          |
| 0.0           | 12.7150 | 34000 | 0.0001          |
| 0.0           | 12.9020 | 34500 | 0.0001          |
| 0.0           | 13.0890 | 35000 | 0.0001          |
| 0.0           | 13.2760 | 35500 | 0.0001          |
| 0.0           | 13.4630 | 36000 | 0.0001          |
| 0.0           | 13.6500 | 36500 | 0.0001          |
| 0.0           | 13.8369 | 37000 | 0.0001          |
| 0.0           | 14.0239 | 37500 | 0.0001          |
| 0.0           | 14.2109 | 38000 | 0.0001          |
| 0.0           | 14.3979 | 38500 | 0.0001          |
| 0.0           | 14.5849 | 39000 | 0.0001          |
| 0.0           | 14.7719 | 39500 | 0.0001          |
| 0.0           | 14.9589 | 40000 | 0.0001          |

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0
  • Datasets 4.0.0
  • Tokenizers 0.22.1
Model files

  • Format: Safetensors
  • Model size: 7.88M params
  • Tensor type: F32
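
Assuming the checkpoint is a standard Transformers causal language model with a bundled tokenizer (the card does not confirm this, and the repository id below is a placeholder, not the actual path), loading it might look like:

```python
# Minimal sketch of loading the Safetensors checkpoint with Transformers.
# "your-username/gpt-rope-swiglu" is a placeholder repository id; the causal-LM
# assumption is not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/gpt-rope-swiglu"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float32)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```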