train_sst2_789_1768397606

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0869
  • Num Input Tokens Seen: 30585184

Model description

This model is a PEFT adapter (not a full set of fine-tuned weights) trained on top of meta-llama/Meta-Llama-3-8B-Instruct. The base model weights are unchanged, so the adapter must be loaded together with the base model, as in the usage sketch below.

Intended uses & limitations

Given the training data, this adapter is most naturally used for sst2-style binary sentiment classification of short English sentences (positive vs. negative). It inherits the license and limitations of the Llama 3 base model, and its behavior on text outside the sst2 distribution has not been evaluated here.
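The snippet below is a minimal usage sketch, assuming the adapter is applied to the base causal language model with PEFT and queried through the Llama 3 chat template. The prompt wording is an assumption; the card does not document how sst2 examples were templated during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_sst2_789_1768397606"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the adapter weights
model.eval()

# Hypothetical prompt format -- the actual training template is not documented.
messages = [{"role": "user", "content":
             "Classify the sentiment of this sentence as positive or negative: "
             "'a gorgeous, witty, seductive movie.'"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the predicted label deterministic; a handful of new tokens is enough for a one-word answer.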

Training and evaluation data

The adapter was fine-tuned on sst2, the Stanford Sentiment Treebank binary sentiment task (short movie-review sentences labeled positive or negative, also distributed as part of GLUE). The loss above is reported on the evaluation split; the exact preprocessing and prompt template used for this run are not documented.
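How the data was loaded for this run is not documented. For orientation, a typical way to fetch sst2 with the datasets library is shown below; the Hub repo id is an assumption about the source.

```python
from datasets import load_dataset

# sst2: short movie-review sentences labeled 0 (negative) or 1 (positive).
ds = load_dataset("stanfordnlp/sst2")
print(ds["train"][0])  # e.g. {'idx': 0, 'sentence': '...', 'label': 0}
```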

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
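For reference, these settings map onto transformers.TrainingArguments roughly as follows. The output directory and the evaluation cadence are assumptions; eval_steps is inferred from the results table below, where validation loss is logged every 15154 steps (half an epoch).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_789_1768397606",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
    eval_strategy="steps",  # assumption, inferred from the 0.5-epoch eval cadence
    eval_steps=15154,
)
```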

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.0038        | 0.5000 | 15154  | 0.1144          | 1527088           |
| 0.0012        | 1.0000 | 30308  | 0.0997          | 3057616           |
| 0.0017        | 1.5000 | 45462  | 0.0869          | 4591072           |
| 0.4087        | 2.0001 | 60616  | 0.0934          | 6117360           |
| 0.0716        | 2.5001 | 75770  | 0.0916          | 7645184           |
| 0.0003        | 3.0001 | 90924  | 0.0902          | 9176176           |
| 0.0006        | 3.5001 | 106078 | 0.0974          | 10705536          |
| 0.1604        | 4.0001 | 121232 | 0.0891          | 12235008          |
| 0.1575        | 4.5001 | 136386 | 0.1000          | 13765232          |
| 0.0021        | 5.0002 | 151540 | 0.0998          | 15292592          |
| 0.0004        | 5.5002 | 166694 | 0.1003          | 16821584          |
| 0.2077        | 6.0002 | 181848 | 0.0990          | 18350176          |
| 0.0856        | 6.5002 | 197002 | 0.1071          | 19882144          |
| 0.0006        | 7.0002 | 212156 | 0.1131          | 21409968          |
| 0.0005        | 7.5002 | 227310 | 0.1138          | 22938912          |
| 0.0007        | 8.0003 | 242464 | 0.1120          | 24469056          |
| 0.0002        | 8.5003 | 257618 | 0.1160          | 25996544          |
| 0.2535        | 9.0003 | 272772 | 0.1172          | 27527296          |
| 0.1831        | 9.5003 | 287926 | 0.1156          | 29060528          |

Validation loss reaches its minimum of 0.0869 at epoch 1.5 and drifts upward afterward, a typical overfitting pattern; the evaluation loss reported at the top of this card matches that minimum, which suggests the best checkpoint was the one retained.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_sst2_789_1768397606

This model is an adapter of meta-llama/Meta-Llama-3-8B-Instruct.