train_sst2_789_1768397606

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0869
  • Num Input Tokens Seen: 30585184

Model description

This model is a PEFT adapter (not a full set of fine-tuned weights) trained on top of meta-llama/Meta-Llama-3-8B-Instruct. The base model weights are unchanged, so the adapter must be loaded together with the base model, as in the usage sketch below.

Intended uses & limitations

Given the training data, this adapter is most naturally used for sst2-style binary sentiment classification of short English sentences (positive vs. negative). It inherits the license and limitations of the Llama 3 base model, and its behavior on text outside the sst2 distribution has not been evaluated here.
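The snippet below is a minimal usage sketch, assuming the adapter is applied to the base causal language model with PEFT and queried through the Llama 3 chat template. The prompt wording is an assumption; the card does not document how sst2 examples were templated during training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_sst2_789_1768397606"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the adapter weights
model.eval()

# Hypothetical prompt format -- the actual training template is not documented.
messages = [{"role": "user", "content":
             "Classify the sentiment of this sentence as positive or negative: "
             "'a gorgeous, witty, seductive movie.'"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the predicted label deterministic; a handful of new tokens is enough for a one-word answer.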

Training and evaluation data

The adapter was fine-tuned on sst2, the Stanford Sentiment Treebank binary sentiment task (short movie-review sentences labeled positive or negative, also distributed as part of GLUE). The loss above is reported on the evaluation split; the exact preprocessing and prompt template used for this run are not documented.
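How the data was loaded for this run is not documented. For orientation, a typical way to fetch sst2 with the datasets library is shown below; the Hub repo id is an assumption about the source.

```python
from datasets import load_dataset

# sst2: short movie-review sentences labeled 0 (negative) or 1 (positive).
ds = load_dataset("stanfordnlp/sst2")
print(ds["train"][0])  # e.g. {'idx': 0, 'sentence': '...', 'label': 0}
```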

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
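For reference, these settings map onto transformers.TrainingArguments roughly as follows. The output directory and the evaluation cadence are assumptions; eval_steps is inferred from the results table below, where validation loss is logged every 15154 steps (half an epoch).

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_789_1768397606",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
    eval_strategy="steps",  # assumption, inferred from the 0.5-epoch eval cadence
    eval_steps=15154,
)
```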

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.0038        | 0.5000 | 15154  | 0.1144          | 1527088           |
| 0.0012        | 1.0000 | 30308  | 0.0997          | 3057616           |
| 0.0017        | 1.5000 | 45462  | 0.0869          | 4591072           |
| 0.4087        | 2.0001 | 60616  | 0.0934          | 6117360           |
| 0.0716        | 2.5001 | 75770  | 0.0916          | 7645184           |
| 0.0003        | 3.0001 | 90924  | 0.0902          | 9176176           |
| 0.0006        | 3.5001 | 106078 | 0.0974          | 10705536          |
| 0.1604        | 4.0001 | 121232 | 0.0891          | 12235008          |
| 0.1575        | 4.5001 | 136386 | 0.1000          | 13765232          |
| 0.0021        | 5.0002 | 151540 | 0.0998          | 15292592          |
| 0.0004        | 5.5002 | 166694 | 0.1003          | 16821584          |
| 0.2077        | 6.0002 | 181848 | 0.0990          | 18350176          |
| 0.0856        | 6.5002 | 197002 | 0.1071          | 19882144          |
| 0.0006        | 7.0002 | 212156 | 0.1131          | 21409968          |
| 0.0005        | 7.5002 | 227310 | 0.1138          | 22938912          |
| 0.0007        | 8.0003 | 242464 | 0.1120          | 24469056          |
| 0.0002        | 8.5003 | 257618 | 0.1160          | 25996544          |
| 0.2535        | 9.0003 | 272772 | 0.1172          | 27527296          |
| 0.1831        | 9.5003 | 287926 | 0.1156          | 29060528          |

Validation loss reaches its minimum of 0.0869 at epoch 1.5 and drifts upward afterward, a typical overfitting pattern; the evaluation loss reported at the top of this card matches that minimum, which suggests the best checkpoint was the one retained.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_sst2_789_1768397606

This model is an adapter of meta-llama/Meta-Llama-3-8B-Instruct.