train_sst2_456_1768397598

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0826
  • Num Input Tokens Seen: 30591040
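
Since the framework versions below list PEFT, this checkpoint is presumably a LoRA-style adapter rather than full model weights. The following is a minimal, hedged loading sketch; the adapter repo id is taken from this card, the causal-LM head is an assumption, and the classification prompt is hypothetical because the training template is not documented here.

```python
# A minimal loading sketch, assuming this repo hosts a PEFT (LoRA) adapter
# for the causal-LM head of meta-llama/Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_sst2_456_1768397598"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Hypothetical SST-2-style prompt; the actual template used during
# fine-tuning is not on this card, so adjust to match your training setup.
prompt = (
    "Classify the sentiment of the following sentence as positive or negative.\n"
    "Sentence: the movie was a delight .\n"
    "Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```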

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
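
As a reference point, here is a hedged sketch of how these settings map onto transformers.TrainingArguments. The output directory is a placeholder, and the LoRA configuration and SST-2 data pipeline are assumptions not documented on this card.

```python
# Hedged mapping of the listed hyperparameters onto TrainingArguments;
# only the values listed above are taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_456_1768397598",  # placeholder, not from the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```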

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.0007        | 0.5000 | 15154  | 0.1241          | 1530624           |
| 0.1653        | 1.0000 | 30308  | 0.1001          | 3061264           |
| 0.0001        | 1.5000 | 45462  | 0.1004          | 4592688           |
| 0.002         | 2.0001 | 60616  | 0.0826          | 6121056           |
| 0.0016        | 2.5001 | 75770  | 0.0927          | 7651360           |
| 0.0003        | 3.0001 | 90924  | 0.0885          | 9179584           |
| 0.0005        | 3.5001 | 106078 | 0.0986          | 10707168          |
| 0.1854        | 4.0001 | 121232 | 0.0835          | 12236912          |
| 0.0008        | 4.5001 | 136386 | 0.0984          | 13765568          |
| 0.0013        | 5.0002 | 151540 | 0.1004          | 15295536          |
| 0.0009        | 5.5002 | 166694 | 0.1026          | 16829808          |
| 0.5641        | 6.0002 | 181848 | 0.1018          | 18356448          |
| 0.0017        | 6.5002 | 197002 | 0.1006          | 19882880          |
| 0.0002        | 7.0002 | 212156 | 0.1187          | 21413584          |
| 0.0003        | 7.5002 | 227310 | 0.1153          | 22943200          |
| 0.0002        | 8.0003 | 242464 | 0.1163          | 24473632          |
| 0.3504        | 8.5003 | 257618 | 0.1170          | 26002800          |
| 0.0001        | 9.0003 | 272772 | 0.1145          | 27533936          |
| 0.0001        | 9.5003 | 287926 | 0.1172          | 29061488          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4