Adding a callback is probably the cleanest method. Implementing it in other ways may break if the library version is updated and the behavior changes.
```
from transformers import TrainerCallback

class MyLogger(TrainerCallback):
    def __init__(self):
        ...

    def on_log(self, args, state, control, logs=None, **kwargs):
        ...

trainer = CustomTrainer(
    ...,
    callbacks=[MyLogger()],
)
```
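For example, the `on_log` body could be as simple as printing whatever the Trainer logs on the main process (just a minimal sketch, nothing library-specific beyond the callback API):

```
from transformers import TrainerCallback

class MyLogger(TrainerCallback):
    # Called every time the Trainer logs something (loss, learning rate, eval metrics, ...).
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and state.is_world_process_zero:
            print(f"step {state.global_step}: {logs}")
```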
If you don’t use gradient accumulation, then I usually just hack it by overriding `Trainer.compute_loss` and tucking in one line of `self.log(compute_my_metric(output))`.
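Roughly like this (a sketch only: the exact `compute_loss` signature varies across transformers versions, and `compute_my_metric` here is just a placeholder that assumes the model returns a `loss` in its output):

```
from transformers import Trainer

def compute_my_metric(outputs):
    # Placeholder: return a dict of floats, which is what Trainer.log expects.
    return {"my_train_metric": outputs.loss.detach().item()}

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        loss = outputs.loss
        # The one extra line: log a custom metric computed from the forward output.
        self.log(compute_my_metric(outputs))
        return (loss, outputs) if return_outputs else loss
```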
If you use gradient accumulation, one alternative is to trigger a `CustomCallback`, as in Metrics for Training Set in Trainer - #7 by Kaveri. For example, you can do one forward pass over the entire train set in `on_epoch_end` or `on_evaluate` (see the sketch below). It repeats work, so it is slow and coarse.
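Something along these lines (a rough sketch: it assumes the callback handler passes `model` and `train_dataloader` as keyword arguments, that the batches can be fed straight to the model, and the "metric" is reduced to the average loss here):

```
import torch
from transformers import TrainerCallback

class TrainSetMetrics(TrainerCallback):
    # Sketch: one extra pass over the whole train set after each evaluation (slow!).
    def on_evaluate(self, args, state, control, model=None, train_dataloader=None, **kwargs):
        if model is None or train_dataloader is None:
            return
        model.eval()
        total_loss, n_batches = 0.0, 0
        with torch.no_grad():
            for batch in train_dataloader:
                batch = {k: v.to(model.device) for k, v in batch.items() if hasattr(v, "to")}
                outputs = model(**batch)
                total_loss += outputs.loss.item()
                n_batches += 1
        model.train()
        print({"train_set_loss": total_loss / max(n_batches, 1)})
```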
And let me know if you figured out an easy way to log custom …
## Environment info
- `transformers` version: 4.4.2
- Platform: Darwin-20.3.0-x86_64-i386-64bit
- Python version: 3.7.4
- PyTorch version (GPU?): 1.3.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
### Who can help
@sgugger
## Information
Model I am using (Bert, XLNet ...): Bert
The problem arises when using:
* [x] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The task I am working on is:
* [x] an official GLUE/SQuAD task: NER
* [ ] my own task or dataset: (give details below)
## To reproduce
The bug concerns PR #8016.
Steps to reproduce the behavior:
1. MLflow is installed and the following env variables are exported:
```
export HF_MLFLOW_LOG_ARTIFACTS=TRUE
export MLFLOW_S3_ENDPOINT_URL=<custom endpont>
export MLFLOW_TRACKING_URI=<custom uri>
export MLFLOW_TRACKING_TOKEN=<custom token>
```
2. Run the token classification example with the following command
```
python run_ner.py \
--model_name_or_path bert-base-uncased \
--dataset_name conll2003 \
--output_dir /tmp/test-ner \
--do_train \
--do_eval
```
## Expected behavior
When training finishes, before the evaluation is performed, `integrations.MLflowCallback` executes its `on_train_end` method, which logs the model artifacts to MLflow if the env variable `HF_MLFLOW_LOG_ARTIFACTS` is set to `TRUE`.
The problem is that when `on_train_end` is called and the line `self._ml_flow.log_artifacts(args.output_dir)` is executed, the model is not yet stored in `args.output_dir`. The model artifacts are only stored once `trainer.save_model()` is called, which happens after training ends. There is no callback hook in `trainer.save_model()` that a `TrainerCallback` could use to log the model. There is a `TrainerCallback.on_save()` method, which is called from `trainer._maybe_log_save_evaluate()`, but even then the model is not available in the `output_dir`.
A possible solution would be to extend `TrainerCallback` with an `on_model_save()` callback method and invoke it from `trainer.save_model()`.
Alternatively, the workaround I have now is to replace `on_train_end` with `on_evaluate` in `integrations.MLflowCallback`, which is called after the model is saved in the example script. However, this is not the right solution, since it depends on the `do_eval` parameter being set and is not semantically correct.
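For what it's worth, the workaround described there would look roughly like this as a user-side callback (a sketch only; `_initialized`, `_log_artifacts`, and `_ml_flow` are internal attributes of `MLflowCallback` and may differ between transformers versions):

```
from transformers.integrations import MLflowCallback

class MLflowLogOnEvaluate(MLflowCallback):
    # Sketch of the workaround: log artifacts after evaluation, i.e. after the
    # example script has saved the model to output_dir, instead of on_train_end.
    def on_evaluate(self, args, state, control, **kwargs):
        if self._initialized and self._log_artifacts and state.is_world_process_zero:
            self._ml_flow.log_artifacts(args.output_dir)
```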