ray.train.report

ray.train.report(metrics: Dict, *, checkpoint: Optional[ray.air.checkpoint.Checkpoint] = None) → None

Report metrics and optionally save a checkpoint.

Each call to this function automatically increments the underlying iteration number. What an “iteration” means is defined by the user, or more specifically by how often they call report. It does not necessarily correspond to one epoch.
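As a rough illustration of how the call frequency determines the iteration count, the sketch below reports once per epoch, so three epochs produce three iterations. The helper run_epochs, its parameters, and the placeholder loss are hypothetical and not part of Ray. The reporting function is passed in only so the loop can be dry-run with a plain list instead of a live training session.

```python
from typing import Callable, Dict, List


def run_epochs(
    num_epochs: int,
    batches_per_epoch: int,
    report: Callable[[Dict], None],
) -> None:
    """Report once per epoch, so one "iteration" equals one epoch here."""
    for epoch in range(num_epochs):
        running_loss = 0.0
        for batch in range(batches_per_epoch):
            # Placeholder for a real training step (assumption for the sketch).
            running_loss += 1.0 / (1 + epoch + batch)
        # One report call per epoch -> the iteration number advances by one.
        report({"epoch": epoch, "loss": running_loss / batches_per_epoch})


def train_func() -> None:
    # Imported lazily so the helper above can be exercised without Ray.
    from ray import train

    run_epochs(num_epochs=3, batches_per_epoch=10, report=train.report)


# Dry run with a list-backed stand-in for train.report (no Ray needed).
reported: List[Dict] = []
run_epochs(num_epochs=3, batches_per_epoch=10, report=reported.append)
print(len(reported))  # 3 report calls, hence 3 iterations
```

Moving the report call into the inner loop would instead produce one iteration per batch, which is why the iteration number should not be read as an epoch count.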

This API is the canonical way to report metrics from Tune and Train. It replaces the legacy tune.report, tune.checkpoint_dir (used in a "with" block), train.report and train.save_checkpoint calls.

Note on directory checkpoints: AIR will take ownership of checkpoints passed to report() by moving them to a new path. The original directory will no longer be accessible to the caller after the report call.
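One way to work with this ownership transfer is to write every checkpoint into a fresh directory and never read that path again after reporting it. The sketch below assumes this pattern. The helper write_checkpoint_dir and the JSON state file are hypothetical, chosen only to keep the example small; Checkpoint.from_directory and train.report are used as in the example on this page.

```python
import json
import os
import tempfile


def write_checkpoint_dir(step: int, state: dict) -> str:
    """Write state into a fresh temporary directory and return its path."""
    path = tempfile.mkdtemp(prefix=f"ckpt_{step}_")
    with open(os.path.join(path, "state.json"), "w") as f:
        json.dump(state, f)
    return path


def train_func() -> None:
    # Imported lazily so the helper above can be used without Ray installed.
    from ray import train
    from ray.train import Checkpoint

    for step in range(3):
        state = {"step": step}  # placeholder for real model state
        ckpt_dir = write_checkpoint_dir(step, state)
        train.report(
            {"step": step},
            checkpoint=Checkpoint.from_directory(ckpt_dir),
        )
        # Do not read from ckpt_dir past this point: per the note above,
        # its contents may have been moved to a new path.
```

Using a new directory per call avoids relying on the contents of a path that has already been handed over.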

Example

import tensorflow as tf

from ray import train
from ray.train import Checkpoint, ScalingConfig
from ray.train.tensorflow import TensorflowTrainer

# Using it in the *per worker* train loop (TrainSession)
def train_func():
    model = tf.keras.applications.resnet50.ResNet50()
    model.save("my_model", overwrite=True)
    train.report(
        metrics={"foo": "bar"},
        checkpoint=Checkpoint.from_directory("my_model")
    )
    # AIR guarantees that by this point you can safely write new
    # content to the "my_model" directory.

scaling_config = ScalingConfig(num_workers=2)
trainer = TensorflowTrainer(
    train_loop_per_worker=train_func, scaling_config=scaling_config
)
result = trainer.fit()
# If you navigate to result.checkpoint's path, you will find the
# content of ``model.save()`` under it.
# If you have `SyncConfig` configured, the content should also
# show up in the corresponding cloud storage path.

Parameters
  • metrics – The metrics you want to report.

  • checkpoint – The optional checkpoint you want to report.

PublicAPI (beta): This API is in beta and may change before becoming stable.