ray.rllib.policy.Policy.learn_on_batch
Policy.learn_on_batch(samples: ray.rllib.policy.sample_batch.SampleBatch) -> Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]
Perform one learning update, given samples.
Either this method or the combination of compute_gradients and apply_gradients must be implemented by subclasses.
- Parameters
samples – The SampleBatch object to learn from.
- Returns
Dictionary of extra metadata from
compute_gradients().
Examples
>>> policy, sample_batch = ...
>>> policy.learn_on_batch(sample_batch)
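The relationship between learn_on_batch and the compute_gradients / apply_gradients pair can be illustrated with a minimal, library-free sketch. ToyPolicy, its single weight, the learning rate, and the dictionary batch with "obs" and "targets" keys are all hypothetical stand-ins invented for this illustration; they are not RLlib classes, and the sketch assumes that compute_gradients returns a (gradients, info) pair and that learn_on_batch passes the info dictionary through as its return value.

```python
from typing import Dict, List, Tuple


class ToyPolicy:
    """Hypothetical stand-in for a Policy subclass; not part of RLlib."""

    def __init__(self, lr: float = 0.1) -> None:
        self.weight = 0.0  # single trainable parameter (assumption)
        self.lr = lr

    def compute_gradients(
        self, samples: Dict[str, List[float]]
    ) -> Tuple[float, Dict[str, float]]:
        # Mean squared error of the prediction weight * x against target y.
        xs, ys = samples["obs"], samples["targets"]
        n = len(xs)
        errors = [self.weight * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        grad = sum(2.0 * e * x for e, x in zip(errors, xs)) / n
        # Return the gradient plus a dictionary of extra metadata.
        return grad, {"loss": loss}

    def apply_gradients(self, gradients: float) -> None:
        # Plain gradient-descent step on the single weight.
        self.weight -= self.lr * gradients

    def learn_on_batch(self, samples: Dict[str, List[float]]) -> Dict[str, float]:
        # One learning update: compute gradients, apply them,
        # and hand back the metadata from compute_gradients.
        grad, info = self.compute_gradients(samples)
        self.apply_gradients(grad)
        return info


# A plain dict stands in for a SampleBatch here (assumption).
batch = {"obs": [1.0, 2.0], "targets": [2.0, 4.0]}
policy = ToyPolicy()
first = policy.learn_on_batch(batch)
second = policy.learn_on_batch(batch)
```

In this sketch a subclass that only implements compute_gradients and apply_gradients gets a working learn_on_batch for free, while a subclass with a fused update step could override learn_on_batch directly instead.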