ray.rllib.policy.torch_policy_v2.TorchPolicyV2.learn_on_batch_from_replay_buffer
- TorchPolicyV2.learn_on_batch_from_replay_buffer(replay_actor: ray.actor.ActorHandle, policy_id: str) → Dict[str, Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor]]
Samples a batch from given replay actor and performs an update.
- Parameters
replay_actor – The replay buffer actor to sample from.
policy_id – The ID of this policy.
- Returns
Dictionary of extra metadata from compute_gradients().
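The sketch below illustrates the documented control flow: sample a batch from a replay actor for a given policy ID, run an update on it, and return a metadata dictionary. It does not use RLlib's API. `FakeReplayActor`, its `add`/`replay` methods, `SketchPolicy`, and the dummy update are all hypothetical stand-ins so the example runs without Ray. The real method receives a `ray.actor.ActorHandle`, so sampling would be a remote call resolved with `ray.get(...)` rather than a plain method call. Returning an empty dict when nothing can be sampled is also an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


class FakeReplayActor:
    """Hypothetical in-process stand-in for a Ray replay-buffer actor.

    A real replay actor is a ray.actor.ActorHandle whose methods run
    remotely; here sampling is an ordinary local method call.
    """

    def __init__(self) -> None:
        self._storage: Dict[str, List[Dict[str, Any]]] = {}

    def add(self, policy_id: str, batch: Dict[str, Any]) -> None:
        # Store a batch under the ID of the policy it belongs to.
        self._storage.setdefault(policy_id, []).append(batch)

    def replay(self, policy_id: str) -> Optional[Dict[str, Any]]:
        # Return the most recently stored batch for this policy,
        # or None if nothing has been stored for it yet.
        batches = self._storage.get(policy_id)
        return batches[-1] if batches else None


class SketchPolicy:
    """Hypothetical policy showing the sample-then-update flow."""

    def __init__(self) -> None:
        self.num_updates = 0

    def learn_on_batch(self, batch: Dict[str, Any]) -> Dict[str, Any]:
        # Placeholder "update": count it and report the batch size as metadata.
        self.num_updates += 1
        return {"batch_size": len(batch["obs"]), "num_updates": self.num_updates}

    def learn_on_batch_from_replay_buffer(
        self, replay_actor: FakeReplayActor, policy_id: str
    ) -> Dict[str, Any]:
        # 1) Sample a batch for this policy from the replay actor.
        batch = replay_actor.replay(policy_id=policy_id)
        # 2) Assumption: if there is nothing to sample, return empty metadata.
        if batch is None:
            return {}
        # 3) Perform the update and pass its metadata dict back to the caller.
        return self.learn_on_batch(batch)


buffer = FakeReplayActor()
policy = SketchPolicy()
print(policy.learn_on_batch_from_replay_buffer(buffer, "default_policy"))  # {}
buffer.add("default_policy", {"obs": [0.1, 0.2, 0.3]})
print(policy.learn_on_batch_from_replay_buffer(buffer, "default_policy"))
# {'batch_size': 3, 'num_updates': 1}
```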