ray.rllib.core.learner.learner_group.LearnerGroup.update

LearnerGroup.update(batch: ray.rllib.policy.sample_batch.MultiAgentBatch, *, minibatch_size: Optional[int] = None, num_iters: int = 1, reduce_fn: Optional[Callable[[List[Mapping[str, Any]]], dict]] = <function _reduce_mean_results>) → Union[Mapping[str, Any], List[Mapping[str, Any]]]

Do one or more gradient-based updates on the Learner(s) using the given data.

Parameters
  • batch – The data batch to use for the update.

  • minibatch_size – The minibatch size to use for the update.

  • num_iters – The number of complete passes over all the sub-batches in the input multi-agent batch.

  • reduce_fn – An optional callable that reduces the list of results from the Learner actors into a single result. It can be any function that takes a list of dictionaries and returns a single dictionary. For example, you can take an average (the default), concatenate the results (for example, for metrics), or be more selective about what you want to report back to the algorithm’s training_step. If None is passed, the results are not reduced.

Returns

A dictionary with the reduced results of the updates from the Learner(s) or, if reduce_fn is None, a list of dictionaries with the unreduced results from each Learner.
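
Example

The following is a minimal sketch of a custom reduce_fn, not RLlib's own _reduce_mean_results. The function name reduce_max_results and the sample values are hypothetical, and the sketch assumes each Learner returns a flat mapping from metric name to number, which may not match the nesting of real results.

```python
from typing import Any, List, Mapping


def reduce_max_results(results: List[Mapping[str, Any]]) -> dict:
    """Hypothetical reduce_fn: keep the per-key maximum across Learner results.

    Assumes every entry is a flat mapping from a metric name to a number.
    """
    reduced: dict = {}
    for result in results:
        for key, value in result.items():
            # First occurrence of a key is stored as-is; later ones are compared.
            reduced[key] = max(reduced[key], value) if key in reduced else value
    return reduced


# Stand-in for what two Learner actors might return (made-up values).
per_learner_results = [
    {"total_loss": 0.5, "grad_norm": 1.0},
    {"total_loss": 0.25, "grad_norm": 3.0},
]
reduced = reduce_max_results(per_learner_results)

# With an already constructed LearnerGroup and MultiAgentBatch, the callable
# would be passed by keyword, following the signature above:
# learner_group.update(batch, minibatch_size=128, num_iters=2,
#                      reduce_fn=reduce_max_results)
```

Because reduce_fn is a plain callable over a list of dictionaries, it can be checked in isolation like this before it is handed to update.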