ray.rllib.utils.exploration.ornstein_uhlenbeck_noise.OrnsteinUhlenbeckNoise

class ray.rllib.utils.exploration.ornstein_uhlenbeck_noise.OrnsteinUhlenbeckNoise(action_space, *, framework: str, ou_theta: float = 0.15, ou_sigma: float = 0.2, ou_base_scale: float = 0.1, random_timesteps: int = 1000, initial_scale: float = 1.0, final_scale: float = 0.02, scale_timesteps: int = 10000, scale_schedule: Optional[ray.rllib.utils.schedules.schedule.Schedule] = None, **kwargs)

Bases: ray.rllib.utils.exploration.gaussian_noise.GaussianNoise

An exploration that adds Ornstein-Uhlenbeck noise to continuous actions.

If explore=True, returns sampled actions plus a noise term X, which changes at each step according to X_{t+1} = -theta * X_t + sigma * N[0, stddev], where theta, sigma, and stddev are constants. An initial period of completely random actions is also possible (see random_timesteps). If explore=False, returns the deterministic action.
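The noise update described above can be illustrated with a minimal NumPy sketch. It is not the RLlib implementation: it assumes the formula gives the increment added to the previous noise state (as in a standard discretized Ornstein-Uhlenbeck process), and it assumes the noise is scaled by the action range before being added. The helper names ou_step and add_ou_noise, the stddev value, and the example action bounds are hypothetical.

```python
import numpy as np


def ou_step(x, rng, theta=0.15, sigma=0.2, stddev=1.0):
    """Advance the noise state by one step.

    Assumption: the increment -theta * x + sigma * N(0, stddev) is added
    to the previous state, as in a standard discretized OU process.
    """
    gaussian = rng.normal(loc=0.0, scale=stddev, size=x.shape)
    return x + (-theta * x + sigma * gaussian)


def add_ou_noise(action, x, low, high, scale=1.0, base_scale=0.1):
    """Add scaled noise to an action and clip to the bounds (hypothetical helper)."""
    # Assumption: noise is multiplied by the width of the action range.
    noisy = action + scale * base_scale * x * (high - low)
    return np.clip(noisy, low, high)


rng = np.random.default_rng(0)
low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
x = np.zeros(2)                 # noise state starts at zero
action = np.array([0.5, -0.5])  # stand-in for the policy's deterministic action

for _ in range(5):
    x = ou_step(x, rng)  # successive noise values are correlated through x
    noisy_action = add_ou_noise(action, x, low, high)
```

Because each step pulls the state back toward zero by a factor of theta, successive noise values are correlated over time, unlike independent Gaussian noise drawn fresh at every step.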

Methods

__init__(action_space, *, framework[, ...])

Initializes an Ornstein-Uhlenbeck Exploration object.

before_compute_actions(*[, timestep, ...])

Hook for preparations before policy.compute_actions() is called.

get_exploration_optimizer(optimizers)

May add optimizer(s) to the Policy's own optimizers.

get_state([sess])

Returns the current scale value.

on_episode_end(policy, *[, environment, ...])

Handles necessary exploration logic at the end of an episode.

on_episode_start(policy, *[, environment, ...])

Handles necessary exploration logic at the beginning of an episode.

postprocess_trajectory(policy, sample_batch)

Handles post-processing of done episode trajectories.
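
As a rough illustration of how the constructor arguments in the signature above might be supplied, the sketch below builds a plain configuration dictionary. It assumes the common RLlib pattern of selecting an exploration class through an exploration_config entry with a type key; the exact key names and how the dictionary is passed to an algorithm should be checked against the Ray version in use. The values are the defaults from the signature.

```python
# Defaults copied from the constructor signature; adjust as needed.
exploration_config = {
    "type": "OrnsteinUhlenbeckNoise",  # assumed: class selected by name
    "ou_theta": 0.15,
    "ou_sigma": 0.2,
    "ou_base_scale": 0.1,
    "random_timesteps": 1000,
    "initial_scale": 1.0,
    "final_scale": 0.02,
    "scale_timesteps": 10000,
}

# Assumed top-level layout: exploration enabled, with the settings above.
config = {
    "explore": True,
    "exploration_config": exploration_config,
}
```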