ray.serve.deployment

ray.serve.deployment(_func_or_class: Optional[Callable] = None, name: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, version: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, num_replicas: Optional[Union[ray.serve._private.utils.DEFAULT, int]] = DEFAULT.VALUE, init_args: Union[ray.serve._private.utils.DEFAULT, Tuple[Any]] = DEFAULT.VALUE, init_kwargs: Union[ray.serve._private.utils.DEFAULT, Dict[Any, Any]] = DEFAULT.VALUE, route_prefix: Optional[Union[ray.serve._private.utils.DEFAULT, str]] = DEFAULT.VALUE, ray_actor_options: Union[ray.serve._private.utils.DEFAULT, Dict] = DEFAULT.VALUE, user_config: Optional[Union[ray.serve._private.utils.DEFAULT, Any]] = DEFAULT.VALUE, max_concurrent_queries: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, autoscaling_config: Optional[Union[ray.serve._private.utils.DEFAULT, Dict, ray.serve.config.AutoscalingConfig]] = DEFAULT.VALUE, graceful_shutdown_wait_loop_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, graceful_shutdown_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_period_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, is_driver_deployment: Optional[bool] = DEFAULT.VALUE) → Callable[[Callable], ray.serve.deployment.Deployment]

Decorator that converts a Python class or function to a Deployment.

Example:

from ray import serve

@serve.deployment(num_replicas=2)
class MyDeployment:
    pass

app = MyDeployment.bind()

Parameters
  • name – Name uniquely identifying this deployment within the application. If not provided, the name of the class or function is used.

  • num_replicas – Number of replicas to run that handle requests to this deployment. Defaults to 1.

  • autoscaling_config – Parameters to configure autoscaling behavior. If this is set, num_replicas cannot be set.

  • init_args – [DEPRECATED] These should be passed to bind() instead.

  • init_kwargs – [DEPRECATED] These should be passed to bind() instead.

  • route_prefix – Requests to paths under this HTTP path prefix are routed to this deployment. Defaults to ‘/’. This can only be set for the ingress (top-level) deployment of an application.

  • ray_actor_options – Options to pass to the Ray Actor decorator, such as resource requirements. Valid options are: accelerator_type, memory, num_cpus, num_gpus, object_store_memory, resources, and runtime_env.

  • user_config – Config to pass to the reconfigure method of the deployment. This can be updated dynamically without restarting the replicas of the deployment. The user_config must be fully JSON-serializable.

  • max_concurrent_queries – Maximum number of queries that are sent to a replica of this deployment without receiving a response. Defaults to 100.

  • health_check_period_s – Duration, in seconds, between health check calls for the replica. Defaults to 10s. By default the health check is a no-op actor call to the replica, but you can define your own by adding a check_health method to your deployment that raises an exception when the replica is unhealthy.

  • health_check_timeout_s – Duration, in seconds, that replicas wait for a health check method to return before considering it failed. Defaults to 30s.

  • graceful_shutdown_wait_loop_s – Duration that replicas wait until there is no more work to be done before shutting down. Defaults to 2s.

  • graceful_shutdown_timeout_s – Duration to wait for a replica to gracefully shut down before being forcefully killed. Defaults to 20s.

  • is_driver_deployment – [EXPERIMENTAL] When set, exactly one replica of this deployment runs on every node (like a daemon set).

Returns

Deployment

PublicAPI (beta): This API is in beta and may change before becoming stable.