ray.serve.deployment
- ray.serve.deployment(_func_or_class: Optional[Callable] = None, name: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, version: Union[ray.serve._private.utils.DEFAULT, str] = DEFAULT.VALUE, num_replicas: Optional[Union[ray.serve._private.utils.DEFAULT, int]] = DEFAULT.VALUE, init_args: Union[ray.serve._private.utils.DEFAULT, Tuple[Any]] = DEFAULT.VALUE, init_kwargs: Union[ray.serve._private.utils.DEFAULT, Dict[Any, Any]] = DEFAULT.VALUE, route_prefix: Optional[Union[ray.serve._private.utils.DEFAULT, str]] = DEFAULT.VALUE, ray_actor_options: Union[ray.serve._private.utils.DEFAULT, Dict] = DEFAULT.VALUE, user_config: Optional[Union[ray.serve._private.utils.DEFAULT, Any]] = DEFAULT.VALUE, max_concurrent_queries: Union[ray.serve._private.utils.DEFAULT, int] = DEFAULT.VALUE, autoscaling_config: Optional[Union[ray.serve._private.utils.DEFAULT, Dict, ray.serve.config.AutoscalingConfig]] = DEFAULT.VALUE, graceful_shutdown_wait_loop_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, graceful_shutdown_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_period_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, health_check_timeout_s: Union[ray.serve._private.utils.DEFAULT, float] = DEFAULT.VALUE, is_driver_deployment: Optional[bool] = DEFAULT.VALUE) -> Callable[[Callable], ray.serve.deployment.Deployment]
Decorator that converts a Python class to a Deployment.

Example:

    from ray import serve

    @serve.deployment(num_replicas=2)
    class MyDeployment:
        pass

    app = MyDeployment.bind()
- Parameters
name – Name uniquely identifying this deployment within the application. If not provided, the name of the class or function is used.
num_replicas – Number of replicas to run that handle requests to this deployment. Defaults to 1.
autoscaling_config – Parameters to configure autoscaling behavior. If this is set, num_replicas cannot be set.
init_args – [DEPRECATED] These should be passed to bind() instead.
init_kwargs – [DEPRECATED] These should be passed to bind() instead.
route_prefix – Requests to paths under this HTTP path prefix are routed to this deployment. Defaults to ‘/’. This can only be set for the ingress (top-level) deployment of an application.
ray_actor_options – Options to pass to the Ray Actor decorator, such as resource requirements. Valid options are: accelerator_type, memory, num_cpus, num_gpus, object_store_memory, resources, and runtime_env.
user_config – Config to pass to the reconfigure method of the deployment. This can be updated dynamically without restarting the replicas of the deployment. The user_config must be fully JSON-serializable.
max_concurrent_queries – Maximum number of queries that are sent to a replica of this deployment without receiving a response. Defaults to 100.
health_check_period_s – Duration between health check calls for the replica. Defaults to 10s. The health check is by default a no-op Actor call to the replica, but you can define your own health check using the “check_health” method in your deployment that raises an exception when unhealthy.
health_check_timeout_s – Duration in seconds that replicas wait for a health check method to return before considering it failed. Defaults to 30s.
graceful_shutdown_wait_loop_s – Duration that replicas wait until there is no more work to be done before shutting down. Defaults to 2s.
graceful_shutdown_timeout_s – Duration to wait for a replica to gracefully shut down before being forcefully killed. Defaults to 20s.
is_driver_deployment – [EXPERIMENTAL] When set, exactly one replica of this deployment runs on every node (like a daemon set).
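- Returns
Per the signature above, Callable[[Callable], ray.serve.deployment.Deployment]: a decorator that produces a Deployment.

PublicAPI (beta): This API is in beta and may change before becoming stable.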