Serving an Object Detection Model
This example runs an object detection application with Ray Serve.
To run this example, install the following:
pip install "ray[serve]" requests torch
This example uses the ultralytics/yolov5 model and FastAPI. Save the following Serve code to a file named object_detection.py:
import torch
from PIL import Image
import numpy as np
from io import BytesIO
from fastapi.responses import Response
from fastapi import FastAPI

from ray import serve


app = FastAPI()


@serve.deployment(num_replicas=1, route_prefix="/")
@serve.ingress(app)
class APIIngress:
    def __init__(self, object_detection_handle) -> None:
        # Handle to the ObjectDetection deployment, passed in by bind() below.
        self.handle = object_detection_handle

    @app.get(
        "/detect",
        responses={200: {"content": {"image/jpeg": {}}}},
        response_class=Response,
    )
    async def detect(self, image_url: str):
        # The handle call resolves to a reference; awaiting that reference
        # yields the rendered PIL image.
        image_ref = await self.handle.detect.remote(image_url)
        image = await image_ref
        # Encode the image as JPEG bytes for the HTTP response.
        file_stream = BytesIO()
        image.save(file_stream, "jpeg")
        return Response(content=file_stream.getvalue(), media_type="image/jpeg")


@serve.deployment(
    ray_actor_options={"num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class ObjectDetection:
    def __init__(self):
        # Load YOLOv5s from torch.hub and move it to the GPU.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.cuda()

    def detect(self, image_url: str):
        # The model accepts an image URL directly; render() draws the detections.
        result_im = self.model(image_url)
        return Image.fromarray(result_im.render()[0].astype(np.uint8))


entrypoint = APIIngress.bind(ObjectDetection.bind())
Use serve run object_detection:entrypoint to start the Serve application.
Note
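The autoscaling config in this example sets min_replicas to 1 and max_replicas to 2, so Serve keeps at least one ObjectDetection replica running and can add a second one under load. If you set min_replicas to 0 instead, the deployment starts with no ObjectDetection replicas. These replicas spawn only when a request arrives. After a period where no requests arrive, Serve downscales ObjectDetection back to 0 replicas to save GPU resources.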
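To make the effect of the two bounds concrete, here is a deliberately simplified sketch of how a replica count can be clamped between min_replicas and max_replicas. The function name desired_replicas and the parameter target_per_replica are hypothetical, chosen for illustration only. This is not Serve's actual autoscaler, which also takes other factors into account, such as delays before scaling up or down.

```python
import math


def desired_replicas(ongoing_requests: int, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified model: replicas needed for the load, clamped to the bounds."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(ongoing_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# With min_replicas=0, an idle deployment can drop to zero replicas.
print(desired_replicas(0, 2, min_replicas=0, max_replicas=2))   # 0
# With min_replicas=1, one replica stays up even with no traffic.
print(desired_replicas(0, 2, min_replicas=1, max_replicas=2))   # 1
# Heavy load is capped at max_replicas.
print(desired_replicas(10, 2, min_replicas=1, max_replicas=2))  # 2
```

The trade-off is between GPU cost and latency: a floor of 0 frees the GPU when idle, but the first request after an idle period has to wait for a replica to start and load the model.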
You should see the following logs:
(ServeReplica:ObjectDetection pid=4747) warnings.warn(
(ServeReplica:ObjectDetection pid=4747) Downloading: "https://github.com/ultralytics/yolov5/zipball/master" to /home/ray/.cache/torch/hub/master.zip
(ServeReplica:ObjectDetection pid=4747) YOLOv5 🚀 2023-3-8 Python-3.9.16 torch-1.13.0+cu116 CUDA:0 (Tesla T4, 15110MiB)
(ServeReplica:ObjectDetection pid=4747)
(ServeReplica:ObjectDetection pid=4747) Fusing layers...
(ServeReplica:ObjectDetection pid=4747) YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
(ServeReplica:ObjectDetection pid=4747) Adding AutoShape...
2023-03-08 21:10:21,685 SUCC <string>:93 -- Deployed Serve app successfully.
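If you script the next step instead of watching the logs, it can help to wait until the HTTP endpoint accepts connections before sending a request. The following is a minimal sketch using only the standard library. The helper name wait_for_port is hypothetical, and the address 127.0.0.1:8000 is taken from the request example in this guide. A successful TCP connection only shows that something is listening on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Assumes the application listens on the address used in this guide.
    ready = wait_for_port("127.0.0.1", 8000, timeout_s=3.0)
    print("Serve HTTP endpoint reachable:", ready)
```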
Use the following code to send requests:
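import requests

image_url = "https://ultralytics.com/images/zidane.jpg"
# Pass the image URL via params so that requests percent-encodes it.
resp = requests.get("http://127.0.0.1:8000/detect", params={"image_url": image_url})
resp.raise_for_status()

with open("output.jpeg", "wb") as f:
    f.write(resp.content)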
The script saves the annotated image locally as output.jpeg. Check it out!
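Because image_url is itself a URL passed as a query parameter, percent-encoding it avoids ambiguity when it contains characters such as & or ?. It can also be worth checking that the response body really is a JPEG before trusting the saved file, since an error response would otherwise be written to disk as if it were an image. The sketch below uses only the standard library. The helper names build_detect_url and looks_like_jpeg are hypothetical, and the check inspects only the leading JPEG marker bytes, not the full file.

```python
from urllib.parse import urlencode


def build_detect_url(image_url: str,
                     base: str = "http://127.0.0.1:8000/detect") -> str:
    """Return the /detect URL with image_url percent-encoded as a query parameter."""
    return f"{base}?{urlencode({'image_url': image_url})}"


def looks_like_jpeg(data: bytes) -> bool:
    """Check for the JPEG start-of-image marker (FF D8) followed by a marker byte (FF)."""
    return data[:3] == b"\xff\xd8\xff"


print(build_detect_url("https://ultralytics.com/images/zidane.jpg"))
# http://127.0.0.1:8000/detect?image_url=https%3A%2F%2Fultralytics.com%2Fimages%2Fzidane.jpg
```

With these helpers, a client could call looks_like_jpeg(resp.content) on the response and skip writing the file when the check fails.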
