ray.data.preprocessors.Chain

class ray.data.preprocessors.Chain(*preprocessors: ray.data.preprocessor.Preprocessor)

Bases: ray.data.preprocessor.Preprocessor

Combine multiple preprocessors into a single Preprocessor.

When you call fit, the first preprocessor is fit on the input dataset, and each subsequent preprocessor is fit on the dataset produced by the preceding preprocessor’s fit_transform.
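The fitting order can be illustrated with a small, self-contained sketch. It does not use Ray: ToyStandardScaler and ToyChain are hypothetical classes written only to mirror the behavior described above, and a "dataset" is assumed to be a plain dict of column name to list of values. Chaining two identical scalers makes the effect visible: the second scaler is fit on data the first has already standardized, so it learns a mean near 0 and a standard deviation near 1.

```python
import math
from typing import Dict, List

# Hypothetical toy model: a "dataset" is a dict of column name -> list of
# values. None of these classes are part of Ray.
Data = Dict[str, list]


class ToyStandardScaler:
    def __init__(self, columns: List[str]):
        self.columns = columns
        self.stats: Dict[str, tuple] = {}

    def fit(self, data: Data) -> "ToyStandardScaler":
        for col in self.columns:
            values = data[col]
            mean = sum(values) / len(values)
            # Population standard deviation (divide by n).
            std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
            self.stats[col] = (mean, std)
        return self

    def transform(self, data: Data) -> Data:
        out = dict(data)  # shallow copy so the input is left untouched
        for col, (mean, std) in self.stats.items():
            out[col] = [(v - mean) / std for v in data[col]]
        return out

    def fit_transform(self, data: Data) -> Data:
        return self.fit(data).transform(data)


class ToyChain:
    def __init__(self, *preprocessors):
        self.preprocessors = preprocessors

    def fit(self, data: Data) -> "ToyChain":
        # Every stage except the last is fit and applied, so the next stage is
        # fit on already-transformed data; the last stage only needs fitting.
        for p in self.preprocessors[:-1]:
            data = p.fit_transform(data)
        self.preprocessors[-1].fit(data)
        return self

    def transform(self, data: Data) -> Data:
        # Apply every (already fitted) stage in order.
        for p in self.preprocessors:
            data = p.transform(data)
        return data


first, second = ToyStandardScaler(["x"]), ToyStandardScaler(["x"])
chain = ToyChain(first, second).fit({"x": [0, 1, 2]})
print(first.stats["x"])   # mean 1.0, std ~0.8165 (fit on the raw data)
print(second.stats["x"])  # mean ~0.0, std ~1.0 (fit on the scaled data)
```

Because each stage sees the output of the stages before it, the order in which preprocessors are passed to a chain matters.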

Example

>>> import pandas as pd
>>> import ray
>>> from ray.data.preprocessors import Chain, Concatenator, LabelEncoder, StandardScaler
>>>
>>> df = pd.DataFrame({
...     "X0": [0, 1, 2],
...     "X1": [3, 4, 5],
...     "Y": ["orange", "blue", "orange"],
... })
>>> ds = ray.data.from_pandas(df)  
>>>
>>> preprocessor = Chain(
...     StandardScaler(columns=["X0", "X1"]),
...     Concatenator(include=["X0", "X1"], output_column_name="X"),
...     LabelEncoder(label_column="Y")
... )
>>> preprocessor.fit_transform(ds).to_pandas()  
   Y                                         X
0  1  [-1.224744871391589, -1.224744871391589]
1  0                                [0.0, 0.0]
2  1    [1.224744871391589, 1.224744871391589]
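The values in this output can be checked by hand. StandardScaler computes (x - mean) / std using the population standard deviation (with the sample standard deviation, the extreme values would be ±1.0 rather than ±1.224744871391589), and X1 = [3, 4, 5] is X0 = [0, 1, 2] shifted by 3, so both columns scale to the same values. The label codes are consistent with LabelEncoder assigning integers to the sorted unique labels ("blue" → 0, "orange" → 1). The following stdlib-only sketch is an independent check of that arithmetic, not Ray's implementation; standard_scale is a hypothetical helper.

```python
import statistics


def standard_scale(values):
    # (x - mean) / std with the population standard deviation (ddof=0).
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


x0_scaled = standard_scale([0, 1, 2])
x1_scaled = standard_scale([3, 4, 5])

# Assign integer codes to the unique labels in sorted order.
labels = ["orange", "blue", "orange"]
codes = {label: i for i, label in enumerate(sorted(set(labels)))}
encoded = [codes[label] for label in labels]

print(x0_scaled)  # approximately [-1.2247, 0.0, 1.2247]
print(x1_scaled)  # approximately the same values as x0_scaled
print(encoded)    # [1, 0, 1]
```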

Parameters

preprocessors – The preprocessors to compose. They are applied sequentially, in the order given.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

fit(ds)

Fit this Preprocessor to the Dataset.

preferred_batch_format()

Batch format hint that upstream producers can use to try to yield the best block format.

transform(ds)

Transform the given dataset.

transform_batch(data)

Transform a single batch of data.
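For a chain, transforming a single batch is expected to mean running that batch through each fitted stage in order, as transform does for a whole Dataset, without refitting anything. The sketch below shows that data flow under stated assumptions: scale, concatenate, encode, and chain_transform_batch are hypothetical plain-Python functions, the fitted state from the example above (means 1.0 and 4.0, population std sqrt(2/3), blue → 0, orange → 1) is hard-coded, and a batch is modeled as a dict of lists, whereas Ray itself works with batch formats such as pandas DataFrames.

```python
import math
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical batch model: column name -> list of values.
Batch = Dict[str, list]

# Fitted state copied from the example above (hard-coded for illustration).
MEANS = {"X0": 1.0, "X1": 4.0}
STD = math.sqrt(2 / 3)  # population std of both [0, 1, 2] and [3, 4, 5]
LABELS = {"blue": 0, "orange": 1}


def scale(batch: Batch) -> Batch:
    out = dict(batch)
    for col, mean in MEANS.items():
        out[col] = [(v - mean) / STD for v in batch[col]]
    return out


def concatenate(batch: Batch) -> Batch:
    # Merge X0 and X1 into one list-valued column "X" and drop the inputs.
    out = {k: v for k, v in batch.items() if k not in ("X0", "X1")}
    out["X"] = [[a, b] for a, b in zip(batch["X0"], batch["X1"])]
    return out


def encode(batch: Batch) -> Batch:
    out = dict(batch)
    out["Y"] = [LABELS[y] for y in batch["Y"]]
    return out


def chain_transform_batch(stages: List[Callable[[Batch], Batch]], batch: Batch) -> Batch:
    # Feed the output of each stage into the next, in order.
    return reduce(lambda acc, stage: stage(acc), stages, batch)


result = chain_transform_batch(
    [scale, concatenate, encode],
    {"X0": [1], "X1": [4], "Y": ["blue"]},
)
print(result)  # {'Y': [0], 'X': [[0.0, 0.0]]}
```

The new row matches the middle row of the example output, which is what reusing the same fitted stages should produce.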

transform_stats()

Return Dataset stats for the most recent transform call, if any.