DataIterator API
- class ray.data.DataIterator
An iterator for reading records from a Dataset or DatasetPipeline.

For Datasets, each iteration call represents a complete read of all items in the Dataset. For DatasetPipelines, each iteration call represents one pass (epoch) over the base Dataset. Note that for DatasetPipelines, each pass iterates over the original Dataset, not over a window (if .window() was used).

If using Ray AIR, each trainer actor should get its own iterator by calling ray.train.get_dataset_shard("train").

Examples
>>> import ray
>>> ds = ray.data.range(5)
>>> ds
Dataset(num_blocks=..., num_rows=5, schema={id: int64})
>>> ds.iterator()
DataIterator(Dataset(num_blocks=..., num_rows=5, schema={id: int64}))
Tip
For debugging purposes, use make_local_dataset_iterator() to create a local DataIterator from a Dataset, a Preprocessor, and a DatasetConfig.

PublicAPI (beta): This API is in beta and may change before becoming stable.
| Return a batched iterable over the dataset. |
| Return a batched iterable of Torch Tensors over the dataset. |
| Return a TF Dataset over this dataset. |
| Returns a string containing execution timing information. |