ray.data.from_huggingface
ray.data.from_huggingface(dataset: datasets.Dataset) -> ray.data.dataset.MaterializedDataset
Create a Dataset from a Hugging Face Datasets Dataset.

This function isn't parallelized, and is intended to be used with Hugging Face Datasets that are loaded into memory (as opposed to memory-mapped).
Example
    import ray
    import datasets

    hf_dataset = datasets.load_dataset("tweet_eval", "emotion")
    ray_ds = ray.data.from_huggingface(hf_dataset["train"])
    print(ray_ds)

    MaterializedDataset(
       num_blocks=...,
       num_rows=3257,
       schema={text: string, label: int64}
    )

Parameters
dataset – A Hugging Face Datasets Dataset. IterableDataset and DatasetDict are not supported.

Returns
A Dataset holding rows from the Hugging Face Datasets Dataset.
PublicAPI: This API is stable across Ray releases.
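
Because DatasetDict is not accepted, one possible workaround is to convert each split on its own and collect the results in a plain dict. The following is a minimal sketch, not part of the Ray API: `splits_to_ray` and its `convert` parameter are hypothetical names introduced here for illustration, and the sketch assumes the input behaves like a mapping from split name to Hugging Face Dataset.

```python
from typing import Any, Callable, Dict, Mapping, Optional


def splits_to_ray(
    dataset_dict: Mapping[str, Any],
    convert: Optional[Callable[[Any], Any]] = None,
) -> Dict[str, Any]:
    """Convert each split of a DatasetDict-like mapping separately.

    `splits_to_ray` and `convert` are hypothetical names, not Ray APIs.
    By default every split is passed to ray.data.from_huggingface.
    """
    if convert is None:
        # Imported lazily so the helper can be exercised without Ray installed.
        import ray

        convert = ray.data.from_huggingface
    # One conversion call per split; the split names are kept as dict keys.
    return {name: convert(split) for name, split in dataset_dict.items()}


# Usage sketch (assumes `ray` and `datasets` are installed):
#
#   import datasets
#   hf_splits = datasets.load_dataset("tweet_eval", "emotion")  # a DatasetDict
#   ray_splits = splits_to_ray(hf_splits)
#   print(ray_splits["train"])
```

Each call still runs without parallelism, as noted above, so this only saves the manual indexing of splits.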