ray.train.huggingface.TransformersPredictor.predict#

TransformersPredictor.predict(data: Union[numpy.ndarray, pandas.DataFrame, Dict[str, numpy.ndarray]], feature_columns: Optional[Union[List[str], List[int]]] = None, **predict_kwargs) Union[numpy.ndarray, pandas.DataFrame, Dict[str, numpy.ndarray]][source]#

Run inference on data batch.

The data is converted into a list (unless pipeline is a TableQuestionAnsweringPipeline) and passed to the pipeline object.

Parameters
  • data – A batch of input data. Either a pandas DataFrame or numpy array.

  • feature_columns – The names or indices of the columns in the data to use as features to predict on. If None, use all columns.

  • **pipeline_call_kwargs – additional kwargs to pass to the pipeline object.

Examples

>>> import pandas as pd
>>> from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
>>> from transformers.pipelines import pipeline
>>> from ray.train.huggingface import TransformersPredictor
>>>
>>> model_checkpoint = "gpt2"
>>> tokenizer_checkpoint = "sgugger/gpt2-like-tokenizer"
>>> tokenizer = AutoTokenizer.from_pretrained(tokenizer_checkpoint)
>>>
>>> model_config = AutoConfig.from_pretrained(model_checkpoint)
>>> model = AutoModelForCausalLM.from_config(model_config)
>>> predictor = TransformersPredictor(
...     pipeline=pipeline(
...         task="text-generation", model=model, tokenizer=tokenizer
...     )
... )
>>>
>>> prompts = pd.DataFrame(
...     ["Complete me", "And me", "Please complete"], columns=["sentences"]
... )
>>> predictions = predictor.predict(prompts)
Returns

Prediction result.