ray.train.xgboost.XGBoostTrainer
- class ray.train.xgboost.XGBoostTrainer(*args, **kwargs)
Bases: ray.train.gbdt_trainer.GBDTTrainer
A Trainer for data-parallel XGBoost training.
This Trainer runs the XGBoost training loop in a distributed manner using multiple Ray Actors.
Note
XGBoostTrainer does not modify or otherwise alter how the XGBoost distributed training algorithm works. Ray only provides orchestration, data ingest, and fault tolerance. For more information on XGBoost distributed training, refer to the XGBoost documentation.
Example
    import ray
    from ray.train.xgboost import XGBoostTrainer
    from ray.train import ScalingConfig

    train_dataset = ray.data.from_items(
        [{"x": x, "y": x + 1} for x in range(32)])
    trainer = XGBoostTrainer(
        label_column="y",
        params={"objective": "reg:squarederror"},
        scaling_config=ScalingConfig(num_workers=3),
        datasets={"train": train_dataset}
    )
    result = trainer.fit()
- Parameters
datasets – Datasets to use for training and validation. Must include a “train” key denoting the training dataset. If a preprocessor is provided and has not already been fit, it will be fit on the training dataset. All datasets will be transformed by the preprocessor if one is provided. All non-training datasets will be used as separate validation sets, each reporting a separate metric.
label_column – Name of the label column. A column with this name must be present in the training dataset.
params – XGBoost training parameters. Refer to the XGBoost documentation for a list of possible parameters.
dmatrix_params – Dict of dataset name: dict of kwargs passed to the respective xgboost_ray.RayDMatrix initializations, which in turn are passed to the xgboost.DMatrix objects created on each worker. For example, this can be used to add sample weights with the weights parameter.
num_boost_round – Target number of boosting iterations (trees in the model). Note that unlike in xgboost.train, this is the target number of trees, meaning that if you set num_boost_round=10 and pass a model that has already been trained for 5 iterations, it will be trained for 5 more iterations, instead of 10 more.
scaling_config – Configuration for how to scale data parallel training.
run_config – Configuration for the execution of the training run.
preprocessor – A ray.data.Preprocessor to preprocess the provided datasets.
resume_from_checkpoint – A checkpoint to resume training from.
**train_kwargs – Additional kwargs passed to the xgboost.train() function.
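The num_boost_round semantics described above come down to simple arithmetic. The following is a minimal sketch of that arithmetic only, not Ray's implementation: the helper name additional_rounds and its arguments are hypothetical, and clamping at zero when the model already has more trees than the target is an assumption that the description does not cover.

```python
def additional_rounds(num_boost_round: int, rounds_already_trained: int = 0) -> int:
    """Hypothetical helper: how many more boosting rounds would run.

    num_boost_round is treated as the *target* total number of trees,
    as described for XGBoostTrainer, rather than the number of rounds
    to add on top of an existing model.
    """
    if num_boost_round < 0 or rounds_already_trained < 0:
        raise ValueError("round counts must be non-negative")
    # Assumption: a model already at or past the target trains no further.
    return max(num_boost_round - rounds_already_trained, 0)


# A fresh model trains for the full target.
print(additional_rounds(10))      # 10
# A model already trained for 5 iterations trains for 5 more, not 10 more.
print(additional_rounds(10, 5))   # 5
```

By contrast, the description above implies that calling xgboost.train with the same value on an existing 5-tree model would add 10 more rounds.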
PublicAPI (beta): This API is in beta and may change before becoming stable.
Methods
Converts self to a tune.Trainable class.
can_restore(path) – Checks whether a given directory contains a restorable Train experiment.
fit() – Runs training.
restore(path[, datasets, preprocessor, ...]) – Restores a Train experiment from a previously interrupted/failed run.
setup() – Called during fit() to perform initial setup on the Trainer.