Title: Model Wrappers for Tree-Based Models
Version: 0.4.0
Description: Bindings for additional tree-based model engines for use with the 'parsnip' package. Models include gradient boosted decision trees with 'LightGBM' (Ke et al., 2017), conditional inference trees and conditional random forests with 'partykit' (Hothorn and Zeileis, 2015; Hothorn et al., 2006 <doi:10.1198/106186006X133933>), and accelerated oblique random forests with 'aorsf' (Jaeger et al., 2022 <doi:10.5281/zenodo.7116854>).
License: MIT + file LICENSE
URL: https://bonsai.tidymodels.org/, https://github.com/tidymodels/bonsai
BugReports: https://github.com/tidymodels/bonsai/issues
Depends: parsnip (≥ 1.0.1), R (≥ 4.1)
Imports: cli, dials, dplyr, purrr, rlang (≥ 1.1.0), stats, tibble, utils, withr
Suggests: aorsf (≥ 0.1.5), covr, knitr, lightgbm, Matrix, modeldata, partykit, rmarkdown, rsample, testthat (≥ 3.0.0), tune
VignetteBuilder: knitr
Config/Needs/website: tidyverse/tidytemplate
Config/testthat/edition: 3
Config/usethis/last-upkeep: 2025-04-25
Encoding: UTF-8
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-06-24 17:46:22 UTC; emilhvitfeldt
Author: Daniel Falbel [aut], Athos Damiani [aut], Roel M. Hogervorst [aut], Max Kuhn [aut], Simon Couch [aut], Emil Hvitfeldt [aut, cre], Posit Software, PBC [cph, fnd]
Maintainer: Emil Hvitfeldt <emil.hvitfeldt@posit.co>
Repository: CRAN
Date/Publication: 2025-06-25 12:30:02 UTC

bonsai: Model Wrappers for Tree-Based Models

Description


Bindings for additional tree-based model engines for use with the 'parsnip' package. Models include gradient boosted decision trees with 'LightGBM' (Ke et al., 2017), conditional inference trees and conditional random forests with 'partykit' (Hothorn and Zeileis, 2015; Hothorn et al., 2006, doi:10.1198/106186006X133933), and accelerated oblique random forests with 'aorsf' (Jaeger et al., 2022, doi:10.5281/zenodo.7116854).
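
Once attached, bonsai registers these engines with parsnip. A minimal sketch using parsnip's show_engines() helper:

library(bonsai)

# Engines now available for boosted trees; the list includes "lightgbm"
# and, as of this version, "catboost"
parsnip::show_engines("boost_tree")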

Author(s)

Maintainer: Emil Hvitfeldt emil.hvitfeldt@posit.co (ORCID)

Authors:

Daniel Falbel, Athos Damiani, Roel M. Hogervorst, Max Kuhn, Simon Couch

Other contributors:

Posit Software, PBC (copyright holder, funder)

See Also

Useful links:

https://bonsai.tidymodels.org/

https://github.com/tidymodels/bonsai

Report bugs at https://github.com/tidymodels/bonsai/issues


Internal functions

Description

Not intended for direct use.

Usage

predict_catboost_regression_numeric(object, new_data, ...)

predict_catboost_classification_class(object, new_data, ...)

predict_catboost_classification_prob(object, new_data, ...)

predict_catboost_classification_raw(object, new_data, ...)
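
These helpers are registered with parsnip and are reached through predict() on a fitted model rather than being called directly. A minimal sketch, assuming the catboost engine's dependencies are installed (the catboost package is not on CRAN):

library(bonsai)

fit <- boost_tree(mode = "classification") %>%
  set_engine("catboost") %>%
  fit(Species ~ ., data = iris)

# predict() routes to predict_catboost_classification_class() and _prob()
predict(fit, new_data = iris[1:5, ], type = "class")
predict(fit, new_data = iris[1:5, ], type = "prob")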

Internal functions

Description

Not intended for direct use.

Usage

predict_lightgbm_classification_prob(object, new_data, ...)

predict_lightgbm_classification_class(object, new_data, ...)

predict_lightgbm_classification_raw(object, new_data, ...)

predict_lightgbm_regression_numeric(object, new_data, ...)

## S3 method for class '_lgb.Booster'
multi_predict(object, new_data, type = NULL, trees = NULL, ...)
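
The multi_predict() method re-uses a single fitted booster to generate predictions at several ensemble sizes without refitting. A minimal sketch:

library(bonsai)

mod <- boost_tree(trees = 100, mode = "regression") %>%
  set_engine("lightgbm") %>%
  fit(mpg ~ ., data = mtcars)

# Predictions from the same booster truncated at 25, 50, and 100 trees
multi_predict(mod, new_data = mtcars[1:3, ], trees = c(25, 50, 100))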

Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

parsnip

%>%
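
For example, attaching bonsai makes the pipe available for chaining a model specification without attaching magrittr:

library(bonsai)

spec <- boost_tree() %>% set_engine("lightgbm")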


Boosted trees with catboost

Description

train_catboost is a wrapper for catboost tree-based models where all of the model arguments are in the main function.

Usage

train_catboost(
  x,
  y,
  weights = NULL,
  iterations = 1000,
  learning_rate = 0.03,
  depth = 6,
  l2_leaf_reg = 3,
  random_strength = 1,
  bagging_temperature = 1,
  rsm = 1,
  quiet = TRUE,
  ...
)

Arguments

x

A data frame of predictors.

y

A vector (factor or numeric) or matrix (numeric) of outcome data.

weights

A numeric vector of sample weights, defaults to NULL.

iterations

The maximum number of trees that can be built when solving machine learning problems. Defaults to 1000.

learning_rate

A positive numeric value for the learning rate. Defaults to 0.03.

depth

An integer for the depth of the trees. Defaults to 6.

l2_leaf_reg

A numeric value for the L2 regularization coefficient. Used for leaf value calculation. Defaults to 3.

random_strength

The amount of randomness to use for scoring splits when the tree structure is selected. Use this parameter to avoid overfitting the model. Defaults to 1.

bagging_temperature

A numeric value that controls the intensity of Bayesian bagging; the higher the temperature, the more aggressive the bagging. Defaults to 1.

rsm

A numeric value between 0 and 1 for the random subspace method: the proportion of features to use at each iteration of building trees, with features sampled anew at random at each iteration. Defaults to 1.

quiet

A logical; should logging by catboost::catboost.train() be muted?

...

Other options to pass to catboost::catboost.train(). Arguments will be correctly routed to the params argument, or as a main argument, depending on their name.

Details

This is an internal function, not meant to be directly called by the user.

Value

A fitted catboost.Model object.

Source

https://catboost.ai/docs/en/references/training-parameters/.
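
In typical use, these arguments are reached through parsnip rather than train_catboost() itself; the comments below mark the presumed mapping from parsnip's standardized names, inferred from the argument names above:

library(bonsai)

spec <- boost_tree(
  trees = 1000,       # iterations
  learn_rate = 0.03,  # learning_rate
  tree_depth = 6      # depth
) %>%
  set_engine("catboost", bagging_temperature = 0.5) %>%  # engine-specific arg
  set_mode("regression")

fit(spec, mpg ~ ., data = mtcars)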


Boosted trees with lightgbm

Description

train_lightgbm is a wrapper for lightgbm tree-based models where all of the model arguments are in the main function.

Usage

train_lightgbm(
  x,
  y,
  weights = NULL,
  max_depth = -1,
  num_iterations = 100,
  learning_rate = 0.1,
  feature_fraction_bynode = 1,
  min_data_in_leaf = 20,
  min_gain_to_split = 0,
  bagging_fraction = 1,
  early_stopping_round = NULL,
  validation = 0,
  counts = TRUE,
  quiet = FALSE,
  ...
)

Arguments

x

A data frame or matrix of predictors.

y

A vector (factor or numeric) or matrix (numeric) of outcome data.

weights

A numeric vector of sample weights.

max_depth

An integer for the maximum depth of the tree.

num_iterations

An integer for the number of boosting iterations.

learning_rate

A numeric value between zero and one to control the learning rate.

feature_fraction_bynode

Fraction of predictors that will be randomly sampled at each split.

min_data_in_leaf

A numeric value for the minimum number of data points needed in a leaf to continue splitting.

min_gain_to_split

A number for the minimum loss reduction required to make a further partition on a leaf node of the tree.

bagging_fraction

Subsampling proportion of rows. Setting this argument to a non-default value will also set bagging_freq = 1. See the Bagging section in ?details_boost_tree_lightgbm for more details.

early_stopping_round

The number of iterations without an improvement in the objective function that can occur before training is halted.

validation

The proportion of the training data that are used for performance assessment and potential early stopping.

counts

A logical; should feature_fraction_bynode be interpreted as the number of predictors that will be randomly sampled at each split? TRUE indicates that the argument will be interpreted as a count, FALSE indicates that it will be interpreted as a proportion.

quiet

A logical; should logging by lightgbm::lgb.train() be muted?

...

Other options to pass to lightgbm::lgb.train(). Arguments will be correctly routed to the params argument, or as a main argument, depending on their name.

Details

This is an internal function, not meant to be directly called by the user.

Value

A fitted lgb.Booster object.
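
As with train_catboost(), these arguments are usually set through parsnip, with counts passed as an engine argument; the mappings in the comments are inferred from the argument names and should be treated as a sketch:

library(bonsai)

spec <- boost_tree(
  trees = 100,   # num_iterations
  min_n = 20,    # min_data_in_leaf
  mtry = 0.75    # feature_fraction_bynode
) %>%
  set_engine("lightgbm", counts = FALSE) %>%  # interpret mtry as a proportion
  set_mode("regression")

fit(spec, mpg ~ ., data = mtcars)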