% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/decision_tree.R
\name{decision_tree}
\alias{decision_tree}
\alias{update.decision_tree}
\title{General Interface for Decision Tree Models}
\usage{
decision_tree(
  mode = "unknown",
  cost_complexity = NULL,
  tree_depth = NULL,
  min_n = NULL
)

\method{update}{decision_tree}(
  object,
  parameters = NULL,
  cost_complexity = NULL,
  tree_depth = NULL,
  min_n = NULL,
  fresh = FALSE,
  ...
)
}
\arguments{
\item{mode}{A single character string for the type of model.
Possible values for this model are "unknown", "regression", or
"classification".}

\item{cost_complexity}{A positive number for the cost/complexity
parameter (a.k.a. \code{Cp}) used by CART models (\code{rpart} only).}

\item{tree_depth}{An integer for maximum depth of the tree.}

\item{min_n}{An integer for the minimum number of data points
in a node that are required for the node to be split further.}

\item{object}{A decision tree model specification.}

\item{parameters}{A 1-row tibble or named list with \emph{main}
parameters to update. If the individual arguments are used,
these will supersede the values in \code{parameters}. Also, using
engine arguments in this object will result in an error.}

\item{fresh}{A logical for whether the arguments should be
modified in-place or replaced wholesale.}

\item{...}{Not used for \code{update()}.}
}
\description{
\code{decision_tree()} is a way to generate a \emph{specification} of a model
before fitting and allows the model to be created using
different packages in R or via Spark. The main arguments for the
model are:
\itemize{
\item \code{cost_complexity}: The cost/complexity parameter (a.k.a. \code{Cp})
used by CART models (\code{rpart} only).
\item \code{tree_depth}: The \emph{maximum} depth of a tree (\code{rpart} and
\code{spark} only).
\item \code{min_n}: The minimum number of data points in a node
that are required for the node to be split further.
}
These arguments are converted to their specific names at the
time that the model is fit. Other options and arguments can be
set using \code{set_engine()}. If left to their defaults
here (\code{NULL}), the values are taken from the underlying model
functions. If parameters need to be modified, \code{update()} can be used
in lieu of recreating the object from scratch.
}
\details{
The model can be fit with the \code{fit()} function using the
following \emph{engines}:
\itemize{
\item \pkg{R}:  \code{"rpart"} (the default) or \code{"C5.0"} (classification only)
\item \pkg{Spark}: \code{"spark"}
}

Note that, for \code{rpart} models, both \code{cost_complexity} and
\code{tree_depth} can be specified, but the package will give
precedence to \code{cost_complexity}. Also, with \code{tree_depth} values
greater than 30, \code{rpart} will give nonsense results on 32-bit
machines.
}
\note{
For models created using the spark engine, there are
several differences to consider. First, only the formula
interface via \code{fit()} is available; using \code{fit_xy()} will
generate an error. Second, the predictions will always be in a
spark table format. The names will be the same as documented but
without the dots. Third, there is no equivalent to factor
columns in spark tables so class predictions are returned as
character columns. Fourth, to retain the model object for a new
R session (via \code{save()}), the \code{model$fit} element of the \code{parsnip}
object should be serialized via \code{ml_save(object$fit)} and
separately saved to disk. In a new session, the object can be
reloaded and reattached to the \code{parsnip} object.
}
\section{Engine Details}{


The standardized parameter names in parsnip can be mapped to their original
names in each engine:\tabular{llll}{
   \strong{parsnip} \tab \strong{rpart} \tab \strong{C5.0} \tab \strong{spark} \cr
   tree_depth \tab maxdepth \tab NA \tab max_depth \cr
   min_n \tab minsplit \tab minCases \tab min_instances_per_node \cr
   cost_complexity \tab cp \tab NA \tab NA \cr
}
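
As a rough illustration of this mapping, the sketch below uses
\code{translate()} to print the engine-level call for a specification. The
argument values and the object name \code{spec} are arbitrary and chosen
only for demonstration:
\preformatted{
spec <- decision_tree(mode = "regression", tree_depth = 5, min_n = 10)
spec <- set_engine(spec, "rpart")
# tree_depth and min_n should appear under their rpart names
# (maxdepth and minsplit) in the printed fit template
translate(spec)
}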


Engines may have pre-set default arguments when executing the
model fit call. For this type of
model, the templates of the fit calls are:

\pkg{rpart} classification

\Sexpr[results=rd]{parsnip:::show_fit(parsnip:::decision_tree(mode = "classification"), "rpart")}

\pkg{rpart} regression

\Sexpr[results=rd]{parsnip:::show_fit(parsnip:::decision_tree(mode = "regression"), "rpart")}

\pkg{C5.0} classification

\Sexpr[results=rd]{parsnip:::show_fit(parsnip:::decision_tree(mode = "classification"), "C5.0")}

\pkg{spark} classification

\Sexpr[results=rd]{parsnip:::show_fit(parsnip:::decision_tree(mode = "classification"), "spark")}

\pkg{spark} regression

\Sexpr[results=rd]{parsnip:::show_fit(parsnip:::decision_tree(mode = "regression"), "spark")}
}

\examples{
decision_tree(mode = "classification", tree_depth = 5)
# Parameters can be represented by a placeholder:
decision_tree(mode = "regression", cost_complexity = varying())
model <- decision_tree(cost_complexity = 10, min_n = 3)
model
update(model, cost_complexity = 1)
update(model, cost_complexity = 1, fresh = TRUE)
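# A minimal sketch of setting an engine and fitting the specification.
# It assumes the rpart package is installed; the built-in iris data and
# the object names below are used purely for illustration.
if (requireNamespace("rpart", quietly = TRUE)) {
  tree_spec <- decision_tree(mode = "classification", tree_depth = 5, min_n = 10)
  tree_spec <- set_engine(tree_spec, "rpart")
  tree_fit <- fit(tree_spec, Species ~ ., data = iris)
  print(tree_fit)
}
# Main parameters can also be supplied together as a named list:
update(model, parameters = list(tree_depth = 3, min_n = 5))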
}
\seealso{
\code{\link[=fit]{fit()}}
}
