% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cubist_rules.R
\name{cubist_rules}
\alias{cubist_rules}
\title{Cubist rule-based regression models}
\usage{
cubist_rules(
  mode = "regression",
  committees = NULL,
  neighbors = NULL,
  max_rules = NULL,
  engine = "Cubist"
)
}
\arguments{
\item{mode}{A single character string for the type of model.
The only possible value for this model is "regression".}

\item{committees}{A non-negative integer (no greater than 100) for the number
of members of the ensemble.}

\item{neighbors}{An integer between zero and nine for the number of training
set instances that are used to adjust the model-based prediction.}

\item{max_rules}{The largest number of rules.}

\item{engine}{A single character string specifying what computational engine
to use for fitting.}
}
\description{
\code{cubist_rules()} defines a model that derives simple feature rules from a tree
ensemble and creates regression models within each rule.

The engine for this model is:

\Sexpr[stage=render,results=rd]{parsnip:::make_engine_list("cubist_rules", pkg = "rules")}

More information on how \pkg{parsnip} is used for modeling is at
\url{https://www.tidymodels.org/}.
}
\details{
Cubist is a rule-based ensemble regression model. A basic model tree
(Quinlan, 1992) is created that has a separate linear regression model
for each terminal node. The paths along the model tree are
flattened into rules, and these rules are simplified and pruned. The parameter
\code{min_n} is the primary method for controlling the size of each tree while
\code{max_rules} controls the number of rules.

Cubist ensembles are created using \emph{committees}, which are similar to
boosting. After the first model in the committee is created, the second
model uses a modified version of the outcome data based on whether the
previous model under- or over-predicted the outcome. For iteration \emph{m}, the
new outcome \verb{y*} is computed using

\figure{comittees.png}

If a sample is under-predicted on the previous iteration, the outcome is
adjusted so that the next time it is more likely to be over-predicted to
compensate. This adjustment continues for each ensemble iteration. See
Kuhn and Johnson (2013) for details.
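
As a minimal sketch of this adjustment, assuming the formula in the figure
above takes the form given in Kuhn and Johnson (2013), where the new outcome is
the observed value minus the previous iteration's residual (the values and
object names below are made up for illustration and are not part of the
package):

\preformatted{
y          <- c(10, 20, 30)  # observed outcomes (hypothetical)
y_hat_prev <- c(12, 18, 30)  # predictions from iteration m - 1 (hypothetical)

# Assumed form of the adjustment: y* = y - (y_hat_prev - y)
y_star <- y - (y_hat_prev - y)
y_star
}

Here the first sample was over-predicted, so its adjusted outcome is lowered,
while the second was under-predicted, so its adjusted outcome is raised.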

After the model is created, there is also an option for a post-hoc
adjustment that uses the training set (Quinlan, 1993). When a new sample is
predicted by the model, it can be modified by its nearest neighbors in the
original training set. For \emph{K} neighbors, the model-based predicted value is
adjusted by the neighbors using:

\figure{adjust.png}

where \code{t} is the training set prediction and \code{w} is a weight that is
inversely related to the distance to the neighbor.

This function only defines what \emph{type} of model is being fit. Once an engine
is specified, the \emph{method} to fit the model is also defined.

The model is not trained or fit until the \code{\link[=fit.model_spec]{fit.model_spec()}} function is used
with the data.
}
\examples{
cubist_rules()

# ------------------------------------------------------------------------------

data(car_prices, package = "modeldata")
car_rules <-
  cubist_rules(committees = 1) \%>\%
  fit(log10(Price) ~ ., data = car_prices)

car_rules

summary(car_rules$fit)
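
# ------------------------------------------------------------------------------

# A hedged sketch of setting the other main arguments together. The values
# below (5 committees, 3 neighbors, at most 20 rules) are illustrative
# assumptions, not recommendations; `car_rules_adj` is a hypothetical name.
car_rules_adj <-
  cubist_rules(committees = 5, neighbors = 3, max_rules = 20) \%>\%
  fit(log10(Price) ~ ., data = car_prices)

car_rules_adj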
}
\references{
\url{https://www.tidymodels.org}, \href{https://www.tmwr.org/}{\emph{Tidy Modeling with R}}

Quinlan R (1992). "Learning with Continuous Classes." Proceedings
of the 5th Australian Joint Conference On Artificial Intelligence, pp.
343-348.

Quinlan R (1993). "Combining Instance-Based and Model-Based Learning."
Proceedings of the Tenth International Conference on Machine Learning, pp.
236-243.

Kuhn M and Johnson K (2013). \emph{Applied Predictive Modeling}. Springer.
}
\seealso{
\code{\link[Cubist:cubist.default]{Cubist::cubist()}}, \code{\link[Cubist:cubistControl]{Cubist::cubistControl()}}, \Sexpr[stage=render,results=rd]{parsnip:::make_seealso_list("cubist_rules", "rules")}
}
