% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/PipeOpClassBalancing.R
\name{mlr_pipeops_classbalancing}
\alias{mlr_pipeops_classbalancing}
\alias{PipeOpClassBalancing}
\title{Class Balancing}
\format{
\code{\link[R6:R6Class]{R6Class}} object inheriting from \code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}.
}
\description{
Undersamples a \code{\link[mlr3:Task]{Task}} to keep only a fraction of the rows of the majority class,
and oversamples (repeats data points of) the minority class.

Sampling happens only during the training phase. Class-balancing a \code{\link[mlr3:Task]{Task}} by sampling may be
beneficial for classification with imbalanced training data.
}
\section{Construction}{


\if{html}{\out{<div class="sourceCode">}}\preformatted{PipeOpClassBalancing$new(id = "classbalancing", param_vals = list())
}\if{html}{\out{</div>}}
\itemize{
\item \code{id} :: \code{character(1)}\cr
Identifier of the resulting object, default \code{"classbalancing"}.
\item \code{param_vals} :: named \code{list}\cr
List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default \code{list()}.
}
}

\section{Input and Output Channels}{

Input and output channels are inherited from \code{\link{PipeOpTaskPreproc}}. Instead of a \code{\link[mlr3:Task]{Task}}, a
\code{\link[mlr3:TaskClassif]{TaskClassif}} is used as input and output during training and prediction.

The output during training is the input \code{\link[mlr3:Task]{Task}} with added or removed rows to balance target classes.
The output during prediction is the unchanged input.
}

\section{State}{

The \verb{$state} is a named \code{list} with the \verb{$state} elements inherited from \code{\link{PipeOpTaskPreproc}}.
}

\section{Parameters}{

The parameters are the parameters inherited from \code{\link{PipeOpTaskPreproc}}; however, the \code{affect_columns} parameter is \emph{not} present. Further parameters are:
\itemize{
\item \code{ratio} :: \code{numeric(1)} \cr
Ratio of number of rows of classes to keep, relative
to the \verb{$reference} value. Initialized to 1.
\item \code{reference} :: \code{character(1)} \cr
What the \verb{$ratio} value is measured against. Can be \code{"all"} (mean instance count of
all classes), \code{"major"} (instance count of class with most instances), \code{"minor"}
(instance count of class with fewest instances), \code{"nonmajor"} (average instance
count of all classes except the major one), \code{"nonminor"} (average instance count
of all classes except the minor one), and \code{"one"} (\verb{$ratio} determines the number of
instances to have, per class). Initialized to \code{"all"}.
\item \code{adjust} :: \code{character(1)} \cr
Which classes to up / downsample. Can be \code{"all"} (up and downsample all to match required
instance count), \code{"major"}, \code{"minor"}, \code{"nonmajor"}, \code{"nonminor"} (see respective values
for \verb{$reference}), \code{"upsample"} (only upsample), and \code{"downsample"} (only downsample). Initialized to \code{"all"}.
\item \code{shuffle} :: \code{logical(1)} \cr
Whether to shuffle the rows of the resulting task.
In case the data is upsampled and \code{shuffle = FALSE}, the resulting task will have the original
rows (which were not removed in downsampling) in the original order, followed by all newly added rows
ordered by target class.
Initialized to \code{TRUE}.
}
}

\section{Internals}{

Up / downsampling happens as follows: First, a "target class count" is calculated by taking the mean
class count of all classes indicated by the \code{reference} parameter (e.g. if \code{reference} is \code{"nonmajor"}:
the mean class count of all classes that are not the "major" class, i.e. the class with the most samples)
and multiplying it by the value of the \code{ratio} parameter. If \code{reference} is \code{"one"}, then the "target
class count" is just the value of \code{ratio} (i.e. \code{1 * ratio}).

Then for each class that is referenced by the \code{adjust} parameter (e.g. if \code{adjust} is \code{"nonminor"}:
each class that is not the class with the fewest samples), \code{PipeOpClassBalancing} either throws out
samples (downsampling), or adds additional rows that are equal to randomly chosen samples (upsampling),
until the number of samples for these classes equals the "target class count".
No upsampling is performed for classes that were not observed during training (i.e. empty factor levels in the target column).

Rows are removed with \code{task$filter()}. When identical rows are added during upsampling, \code{task$row_roles$use} can \emph{not} be used
to duplicate rows because of [inaudible]; instead, \code{task$rbind()} is used to attach
a new \code{\link[data.table:data.table]{data.table}} that contains each duplicated row exactly as many times as it is being added.
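
As a rough illustration of the first two steps (a sketch in plain R, not the actual implementation), the "target class count" and the set of
adjusted classes could be derived as follows. The class counts and all variable names are hypothetical, and the sketch assumes that no two
classes are tied for the largest or smallest count.

\if{html}{\out{<div class="sourceCode">}}\preformatted{# hypothetical class counts for a three-class target
counts = c(a = 100, b = 30, c = 10)
ratio = 2

# reference = "nonmajor": mean count of all classes except the largest one
reference_classes = counts[counts < max(counts)]
target_count = ratio * mean(reference_classes)  # 2 * mean(c(30, 10)) = 40

# adjust = "nonminor": every class except the smallest one is resampled
adjusted = names(counts)[counts > min(counts)]  # "a" and "b"
}\if{html}{\out{</div>}}

In this sketch, class \code{a} would be downsampled from 100 to 40 rows, class \code{b} upsampled from 30 to 40 rows, and class \code{c} left untouched.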
}

\section{Fields}{

Only fields inherited from \code{\link{PipeOp}}.
}

\section{Methods}{

Only methods inherited from \code{\link{PipeOpTaskPreproc}}/\code{\link{PipeOp}}.
}

\examples{
library("mlr3")

task = tsk("spam")
opb = po("classbalancing")

# target class counts
table(task$truth())

# double the instances in the minority class (spam)
opb$param_set$values = list(ratio = 2, reference = "minor",
  adjust = "minor", shuffle = FALSE)
result = opb$train(list(task))[[1L]]
table(result$truth())

# up or downsample all classes until exactly 20 per class remain
opb$param_set$values = list(ratio = 20, reference = "one",
  adjust = "all", shuffle = FALSE)
result = opb$train(list(task))[[1L]]
table(result$truth())
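
# upsample the minority class (spam) to the size of the majority class;
# a sketch that relies on adjust = "minor" leaving the majority class untouched
opb$param_set$values = list(ratio = 1, reference = "major",
  adjust = "minor", shuffle = FALSE)
result = opb$train(list(task))[[1L]]
table(result$truth())

# sampling happens only during training, so prediction should
# return a task with the same number of rows as its input
predicted = opb$predict(list(task))[[1L]]
predicted$nrow == task$nrow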
}
\seealso{
https://mlr-org.com/pipeops.html

Other PipeOps: 
\code{\link{PipeOp}},
\code{\link{PipeOpEncodePL}},
\code{\link{PipeOpEnsemble}},
\code{\link{PipeOpImpute}},
\code{\link{PipeOpTargetTrafo}},
\code{\link{PipeOpTaskPreproc}},
\code{\link{PipeOpTaskPreprocSimple}},
\code{\link{mlr_pipeops}},
\code{\link{mlr_pipeops_adas}},
\code{\link{mlr_pipeops_blsmote}},
\code{\link{mlr_pipeops_boxcox}},
\code{\link{mlr_pipeops_branch}},
\code{\link{mlr_pipeops_chunk}},
\code{\link{mlr_pipeops_classifavg}},
\code{\link{mlr_pipeops_classweights}},
\code{\link{mlr_pipeops_colapply}},
\code{\link{mlr_pipeops_collapsefactors}},
\code{\link{mlr_pipeops_colroles}},
\code{\link{mlr_pipeops_copy}},
\code{\link{mlr_pipeops_datefeatures}},
\code{\link{mlr_pipeops_decode}},
\code{\link{mlr_pipeops_encode}},
\code{\link{mlr_pipeops_encodeimpact}},
\code{\link{mlr_pipeops_encodelmer}},
\code{\link{mlr_pipeops_encodeplquantiles}},
\code{\link{mlr_pipeops_encodepltree}},
\code{\link{mlr_pipeops_featureunion}},
\code{\link{mlr_pipeops_filter}},
\code{\link{mlr_pipeops_fixfactors}},
\code{\link{mlr_pipeops_histbin}},
\code{\link{mlr_pipeops_ica}},
\code{\link{mlr_pipeops_imputeconstant}},
\code{\link{mlr_pipeops_imputehist}},
\code{\link{mlr_pipeops_imputelearner}},
\code{\link{mlr_pipeops_imputemean}},
\code{\link{mlr_pipeops_imputemedian}},
\code{\link{mlr_pipeops_imputemode}},
\code{\link{mlr_pipeops_imputeoor}},
\code{\link{mlr_pipeops_imputesample}},
\code{\link{mlr_pipeops_kernelpca}},
\code{\link{mlr_pipeops_learner}},
\code{\link{mlr_pipeops_learner_pi_cvplus}},
\code{\link{mlr_pipeops_learner_quantiles}},
\code{\link{mlr_pipeops_missind}},
\code{\link{mlr_pipeops_modelmatrix}},
\code{\link{mlr_pipeops_multiplicityexply}},
\code{\link{mlr_pipeops_multiplicityimply}},
\code{\link{mlr_pipeops_mutate}},
\code{\link{mlr_pipeops_nearmiss}},
\code{\link{mlr_pipeops_nmf}},
\code{\link{mlr_pipeops_nop}},
\code{\link{mlr_pipeops_ovrsplit}},
\code{\link{mlr_pipeops_ovrunite}},
\code{\link{mlr_pipeops_pca}},
\code{\link{mlr_pipeops_proxy}},
\code{\link{mlr_pipeops_quantilebin}},
\code{\link{mlr_pipeops_randomprojection}},
\code{\link{mlr_pipeops_randomresponse}},
\code{\link{mlr_pipeops_regravg}},
\code{\link{mlr_pipeops_removeconstants}},
\code{\link{mlr_pipeops_renamecolumns}},
\code{\link{mlr_pipeops_replicate}},
\code{\link{mlr_pipeops_rowapply}},
\code{\link{mlr_pipeops_scale}},
\code{\link{mlr_pipeops_scalemaxabs}},
\code{\link{mlr_pipeops_scalerange}},
\code{\link{mlr_pipeops_select}},
\code{\link{mlr_pipeops_smote}},
\code{\link{mlr_pipeops_smotenc}},
\code{\link{mlr_pipeops_spatialsign}},
\code{\link{mlr_pipeops_subsample}},
\code{\link{mlr_pipeops_targetinvert}},
\code{\link{mlr_pipeops_targetmutate}},
\code{\link{mlr_pipeops_targettrafoscalerange}},
\code{\link{mlr_pipeops_textvectorizer}},
\code{\link{mlr_pipeops_threshold}},
\code{\link{mlr_pipeops_tomek}},
\code{\link{mlr_pipeops_tunethreshold}},
\code{\link{mlr_pipeops_unbranch}},
\code{\link{mlr_pipeops_updatetarget}},
\code{\link{mlr_pipeops_vtreat}},
\code{\link{mlr_pipeops_yeojohnson}}
}
\concept{PipeOps}
