% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/demean.R
\name{demean}
\alias{demean}
\title{Compute group-meaned and de-meaned variables}
\usage{
demean(
  x,
  select,
  group,
  suffix_demean = "_within",
  suffix_groupmean = "_between"
)
}
\arguments{
\item{x}{A data frame.}

\item{select}{Character vector with names of variables to select that should be group- and de-meaned.}

\item{group}{Name of the variable that indicates the group- or cluster-ID.}

\item{suffix_demean, suffix_groupmean}{String value that is appended to the names of the
de-meaned and group-meaned variables of \code{x}, respectively. By default, de-meaned
variables are suffixed with \code{"_within"} and group-meaned variables
with \code{"_between"}.}
}
\value{
A data frame with the group-/de-meaned variables, which get the suffix
  \code{"_between"} (for the group-meaned variable) and \code{"_within"} (for
  the de-meaned variable) by default.
}
\description{
\code{demean()} computes group- and de-meaned versions of a
   variable that can be used in regression analysis to model the between-
   and within-subject effects.
}
\details{
\subsection{Heterogeneity Bias}{
    Mixed models include different levels of sources of variability, i.e.
    error terms at each level. When macro-indicators (or level-2 predictors,
    or higher-level units, or more general: \emph{group-level predictors that
    are \strong{constant} within groups}, such as "education" within participants,
    or GDP within countries) are included as fixed effects (i.e. treated as
    a covariate at level-1), the variance that this covariate leaves unaccounted for
    is absorbed into the error terms of level-1 and level-2. Hence, the error
    terms will be correlated with the covariate, which violates one of the
    assumptions of mixed models (iid, independent and identically distributed
    error terms). This bias is also called the \emph{heterogeneity bias}
    (\cite{Bell et al. 2015}). To resolve this problem, level-2 predictors
    used as (level-1) covariates should be "group-meaned".
  }
  \subsection{Panel data and correlating fixed and group effects}{
    \code{demean()} is intended to create group- and de-meaned variables
    for panel regression models (fixed effects models), or for complex
    random-effect-within-between models (see \cite{Bell et al. 2015, 2018}),
    where group-effects (random effects) and fixed effects correlate (see
    \cite{Bafumi and Gelman 2006}). This can happen, for instance, when
    analyzing panel data, which can lead to \emph{Heterogeneity Bias}. To
    control for correlating predictors and group effects, it is recommended
    to include the group-meaned and de-meaned version of \emph{time-varying covariates}
    (and group-meaned version of \emph{time-invariant covariates} that are on
    a higher level, e.g. level-2 predictors) in the model. This way, one can
    fit complex multilevel models for panel data, including time-varying
    predictors, time-invariant predictors and random effects.
   }
  \subsection{Why mixed models are preferred over fixed effects models}{
    A mixed models approach including time-varying and time-constant fixed
    effects as well as random effects is superior to classic fixed-effects
    models, which lack information on variation in the group-effects or
    between-subject effects. Furthermore, fixed effects regression cannot
    include random slopes, which means that fixed effects regressions are
    neglecting \dQuote{cross-cluster differences in the effects of lower-level
    controls (which) reduces the precision of estimated context effects,
    resulting in unnecessarily wide confidence intervals and low statistical
    power} (\cite{Heisig et al. 2017}).
  }
  \subsection{Terminology}{
    The group-meaned variable is simply the mean of an independent variable
    within each group (or id-level or cluster) represented by \code{group}.
    It represents the cluster-mean of an independent variable. The de-meaned
    variable is then the original variable centered on its group mean, i.e. the
    deviation of each observation from the group-meaned variable. De-meaning
    is sometimes also called person-mean centering or centering within clusters.
  }
  \subsection{De-meaning with continuous predictors}{
    For continuous time-varying predictors, the recommendation is to include
    both their de-meaned and group-meaned versions as fixed effects, but not
    the raw (untransformed) time-varying predictors themselves. The de-meaned
    predictor should also be included as random effect (random slope). In
    regression models, the coefficient of the de-meaned predictors indicates
    the within-subject effect, while the coefficient of the group-meaned
    predictor indicates the between-subject effect.
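
    As a rough sketch of this recommendation, a model for a hypothetical
    outcome \code{y} and a time-varying predictor \code{x} could be specified
    with \code{lmer()} from \pkg{lme4} along the following lines. All variable
    and data names (\code{y}, \code{x_within}, \code{x_between}, \code{ID},
    \code{d}) are assumptions made for illustration only.
    \preformatted{
# de-meaned and group-meaned predictor as fixed effects,
# de-meaned predictor additionally as random slope
lme4::lmer(y ~ x_within + x_between + (1 + x_within | ID), data = d)
}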
  }
  \subsection{De-meaning with binary predictors}{
    For binary time-varying predictors, the recommendation is to include
    the raw (untransformed) binary predictor as fixed effect only and the
    \emph{de-meaned} variable as random effect (random slope)
    (\cite{Hoffman 2015, chapter 8-2.I}). \code{demean()} will thus coerce
    categorical time-varying predictors to numeric to compute the de- and
    group-meaned versions for these variables.
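
    A minimal sketch of such a model, assuming a hypothetical outcome
    \code{y}, a binary predictor \code{bin} and its de-meaned version
    \code{bin_within} in a data frame \code{d} (all names are illustrative
    assumptions), might look like this:
    \preformatted{
# raw binary predictor as fixed effect only,
# its de-meaned version as random slope
lme4::lmer(y ~ bin + (1 + bin_within | ID), data = d)
}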
  }
  \subsection{De-meaning of factors with more than 2 levels}{
    Factors with more than two levels are de-meaned in two ways: first, they
    are converted to numeric and de-meaned; second, dummy variables
    are created (binary, with 0/1 coding for each level) and these binary
    dummy variables are de-meaned in the same way (as described above).
    Packages like \pkg{panelr} internally convert factors to dummies before
    de-meaning, so this behaviour can be mimicked here.
  }
  \subsection{De-meaning interaction terms}{
    There are multiple ways to deal with interaction terms of within- and
    between-effects. A classical approach is to simply use the product
    term of the de-meaned variables (i.e. introducing the de-meaned variables
    as interaction term in the model formula, e.g. \code{y ~ x_within * time_within}).
    This approach, however, might be subject to bias (see \cite{Giesselmann & Schmidt-Catran 2018}).
    \cr \cr
    Another option is to first calculate the product term and then apply the
    de-meaning to it. This approach produces an estimator \dQuote{that reflects
    unit-level differences of interacted variables whose moderators vary
    within units}, which is desirable if \emph{no} within interaction of
    two time-dependent variables is required. \cr \cr
    A third option, when the interaction should result in a genuine within
    estimator, is to "double de-mean" the interaction terms
    (\cite{Giesselmann & Schmidt-Catran 2018}). However, this is currently
    not supported by \code{demean()}. If this is required, the \code{wbm()}
    function from the \pkg{panelr} package should be used. \cr \cr
    To de-mean interaction terms for within-between models, simply specify
    the term as interaction for the \code{select}-argument, e.g.
    \code{select = "a*b"} (see 'Examples').
  }
  \subsection{Analysing panel data with mixed models using lme4}{
    A description of how to translate the
    formulas described in \emph{Bell et al. 2018} into R using \code{lmer()}
    from \pkg{lme4} or \code{glmmTMB()} from \pkg{glmmTMB} can be found here:
    \href{https://strengejacke.github.io/mixed-models-snippets/random-effects-within-between-effects-model.html}{for lmer()}
    and \href{https://strengejacke.github.io/mixed-models-snippets/random-effects-within-between-effects-model-glmmtmb.html}{for glmmTMB()}.
  }
}
\examples{
data(iris)
iris$ID <- sample(1:4, nrow(iris), replace = TRUE) # fake-ID
iris$binary <- as.factor(rbinom(150, 1, .35)) # binary variable

x <- demean(iris, select = c("Sepal.Length", "Petal.Length"), group = "ID")
head(x)
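
# By-hand sketch of the group-meaned and de-meaned quantities, using base R
# only. `toy` and its column names are made up for illustration; they are not
# part of the package and only mimic the default suffixes.
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  score = c(2, 4, 6, 10, 12, 14)
)
# group-meaned: the mean of `score` within each `id`
toy$score_between <- ave(toy$score, toy$id, FUN = mean)
# de-meaned: each observation's deviation from its own group mean
toy$score_within <- toy$score - toy$score_between
toy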

x <- demean(iris, select = c("Sepal.Length", "binary", "Species"), group = "ID")
head(x)
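
# Sketch: dummy-coding a factor by hand and de-meaning each dummy within
# groups, base R only. `toy_f` is hypothetical, and demean() may name or
# arrange its output columns differently.
toy_f <- data.frame(
  id = c(1, 1, 2, 2),
  f = factor(c("a", "b", "b", "c"))
)
# one 0/1 column per factor level
dummies <- sapply(levels(toy_f$f), function(lvl) as.numeric(toy_f$f == lvl))
# subtract the group mean of each dummy (ave() uses FUN = mean by default)
dummies_within <- apply(dummies, 2, function(d) d - ave(d, toy_f$id))
dummies_within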

# demean interaction term x*y
dat <- data.frame(
  a = c(1, 2, 3, 4, 1, 2, 3, 4),
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)
demean(dat, select = c("a", "x*y"), group = "ID")
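
# Sketch contrasting two of the interaction approaches by hand, base R only.
# `dat2` is a hypothetical copy of the data above so this block stands alone;
# the object names are illustrative and not produced by demean().
dat2 <- data.frame(
  x = c(4, 3, 3, 4, 1, 2, 1, 2),
  y = c(1, 2, 1, 2, 4, 3, 2, 1),
  ID = c(1, 2, 3, 1, 2, 3, 1, 2)
)
# (1) classical approach: product of the de-meaned variables
x_within <- dat2$x - ave(dat2$x, dat2$ID)
y_within <- dat2$y - ave(dat2$y, dat2$ID)
xy_product_of_within <- x_within * y_within
# (2) de-meaned product: multiply first, then subtract the group mean
xy <- dat2$x * dat2$y
xy_within <- xy - ave(xy, dat2$ID)
# the two versions generally differ
cbind(xy_product_of_within, xy_within)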
}
\references{
\itemize{
  \item Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. Philadelphia, PA: Annual meeting of the American Political Science Association.
  \item Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
  \item Bell A, Jones K. 2015. Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.
  \item Giesselmann M, Schmidt-Catran A. 2018. Interactions in fixed effects regression models (Discussion Papers of DIW Berlin No. 1748). DIW Berlin, German Institute for Economic Research. Retrieved from https://ideas.repec.org/p/diw/diwwpp/dp1748.html
  \item Heisig JP, Schaeffer M, Giesecke J. 2017. The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls. American Sociological Review 82 (4): 796–827.
  \item Hoffman L. 2015. Longitudinal analysis: modeling within-person fluctuation and change. New York: Routledge.
}
}
