% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pglmm.R
\name{pglmm}
\alias{pglmm}
\alias{communityPGLMM}
\title{Phylogenetic Generalised Linear Mixed Model for Community Data}
\usage{
pglmm(formula, data = NULL, family = "gaussian", cov_ranef = NULL,
  random.effects = NULL, REML = TRUE,
  optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"),
  repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE,
  cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL,
  reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200,
  marginal.summ = "mean", calc.DIC = FALSE, prior = "inla.default",
  prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL,
  tree_site = NULL, sp = NULL, site = NULL)

communityPGLMM(formula, data = NULL, family = "gaussian",
  cov_ranef = NULL, random.effects = NULL, REML = TRUE,
  optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"),
  repulsion = FALSE, add.obs.re = TRUE, verbose = FALSE,
  cpp = TRUE, bayes = FALSE, s2.init = NULL, B.init = NULL,
  reltol = 10^-6, maxit = 500, tol.pql = 10^-6, maxit.pql = 200,
  marginal.summ = "mean", calc.DIC = FALSE, prior = "inla.default",
  prior_alpha = 0.1, prior_mu = 1, ML.init = FALSE, tree = NULL,
  tree_site = NULL, sp = NULL, site = NULL)
}
\arguments{
\item{formula}{A two-sided linear formula object describing the
mixed effects of the model; its syntax is similar to that of \code{\link[lme4:lmer]{lmer}},
with some differences.

First, to specify that a random term should have a phylogenetic covariance matrix in
addition to a non-phylogenetic one, add \code{__} (two underscores) to the end of the group variable;
e.g. \code{+ (1 | sp__)} will construct two random terms,
one with a phylogenetic covariance matrix and another with a non-phylogenetic (identity) matrix.
However, \code{__} in nested terms (below) will create only a phylogenetic covariance matrix.
A nested random term therefore has four forms:
\enumerate{
\item \code{(1|sp__@site)} represents correlated species nested within independent sites
(i.e. kronecker(I_sites, V_sp)). This should be the most common form for community analysis (to test for overdispersion or underdispersion).
\item \code{(1|sp@site__)} represents independent species nested within correlated sites
(i.e. kronecker(V_sites, I_sp)). This form can be used for bipartite questions.
You can, for example, treat sp as insects and site as plants with \code{(1|insects@plants__)}.
Remember to supply the phylogeny of plants in the argument \code{cov_ranef = list(plants = plant_phylo)}.
\item \code{(1|sp__@site__)} represents correlated species nested within correlated sites
(i.e. kronecker(V_sites, V_sp)). This form can also be used for bipartite questions such as
pollinators and plants (e.g. \code{(1|pollinators__@plants__)}). Remember to supply their phylogenies
in the argument \code{cov_ranef = list(pollinators = pollinator_phylo, plants = plant_phylo)}.
\item \code{(1|sp@site)} will generate an identity matrix, which is the same as
an observation-level random term or the residual of an LMM. It is therefore not very meaningful for gaussian models;
an observation-level random term is added automatically for binomial and poisson models.
}

Second, correlated random terms are not currently allowed. For example,
\code{(x|g)} here is equivalent to \code{(0 + x|g)} in the \code{lme4::lmer} syntax;
also, \code{(x1 + x2|g)} won't work.}

\item{data}{A \code{\link{data.frame}} containing the variables named in formula.}

\item{family}{Either "gaussian" for a Linear Mixed Model, or
"binomial" for binomial dependent data, or "poisson" for count data.
It should be specified as a character string (i.e., quoted). At present,
the link function is fixed to logit for binomial data and to log for
poisson data. Binomial data can be either
presence/absence, or a two-column array of 'success' and 'fail'.
For both poisson and binomial data, we add an observation-level
random term by default via \code{add.obs.re = TRUE}. If \code{bayes = TRUE} there are
two additional families available: "zeroinflated.binomial" and "zeroinflated.poisson",
which add a "zero inflation" parameter; this is the probability that the response is
a zero. The rest of the parameters of the model then reflect the "non-zero" part
of the model. Note that "zeroinflated.binomial" only makes sense for success /
fail type response data.}

\item{cov_ranef}{A named list of var-cov matrices of random terms. The names should be the
group variables that are used as random terms with specified var-cov matrices
(without the two underscores, e.g. \code{list(sp = tree1, site = tree2)}). The actual object
can be either a phylogeny with class "phylo" or a prepared var-cov matrix. If it is a phylogeny,
we will prune it and then convert it to a var-cov matrix assuming Brownian motion evolution.
We will also standardize all var-cov matrices to have determinant of one. Group variables
will be converted to factors and all var-cov matrices will be rearranged so that rows and
columns are in the same order as the levels of their corresponding group variables.}

\item{random.effects}{Optional pre-built list of random effects. If \code{NULL} (the default),
the function \code{\link{prep_dat_pglmm}} will prepare it for you based on the information
in \code{formula}, \code{data}, and \code{cov_ranef}. A list of pre-generated
random terms is also accepted (mainly to be compatible with code from previous versions).
If so, make sure that the orders of rows and columns of var-cov matrices in the generated
list are the same as their corresponding group variables in the data. This argument can be
useful if users want to use more complicated random terms.}

\item{REML}{Whether REML or ML is used for model fitting. For the
generalized linear mixed model for binary data, these don't have
standard interpretations, and there is no log likelihood function
that can be used in likelihood ratio tests. Ignored if \code{bayes = TRUE}.}

\item{optimizer}{nelder-mead-nlopt (default) or bobyqa or Nelder-Mead or subplex.
Nelder-Mead is from the stats package and the other ones are from the nloptr package.
Ignored if \code{bayes = TRUE}.}

\item{repulsion}{When nested random terms are specified, whether to test for repulsion
(i.e., overdispersion) rather than underdispersion. Default is \code{FALSE}, i.e. test for underdispersion.
This argument is a logical vector of length 1 or more.
If its length is 1, then the cov matrices in all nested terms will be either inverted (overdispersion) or not.
If its length is >1, then its elements select which cov matrices in the nested terms are inverted.
In that case, make sure to get the length right: across all the terms with \code{@},
count the number of "__"; this is the required length of \code{repulsion}.
For example, \code{sp__@site} takes one element, as does \code{sp@site__};
\code{sp__@site__} takes two elements (repulsion for sp and repulsion for site). So, if your nested terms are
\code{(1|sp__@site) + (1|sp@site__) + (1|sp__@site__)}
in the formula, then you should set \code{repulsion} to something like
\code{c(TRUE, FALSE, TRUE, TRUE)} (length of 4).
The TRUE/FALSE combinations depend on your questions.}

\item{add.obs.re}{Whether to add an observation-level random term for poisson and binomial
distributions. Normally it is a good idea to add this to account for overdispersion,
so it is \code{TRUE} by default.}

\item{verbose}{If \code{TRUE}, the model deviance and running
estimates of \code{s2} and \code{B} are plotted each iteration
during optimization.}

\item{cpp}{Whether to use C++ functions for optimization. Default is \code{TRUE}. Ignored if \code{bayes = TRUE}.}

\item{bayes}{Whether to fit a Bayesian version of the PGLMM using \code{r-inla}.}

\item{s2.init}{An array of initial estimates of s2 for each random
effect that scales the variance. If \code{s2.init} is not provided for
\code{family = "gaussian"}, these are estimated in a clunky way
using \code{\link{lm}} assuming no phylogenetic signal. A better
approach is to run \code{\link[lme4:lmer]{lmer}} and use the output
random effects for \code{s2.init}. If \code{s2.init} is not
provided for \code{family = "binomial"}, these are set to 0.25.}

\item{B.init}{Initial estimates of \eqn{B}{B}, a matrix containing
regression coefficients in the model for the fixed effects. This
matrix must have \code{dim(B.init) = c(p + 1, 1)}, where \code{p} is the
number of predictor (independent) variables; the first element of
\code{B} corresponds to the intercept, and the remaining elements
correspond in order to the predictor (independent) variables in the
formula. If \code{B.init} is not provided, these are estimated
in a clunky way using \code{\link{lm}} or \code{\link{glm}}
assuming no phylogenetic signal. A better approach is to run
\code{\link[lme4:lmer]{lmer}} and use the output fixed effects for
\code{B.init}. When \code{bayes = TRUE}, initial values are estimated
using the maximum likelihood fit unless \code{ML.init = FALSE}, in
which case the default \code{INLA} initial values will be used.}

\item{reltol}{A control parameter dictating the relative tolerance
for convergence in the optimization; see \code{\link{optim}}.}

\item{maxit}{A control parameter dictating the maximum number of
iterations in the optimization; see \code{\link{optim}}.}

\item{tol.pql}{A control parameter dictating the tolerance for
convergence in the PQL estimates of the mean components of the
binomial GLMM.}

\item{maxit.pql}{A control parameter dictating the maximum number
of iterations in the PQL estimates of the mean components of the
binomial GLMM.}

\item{marginal.summ}{Summary statistic to use for the estimate of coefficients when
doing a Bayesian PGLMM (when \code{bayes = TRUE}). Options are: "mean",
"median", or "mode", referring to different characterizations of the central
tendency of the Bayesian posterior marginal distributions. Ignored if \code{bayes = FALSE}.}

\item{calc.DIC}{Should the Deviance Information Criterion be calculated and returned
when doing a Bayesian PGLMM? Ignored if \code{bayes = FALSE}.}

\item{prior}{Which type of default prior should be used by \code{pglmm}?
Only used if \code{bayes = TRUE}, ignored otherwise. There are currently four options:
"inla.default", which uses the default \code{INLA} priors; "pc.prior.auto", which uses a
complexity penalizing prior (as described in
\href{https://arxiv.org/abs/1403.4630v3}{Simpson et al. (2017)}) and tries to automatically
choose good parameters (currently implemented only for \code{family = "gaussian"} and
\code{family = "binomial"}); "pc.prior", which
allows the user to set custom parameters on the "pc.prior" prior, using the \code{prior_alpha}
and \code{prior_mu} parameters (run \code{INLA::inla.doc("pc.prec")} for details on these
parameters); and "uninformative", which sets a very uninformative prior
(nearly uniform) by using a very flat exponential distribution. This last one is generally
not recommended but may in some cases give estimates closer to the maximum likelihood estimates.}

\item{prior_alpha}{Only used if \code{bayes = TRUE} and \code{prior = "pc.prior"}, in
which case it sets the alpha parameter of \code{INLA}'s complexity penalizing prior for the
random effects. The prior is an exponential distribution where prob(sd > mu) = alpha,
where sd is the standard deviation of the random effect.}

\item{prior_mu}{Only used if \code{bayes = TRUE} and \code{prior = "pc.prior"}, in
which case it sets the mu parameter of \code{INLA}'s complexity penalizing prior for the
random effects. The prior is an exponential distribution where prob(sd > mu) = alpha,
where sd is the standard deviation of the random effect.}

\item{ML.init}{Only relevant if \code{bayes = TRUE}. Should maximum
likelihood estimates be calculated and used as initial values for
the Bayesian model fit? This is sometimes helpful, but most of the
time it may not help, so the default is \code{FALSE}. Also, it
does not work with the zero-inflated families.}

\item{tree}{A phylogeny for column sp, with "phylo" class, or a var-cov matrix for sp;
make sure to have all species in the matrix. If the matrix is not standardized,
i.e. det(tree) != 1, we will try to standardize it for you.
No longer used; kept here for compatibility.}

\item{tree_site}{A second phylogeny for "site". This is required only if the
site column contains species instead of sites. This can be used for bipartite
questions. tree_site can also be a var-cov matrix; make sure to have all sites
in the matrix. If the matrix is not standardized, i.e. det(tree_site) != 1,
we will try to standardize it for you. No longer used; kept here for compatibility.}

\item{sp}{No longer used; kept here for compatibility.}

\item{site}{No longer used; kept here for compatibility.}
}
\value{
An object (list) of class \code{communityPGLMM} with the following elements:
\item{formula}{the formula for fixed effects}
\item{formula_original}{the formula for both fixed effects and random effects}
\item{data}{the dataset}
\item{family}{either \code{gaussian} or \code{binomial} or \code{poisson} depending on the model fit}
\item{random.effects}{the list of random effects}
\item{B}{estimates of the regression coefficients}
\item{B.se}{approximate standard errors of the fixed effects regression coefficients.
This is set to NULL if \code{bayes = TRUE}.}
\item{B.ci}{approximate Bayesian credible interval of the fixed effects regression coefficients.
This is set to NULL if \code{bayes = FALSE}.}
\item{B.cov}{approximate covariance matrix for the fixed effects regression coefficients}
\item{B.zscore}{approximate Z scores for the fixed effects regression coefficients. This is set to NULL if \code{bayes = TRUE}}
\item{B.pvalue}{approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if \code{bayes = TRUE}}
\item{ss}{random effects' standard deviations for the covariance matrix \eqn{\sigma^2V}{sigma^2 V} for each random effect in order. For the linear mixed model, the residual variance is listed last}
\item{s2r}{random effects variances for non-nested random effects}
\item{s2n}{random effects variances for nested random effects}
\item{s2resid}{for linear mixed models, the residual variance}
\item{s2r.ci}{Bayesian credible interval for random effects variances for non-nested random effects.
This is set to NULL if \code{bayes = FALSE}}
\item{s2n.ci}{Bayesian credible interval for random effects variances for nested random effects.
This is set to NULL if \code{bayes = FALSE}}
\item{s2resid.ci}{Bayesian credible interval for linear mixed models, the residual variance.
This is set to NULL if \code{bayes = FALSE}}
\item{logLik}{for linear mixed models, the log-likelihood for either the restricted likelihood (\code{REML=TRUE}) or the overall likelihood (\code{REML=FALSE}). This is set to NULL for generalised linear mixed models. If \code{bayes = TRUE}, this is the marginal log-likelihood}
\item{AIC}{for linear mixed models, the AIC for either the restricted likelihood (\code{REML=TRUE}) or the overall likelihood (\code{REML=FALSE}). This is set to NULL for generalised linear mixed models}
\item{BIC}{for linear mixed models, the BIC for either the restricted likelihood (\code{REML=TRUE}) or the overall likelihood (\code{REML=FALSE}). This is set to NULL for generalised linear mixed models}
\item{DIC}{for Bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if \code{bayes = FALSE}.}
\item{REML}{whether or not REML is used (\code{TRUE} or \code{FALSE}).}
\item{bayes}{whether or not a Bayesian model was fit.}
\item{marginal.summ}{The specified summary statistic used to summarise the Bayesian marginal distributions.
Only present if \code{bayes = TRUE}}
\item{s2.init}{the user-provided initial estimates of \code{s2}}
\item{B.init}{the user-provided initial estimates of \code{B}}
\item{Y}{the response (dependent) variable returned in matrix form}
\item{X}{the predictor (independent) variables returned in matrix form (including 1s in the first column)}
\item{H}{the residuals. For linear mixed models, this does not account for random terms.
To get residuals after accounting for both fixed and random terms, use \code{residuals()}.
For the generalized linear mixed model, these are the predicted residuals in the
logit^-1 space.}
\item{iV}{the inverse of the covariance matrix for the entire system (of dimension (nsp * nsite)
by (nsp * nsite)). This is NULL if \code{bayes = TRUE}.}
\item{mu}{predicted mean values for the generalized linear mixed model (i.e. similar to \code{fitted(merMod)}).
Set to NULL for linear mixed models, for which we can use \code{\link[=fitted]{fitted()}}.}
\item{nested}{matrices used to construct the nested design matrix. This is set to NULL if \code{bayes = TRUE}}
\item{Zt}{the design matrix for random effects. This is set to NULL if \code{bayes = TRUE}}
\item{St}{diagonal matrix that maps the random effects variances onto the design matrix}
\item{convcode}{the convergence code provided by \code{\link{optim}}. This is set to NULL if \code{bayes = TRUE}}
\item{niter}{number of iterations performed by \code{\link{optim}}. This is set to NULL if \code{bayes = TRUE}}
\item{inla.model}{Model object fit by underlying \code{inla} function. Only returned
if \code{bayes = TRUE}}
}
\description{
This function performs Generalized Linear Mixed Models for binary, count,
and continuous data, estimating regression coefficients with
approximate standard errors. It is modeled after
\code{\link[lme4:lmer]{lmer}} but is more general by allowing
correlation structure within random effects; these correlations can
be phylogenetic among species, or any other correlation structure,
such as geographical correlations among sites. It is, however, much
more specific than \code{\link[lme4:lmer]{lmer}} in that it can
only analyze a subset of the types of model design handled by
\code{\link[lme4:lmer]{lmer}}. It is also slower than
\code{\link[lme4:lmer]{lmer}}. \code{pglmm} can analyze the models in Ives and
Helmus (2011). It can also analyze bipartite phylogenetic data,
such as that analyzed in Rafferty and Ives (2013), by giving sites
phylogenetic correlations.
A Bayesian version of PGLMM can be fit by specifying \code{bayes = TRUE}.
This uses the package \code{INLA},
which is not available on CRAN yet. If you wish to use this option,
you must first install \code{INLA} from \url{http://www.r-inla.org/} by running
\code{install.packages('INLA', repos='https://www.math.ntnu.no/inla/R/stable')} in R.
Note that \code{bayes = TRUE} currently supports only \code{family} arguments of
\code{"gaussian"}, \code{"binomial"}, \code{"poisson"},
\code{"zeroinflated.binomial"}, and \code{"zeroinflated.poisson"}.
}
\details{
\deqn{Y = \beta_0 + \beta_1x + b_0 + b_1x + e}{Y = beta_0 + beta_1x + b_0 + b_1x + e}
\deqn{b_0 \sim Gaussian(0, \sigma_0^2I_{sp})}{b_0 ~ Gaussian(0, sigma_0^2I_(sp))}
\deqn{b_1 \sim Gaussian(0, \sigma_1^2V_{sp})}{b_1 ~ Gaussian(0, sigma_1^2V_(sp))}
\deqn{e \sim Gaussian(0,\sigma^2)}{e ~ Gaussian(0,sigma^2)}

where \eqn{\beta_0}{beta_0} and \eqn{\beta_1}{beta_1} are fixed
effects, and \eqn{V_{sp}}{V_(sp)} is a variance-covariance matrix
derived from a phylogeny (typically under the assumption of
Brownian motion evolution). Here, the variation in the mean
(intercept) for each species is given by the random effect
\eqn{b_0}{b_0} that is assumed to be independent among
species. Variation in species' responses to predictor variable
\eqn{x}{x} is given by a random effect \eqn{b_1}{b_1} that is
assumed to depend on the phylogenetic relatedness among species
given by \eqn{V_{sp}}{V_(sp)}; if species are closely related,
their specific responses to \eqn{x}{x} will be similar. This
particular model would be specified as

\code{z <- pglmm(Y ~ X + (1|sp__), data = data, family = "gaussian", cov_ranef = list(sp = phy))}

Or you can prepare the random terms manually (not recommended for simple models but may be necessary for complex models):

\code{re.1 <- list(1, sp = dat$sp, covar = diag(nspp))}

\code{re.2 <- list(dat$X, sp = dat$sp, covar = Vsp)}

\code{z <- pglmm(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))}

The covariance matrix covar is standardized to have its determinant
equal to 1. This in effect standardizes the interpretation of the
scalar \eqn{\sigma^2}{sigma^2}. Although mathematically this is
not required, it is a very good idea to standardize the predictor
(independent) variables to have mean 0 and variance 1. This will
make the function more robust and improve the interpretation of the
regression coefficients. For categorical (factor) predictor
variables, you will need to construct 0-1 dummy variables, and
these should not be standardized (for obvious reasons).

For binary generalized linear mixed models (\code{family = 'binomial'}),
the function estimates parameters for a model of the form, for example,

\deqn{y = \beta_0 + \beta_1x + b_0 + b_1x}{y = beta_0 + beta_1x + b_0 + b_1x}
\deqn{Y = logit^{-1}(y)}{Y = logit^(-1)(y)}
\deqn{b_0 \sim Gaussian(0, \sigma_0^2I_{sp})}{b_0 ~ Gaussian(0, sigma_0^2I_(sp))}
\deqn{b_1 \sim Gaussian(0, \sigma_1^2V_{sp})}{b_1 ~ Gaussian(0, sigma_1^2V_(sp))}

where \eqn{\beta_0}{beta_0} and \eqn{\beta_1}{beta_1} are fixed
effects, and \eqn{V_{sp}}{V_(sp)} is a variance-covariance matrix
derived from a phylogeny (typically under the assumption of
Brownian motion evolution).

\code{z <- pglmm(Y ~ X + (1|sp__), data = data, family = "binomial", cov_ranef = list(sp = phy))}

As with the linear mixed model, it is a very good idea to
standardize the predictor (independent) variables to have mean 0
and variance 1. This will make the function more robust and improve
the interpretation of the regression coefficients.
}
\examples{
## Structure of examples:
# First, a (brief) description of model types, and how they are specified
# - these are *not* to be run 'as-is'; they show how models should be organised
# Second, a run-through of how to simulate, and then analyse, data
# - these *are* to be run 'as-is'; they show how to format and work with data

\donttest{
#########################################################
#First section; brief summary of models and their use####
#########################################################
## Model structures from Ives & Helmus (2011)
# dat = data set for regression (note: must have a column "sp" and a column "site")
# phy = phylogeny of class "phylo"
# repulsion = to test phylogenetic repulsion or not

# Model 1 (Eq. 1)
z <- pglmm(freq ~ sp + (1|site) + (1|sp__@site), data = dat, family = "binomial", 
           cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init=.1)

# Model 2 (Eq. 2)
z <- pglmm(freq ~ sp + X + (1|site) + (X|sp__), data = dat, family = "binomial",
           cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init=.1)

# Model 3 (Eq. 3)
z <- pglmm(freq ~ sp*X + (1|site) + (1|sp__@site), data = dat, family = "binomial",
           cov_ranef = list(sp = phy), REML = TRUE, verbose = TRUE, s2.init=.1)

## Model structure from Rafferty & Ives (2013) (Eq. 3)
# dat = data set
# phyPol = phylogeny for pollinators (pol)
# phyPlt = phylogeny for plants (plt)

z <- pglmm(freq ~ pol * X + (1|pol__) + (1|plt__) + (1|pol__@plt) +
           (1|pol@plt__) + (1|pol__@plt__), 
           data = dat, family = "binomial", 
           cov_ranef = list(pol = phyPol, plt = phyPlt), 
           REML = TRUE, verbose = TRUE, s2.init=.1)
}

#########################################################
#Second section; detailed simulation and analysis #######
#########################################################
library(ape)

# Generate simulated data for nspp species and nsite sites
nspp <- 15
nsite <- 10

# residual variance (set to zero for binary data)
sd.resid <- 0

# fixed effects
beta0 <- 0
beta1 <- 0

# magnitude of random effects
sd.B0 <- 1
sd.B1 <- 1

# whether or not to include phylogenetic signal in B0 and B1
signal.B0 <- TRUE
signal.B1 <- TRUE

# simulate a phylogenetic tree
phy <- rtree(n = nspp)
phy <- compute.brlen(phy, method = "Grafen", power = 0.5)

# standardize the phylogenetic covariance matrix to have determinant 1
Vphy <- vcv(phy)
Vphy <- Vphy/(det(Vphy)^(1/nspp))
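
# Why the scaling above yields determinant 1 (an illustrative sketch, not
# part of the original example; V.demo is a small made-up covariance matrix):
# det(c * V) = c^n * det(V) for an n x n matrix, so c = det(V)^(-1/n) works.
V.demo <- matrix(c(2, 1, 0.5,
                   1, 2, 0.5,
                   0.5, 0.5, 2), nrow = 3, byrow = TRUE)
V.demo.std <- V.demo / (det(V.demo)^(1 / nrow(V.demo)))
all.equal(det(V.demo.std), 1) # holds up to floating-point error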

# Generate environmental site variable
X <- matrix(1:nsite, nrow = 1, ncol = nsite)
X <- (X - mean(X))/sd(X)

# Perform a Cholesky decomposition of Vphy. This is used to
# generate phylogenetic signal: a vector of independent normal random
# variables, when multiplied by the transpose of the Cholesky
# decomposition of Vphy, will have covariance matrix equal to Vphy.

iD <- t(chol(Vphy))
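
# Illustrative check of the statement above (a sketch; V.chol.demo is a small
# made-up matrix): with L = t(chol(V)), L L' = V, so L z has covariance
# matrix V whenever the elements of z are independent with unit variance.
V.chol.demo <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
L.demo <- t(chol(V.chol.demo))
all.equal(tcrossprod(L.demo), V.chol.demo) # tcrossprod(L) computes L L'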

# Set up species-specific regression coefficients as random effects
if (signal.B0 == TRUE) {
  b0 <- beta0 + iD \%*\% rnorm(nspp, sd = sd.B0)
} else {
  b0 <- beta0 + rnorm(nspp, sd = sd.B0)
}
if (signal.B1 == TRUE) {
  b1 <- beta1 + iD \%*\% rnorm(nspp, sd = sd.B1)
} else {
  b1 <- beta1 + rnorm(nspp, sd = sd.B1)
}

# Simulate species presence/absence among sites to give matrix Y that
# contains species in rows and sites in columns
y <- matrix(outer(b0, array(1, dim = c(1, nsite))), nrow = nspp,
            ncol = nsite) + matrix(outer(b1, X), nrow = nspp, ncol = nsite)
e <- rnorm(nspp * nsite, sd = sd.resid)
y <- y + matrix(e, nrow = nspp, ncol = nsite)
y <- matrix(y, nrow = nspp * nsite, ncol = 1)

Y <- rbinom(n = length(y), size = 1, prob = exp(y)/(1 + exp(y)))
Y <- matrix(Y, nrow = nspp, ncol = nsite)
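
# The inverse-logit written out above as exp(y)/(1 + exp(y)) is also
# available in base R as plogis(); a small sketch of the equivalence
# (y.demo is a made-up vector used only for illustration):
y.demo <- c(-2, 0, 3)
all.equal(exp(y.demo) / (1 + exp(y.demo)), plogis(y.demo))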

# name the simulated species 1:nspp and sites 1:nsite
rownames(Y) <- 1:nspp
colnames(Y) <- 1:nsite

opar <- par(mfrow = c(3, 1), las = 1, mar = c(2, 4, 2, 2) - 0.1)
matplot(t(X), type = "l", ylab = "X", main = "X among sites")
hist(b0, xlab = "b0", main = "b0 among species")
hist(b1, xlab = "b1", main = "b1 among species")

#Plot out; you get essentially this from plot(your.pglmm.model)
image(t(Y), ylab = "species", xlab = "sites", main = "abundance",
      col=c("black","white"))
par(opar)

# Transform data matrices into "long" form, and generate a data frame
YY <- matrix(Y, nrow = nspp * nsite, ncol = 1)

XX <- matrix(kronecker(X, matrix(1, nrow = nspp, ncol = 1)), nrow =
               nspp * nsite, ncol = 1)

site <- matrix(kronecker(1:nsite, matrix(1, nrow = nspp, ncol =
                                           1)), nrow = nspp * nsite, ncol = 1)
sp <- paste0("t", matrix(kronecker(matrix(1, nrow = nsite, ncol = 1), 1:nspp),
                         nrow = nspp * nsite, ncol = 1))

dat <- data.frame(Y = YY, X = XX, site = as.factor(site), sp = as.factor(sp))
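
# The long-form layout above has species varying fastest within each site.
# A hedged sketch of the same ordering using expand.grid(), on made-up
# dimensions (n.sp.demo and n.site.demo are illustrative only):
n.sp.demo <- 3
n.site.demo <- 2
grid.demo <- expand.grid(sp = paste0("t", 1:n.sp.demo), site = 1:n.site.demo)
sp.kron <- paste0("t", kronecker(matrix(1, nrow = n.site.demo, ncol = 1),
                                 1:n.sp.demo))
identical(as.character(grid.demo$sp), sp.kron)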

# Random effects
# random intercept with species independent
# random intercept with species showing phylogenetic covariances
# random slope with species independent
# random slope with species showing phylogenetic covariances
# random effect for site
pglmm(Y ~ X + (1|site), data = dat, family = "binomial", REML = TRUE)
# The rest of these tests are not run to save CRAN server time;
# - please take a look at them because they're *very* useful!
\donttest{ 
  z.binary <- pglmm(Y ~ X + (1|sp__) + (X|sp__), data = dat, family = "binomial",
                    cov_ranef = list(sp = phy), REML = TRUE, verbose = FALSE, 
                    optimizer = "Nelder-Mead")
  
  # output results
  z.binary
  plot(z.binary) # original data
  
  # test statistical significance of the phylogenetic random effect
  # on species slopes using a likelihood ratio test
  pglmm.profile.LRT(z.binary, re.number = 4)$Pr
  
  # extract the predicted values of Y
  pglmm.predicted.values(z.binary)
  # plot both original data and predicted data (in logit^-1 space)
  plot(z.binary, predicted = TRUE) 
  
  # examine the structure of the first covariance matrix
  ar1 = pglmm.matrix.structure(Y ~ X + (1|sp__) + (X|sp__), data = dat, 
                               family = "binomial", 
                               cov_ranef = list(sp = phy))
  Matrix::image(ar1)
  
  # plot random terms' var-cov matrix
  pglmm.plot.re(x = z.binary)
  
  # compare results to glmer() when the model contains no
  # phylogenetic covariance among species; the results should be
  # similar.
  pglmm(Y ~ X + (1|sp) + (X|sp), data = dat, family = "binomial", REML = FALSE)
  
  # glmer
  if(require(lme4)){
    summary(lme4::glmer(Y ~ X + (1 | sp) + (0 + X | sp), data=dat, family = "binomial"))
    
    # compare results to lmer() when the model contains no phylogenetic
    # covariance among species; the results should be similar.
    pglmm(Y ~ X + (1 | sp) + (0 + X | sp), data = dat, family = "gaussian", REML = FALSE)
    summary(lme4::lmer(Y ~ X + (1 | sp) + (0 + X | sp), data=dat, REML = FALSE))    
  }  
}
}
\references{
Ives, A. R. and M. R. Helmus. 2011. Generalized linear
mixed models for phylogenetic analyses of community
structure. Ecological Monographs 81:511-525.

Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic
trait-based analyses of ecological networks. Ecology 94:2321-2333.

Simpson, Daniel, et al. 2017. Penalising model component complexity:
A principled, practical approach to constructing priors.
Statistical Science 32(1): 1-28.

Li, D., Ives, A. R., & Waller, D. M. 2017.
Can functional traits account for phylogenetic signal in community composition?
New Phytologist, 214(2), 607-618.
}
\author{
Anthony R. Ives, Daijiang Li, Russell Dinnage
}
