Type: Package
Title: Spatio-Temporal Generalised Linear Mixed Models for Areal Unit Data
Version: 4.0
Date: 2023-10-31
Author: Duncan Lee, Alastair Rushworth, Gary Napier and William Pettersson
Maintainer: Duncan Lee <Duncan.Lee@glasgow.ac.uk>
Description: Implements a class of univariate and multivariate spatio-temporal generalised linear mixed models for areal unit data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation. The response variable can be binomial, Gaussian, or Poisson, but for some models only the binomial and Poisson data likelihoods are available. The spatio-temporal autocorrelation is modelled by random effects, which are assigned conditional autoregressive (CAR) style prior distributions. A number of different random effects structures are available, including models similar to Rushworth et al. (2014) <doi:10.1016/j.sste.2014.05.001>. Full details are given in the vignette accompanying this package. The creation and development of this package was supported by the Engineering and Physical Sciences Research Council (EPSRC) grants EP/J017442/1 and EP/T004878/1 and the Medical Research Council (MRC) grant MR/L022184/1.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Depends: MASS, R (≥ 3.5.0), Rcpp (≥ 0.11.5)
Imports: CARBayesdata, coda, dplyr, GGally, ggplot2, gridExtra, gtools, leaflet, matrixStats, MCMCpack, parallel, sf, spam, spdep, stats, truncdist, truncnorm, utils
LinkingTo: Rcpp
LazyLoad: yes
ByteCompile: yes
URL: https://github.com/duncanplee/CARBayesST
BugReports: https://github.com/duncanplee/CARBayesST/issues
NeedsCompilation: yes
Packaged: 2023-10-30 15:43:55 UTC; duncanlee
Repository: CRAN
Date/Publication: 2023-10-30 16:40:02 UTC

Spatio-Temporal Generalised Linear Mixed Models For Areal Unit Data

Description

Implements a class of univariate and multivariate spatio-temporal generalised linear mixed models for areal unit data, with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation. The response variable can be binomial, Gaussian or Poisson, but for some models only the binomial and Poisson data likelihoods are available. The spatio-temporal autocorrelation is modelled by random effects, which are assigned conditional autoregressive (CAR) style prior distributions. A number of different random effects structures are available, and full details are given in the vignette accompanying this package and the references below. The creation and development of this package was supported by the Engineering and Physical Sciences Research Council (EPSRC) grants EP/J017442/1 and EP/T004878/1 and the Medical Research Council (MRC) grant MR/L022184/1.

Details

Package: CARBayesST
Type: Package
Version: 4.0
Date: 2023-10-31
License: GPL (>= 2)

Author(s)

Author: Duncan Lee, Alastair Rushworth, Gary Napier and William Pettersson

Maintainer: Duncan Lee <Duncan.Lee@glasgow.ac.uk>

References

Bernardinelli, L., D. Clayton, C.Pascuto, C.Montomoli, M.Ghislandi, and M. Songini (1995). Bayesian analysis of space-time variation in disease risk. Statistics in Medicine, 14, 2433-2443.

Knorr-Held, L. (2000). Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine, 19, 2555-2567.

Lee, D and Lawson, C (2016). Quantifying the spatial inequality and temporal trends in maternal smoking rates in Glasgow, Annals of Applied Statistics, 10, 1427-1446.

Lee, D and Rushworth, A and Napier, G (2018). Spatio-Temporal Areal Unit Modeling in R with Conditional Autoregressive Priors Using the CARBayesST Package, Journal of Statistical Software, 84, 9, 1-39.

Lee, D and Meeks, K and Pettersson, W (2021). Improved inference for areal unit count data using graph-based optimisation. Statistics and Computing, 31:51.

Lee D, Robertson C, and Marques, D (2022). Quantifying the small-area spatio-temporal dynamics of the Covid-19 pandemic in Scotland during a period with limited testing capacity, Spatial Statistics, https://doi.org/10.1016/j.spasta.2021.100508.

Napier, G, D. Lee, C. Robertson, A. Lawson, and K. Pollock (2016). A model to estimate the impact of changes in MMR vaccination uptake on inequalities in measles susceptibility in Scotland, Statistical Methods in Medical Research, 25, 1185-1200.

Napier, G., Lee, D., Robertson, C., and Lawson, A. (2019). A Bayesian space-time model for clustering areal units based on their disease trends, Biostatistics, 20, 681-697.

Rushworth, A., D. Lee, and R. Mitchell (2014). A spatio-temporal model for estimating the long-term effects of air pollution on respiratory hospital admissions in Greater London. Spatial and Spatio-temporal Epidemiology 10, 29-38.

Rushworth, A., Lee, D., and Sarran, C (2017). An adaptive spatio-temporal smoothing model for estimating trends and step changes in disease risk. Journal of the Royal Statistical Society Series C, 66, 141-157.

Examples

## See the examples in the function specific help files and in the vignette
## accompanying this package.

Fit a multivariate spatio-temporal generalised linear mixed model to data, with a multivariate spatio-temporal autoregressive process.

Description

Fit a multivariate spatio-temporal generalised linear mixed model to multivariate areal unit data, where the response variable can be binomial, Gaussian or Poisson. The linear predictor is modelled by known covariates and a vector of random effects. The latter allows for correlations over: (i) K areal units; (ii) N time periods; and (iii) J outcomes. These random effects are modelled by either a multivariate first order autoregressive time series process or a multivariate second order autoregressive time series process. In both cases the spatial and between outcome correlation is modelled via the precision matrix, and the spatial correlation is represented by the conditional autoregressive (CAR) prior proposed by Leroux et al. (2000). In contrast, the between outcome correlation structure is estimated from the data, and no prior form is assumed. Missing values are allowed in the response in this model, and are sampled from in the model using data augmentation. Further details are given in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

MVST.CARar(formula, family, data=NULL,  trials=NULL, W, burnin, n.sample, thin=1, 
n.chains=1,  n.cores=1, prior.mean.beta=NULL, prior.var.beta=NULL, prior.nu2=NULL, 
prior.Sigma.df=NULL, prior.Sigma.scale=NULL, AR=NULL, rho.S=NULL, rho.T=NULL, 
MALA=TRUE, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and the offset (if included) should be matrices of dimension (KN)*J, where K is the number of spatial units, N is the number of time periods and J is the number of different variables. Each column of the response and offset matrices relates to a different outcome variable. The values in each column of these matrices should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The covariates should each be a (KN)*1 vector, and different regression parameters are estimated for each of the J variables. The response can contain missing (NA) values.

family

One of either "binomial", "gaussian" or "poisson", which respectively specify a binomial likelihood model with a logistic link function, a Gaussian likelihood model with an identity link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A (KN)*J matrix of the same dimension as the response. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

n.chains

The number of MCMC chains to run when fitting the model. Defaults to 1.

n.cores

The number of computer cores to run the MCMC chains on. Must be less than or equal to n.chains. Defaults to 1.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.nu2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for nu2_j for each outcome j. Defaults to c(1, 0.01) and only used if family="Gaussian".

prior.Sigma.df

The degrees of freedom for the Inverse-Wishart prior formulation for the covariance matrix Sigma. This prior formulation follows the marginally weakly-informative specification proposed by Huang and Wand (2013). Defaults to 2, which corresponds to non-informative uniform priors on the interval [-1,1] being assigned to each correlation parameter within the Sigma matrix.

prior.Sigma.scale

The J times 1 vector of prior scales for the square roots of the diagonal elements of the covariance matrix Sigma. This prior formulation is the marginally weakly-informative prior specification proposed by Huang and Wand (2013). Thus, the jth element of this vector is the scale parameter for the zero centred half-t prior (with shape given by prior.Sigma.df) assumed for the standard deviation of the random effects corresponding to the jth outcome. Defaults to a vector of values of 100,000.

AR

The order of the autoregressive time series process that must be either 1 or 2.

rho.S

The value in the interval [0, 1] that the spatial dependence parameter rho.S is fixed at if it should not be estimated. If this arugment is NULL then rho.S is estimated in the model.

rho.T

Whether to fix or estimate the temporal dependence parameter(s) rho.T in the model. If this arugment is NULL then they are estimated in the model. If you want to fix them and AR=1 then it must be a single value. If AR=2 then it must be a vector of length two with the first and second order autoregressive coefficients.

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A matrix of fitted values for each area, time period and response variable in the same order as the response variable.

residuals

A list with 2 elements, where each element is a matrix of a type of residual. Each row of a matrix relates to an area and time period and each column to a response (category). The types of residual are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

NULL, for compatability with the other models.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

mcmc.info

A vector giving details of the numbers of MCMC samples generated.

X

The design matrix of covariates.

Author(s)

Duncan Lee

References

Gelfand, A and Vounatsou, P (2003). Proper multivariate conditional autoregressive models for spatial data analysis, Biostatistics, 4, 11-25.

Huang, A., and Wand, M (2013). Simple Marginally Noninformative Prior Distributions for Covariance Matrices. Bayesian Analysis, 8, 439-452.

Lee D, Robertson C, and Marques, D (2022). Quantifying the small-area spatio-temporal dynamics of the Covid-19 pandemic in Scotland during a period with limited testing capacity, Spatial Statistics, https://doi.org/10.1016/j.spasta.2021.100508.

Leroux B, Lei X, Breslow N (2000). "Estimation of Disease Rates in SmallAreas: A New Mixed Model for Spatial Dependence." In M Halloran, D Berry (eds.), Statistical Models in Epidemiology, the Environment and Clinical Trials, pp. 179-191. Springer-Verlag, New York.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### Set up a square lattice region
x.easting <- 1:8
x.northing <- 1:8
Grid <- expand.grid(x.easting, x.northing)


#### Set up the coordinate dimensions
K <- nrow(Grid)
N <- 15
J <- 2
N.all <- N * K * J


#### set up distance and neighbourhood (W, based on sharing a common border) matrices
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Set up the spatial covariance matrix
Q.W <- 0.8 * (diag(apply(W, 2, sum)) - W) + 0.2 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)


#### Set up the multivariate outcome covariance matrix
Sigma <- 0.01 * array(c(1, 1, 1, 2), c(2,2))
Sigma.inv <- solve(Sigma)


#### Spatial and between outcome covariance
QSig.prec <- kronecker(Q.W, Sigma.inv)
QSig.var <-solve(QSig.prec)


#### Generate the covariate
x1 <- rnorm(n=N * K, mean=0, sd=1)
lp.regression.mat <- cbind(0.1 + 0.1 * x1, 0.1 - 0.1*x1)
lp.regression <- as.numeric(t(lp.regression.mat))


#### Spatio-temporal random effects
phi.temp <- mvrnorm(n=1, mu=rep(0,(J*K)), Sigma=QSig.var)
phi <- phi.temp
    for(i in 2:N)
    {
    phi.temp2 <- mvrnorm(n=1, mu=(0.8 * phi.temp), Sigma=QSig.var)
    phi.temp <- phi.temp2
    phi <- c(phi, phi.temp)
    }
phi <- phi - mean(phi)
phi.true <- matrix(phi, ncol=2, byrow=TRUE)


#### Generate the binomial counts
lp <- lp.regression + phi
p <- exp(lp) / (1+exp(lp))
trials <- rpois(N.all, lambda=100)
Y <- rbinom(n=N.all, size=trials, prob=p)
Y.mat <- matrix(Y, nrow=(K*N), ncol=J, byrow=TRUE)
trials.mat <- matrix(trials, nrow=(K*N), ncol=J, byrow=TRUE)
formula <- Y.mat~x1

#### Run the model
formula <- Y.mat ~ x1
## Not run: mod <- MVST.CARar(formula=formula, family="binomial", trials=trials.mat, W=W, 
burnin=10000, n.sample=50000, AR=1, MALA=FALSE)
## End(Not run)

#### Toy example for checking
mod <- MVST.CARar(formula=formula, family="binomial", trials=trials.mat, W=W, 
burnin=10, n.sample=50, AR=1, MALA=FALSE)

Fit a spatio-temporal generalised linear mixed model to data, with a spatio-temporal autoregressive process that has an adaptive autocorrelation stucture.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial, Gaussian or Poisson. The linear predictor is modelled by known covariates and a vector of random effects. The latter follows a multivariate first order autoregressive time series process, where spatial autocorrelation is modelled via the precision matrix, which is based on a CAR type structure and a neighbourhood (adjacency) matrix W. The non-zero elements of W associated with each pair of geographically adjacent areal units are treated as random variables with ranges in the unit interval, which allows step changes to be identified in the random effects surface between geographically adjacent regions. The model is similar to that proposed by Rushworth et al. (2017). Further details are given in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

ST.CARadaptive(formula, family, data=NULL, trials=NULL, W, burnin, n.sample, thin=1,  
prior.mean.beta=NULL, prior.var.beta=NULL, prior.nu2=NULL, prior.tau2=NULL, 
rho = NULL, epsilon = 0, MALA=TRUE, verbose=TRUE) 

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and each covariate should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The response must NOT contain missing (NA) values.

family

One of either "binomial", "gaussian", or "poisson", which respectively specify a binomial likelihood model with a logistic link function, a Gaussian likelihood model with an identity link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. For this model the matrix must be binary.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.nu2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for nu2. Defaults to c(1, 0.01) and only used if family="Gaussian".

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for tau2. Defaults to c(1, 0.01).

rho

The value in the interval [0, 1] that the spatial dependence parameter rho is fixed at if it should not be estimated. If this arugment is NULL then rho is estimated in the model. Setting rho=1, reduces the random effects prior to the intrinsic CAR model but does require epsilon>0.

epsilon

Diagonal ridge parameter to add to the random effects prior precision matrix, only required when rho = 1, and the prior precision is improper. Defaults to 0.

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

A list with 2 K*K matrices, Wmedian and W99 summarising the estimated adjacency relationships. Wmedian contains the posterior median for each w_ij element estimated in the model for adjacent areal units, while W99 contains binary indicator variables for whether Prob(w_ij < 0.5|data)>0.99. For both matrices, elements corresponding to non-adjacent pairs of areas have NA values.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

X

The design matrix of covariates.

Author(s)

Alastair Rushworth

References

Rushworth, A., Lee, D., and Sarran, C (2017). An adaptive spatio-temporal smoothing model for estimating trends and step changes in disease risk. Journal of the Royal Statistical Society Series C, 66, 141-157.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 10
N.all <- N * K
  
    
#### set up spatial neighbourhood matrix W
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Simulate the elements in the linear predictor and the data
Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
phi.temp <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.1 * Q.W.inv))
phi <- phi.temp
    for(i in 2:N)
    {
    phi.temp2 <- mvrnorm(n=1, mu=(0.8 * phi.temp), Sigma=(0.1 * Q.W.inv))
    phi.temp <- phi.temp2
    phi <- c(phi, phi.temp)
    }
jump <- rep(c(rep(2, 70), rep(4, 30)),N)
LP <- jump + phi
fitted <- exp(LP)
Y <- rpois(n=N.all, lambda=fitted)


#### Run the model     
## Not run: model <- ST.CARadaptive(formula=Y~1, family="poisson", W=W, burnin=10000,
n.sample=50000)
## End(Not run)


#### Toy example for checking    
model <- ST.CARadaptive(formula=Y~1, family="poisson", W=W, burnin=10,
n.sample=50)

Fit a spatio-temporal generalised linear mixed model to data, with spatial and temporal main effects and a spatio-temporal interaction.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial, Gaussian or Poisson. The linear predictor is modelled by known covariates and three vectors of random effects. The latter include spatial and temporal main effects and a spatio-temporal interaction. The spatial and temporal main effects are modelled by the conditional autoregressive (CAR) prior proposed by Leroux et al. (2000), while the spatio-temporal interaction random effects are independent. Due to the lack of identifiability between the interactions and the Gaussian errors, only main effects are allowed in the Gaussian model. Missing values are allowed in the response in this model, and are sampled from in the model using data augmentation. The model is similar to that proposed by Knorr-Held (2000), and further details are given in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

ST.CARanova(formula, family, data=NULL,  trials=NULL, W, interaction=TRUE, burnin, 
n.sample, thin=1, n.chains=1,  n.cores=1, prior.mean.beta=NULL, prior.var.beta=NULL, 
prior.nu2=NULL, prior.tau2=NULL, rho.S=NULL, rho.T=NULL, MALA=TRUE, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and each covariate should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The response can contain missing (NA) values.

family

One of either "binomial", "gaussian" or "poisson", which respectively specify a binomial likelihood model with a logistic link function, a Gaussian likelihood model with an identity link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

interaction

TRUE or FALSE indicating whether the spatio-temporal interaction random effects should be included. Defaults to TRUE unless family="gaussian" in which case interactions are not allowed.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

n.chains

The number of MCMC chains to run when fitting the model. Defaults to 1.

n.cores

The number of computer cores to run the MCMC chains on. Must be less than or equal to n.chains. Defaults to 1.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.nu2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for nu2. Defaults to c(1, 0.01) and only used if family="Gaussian".

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for tau2. Defaults to c(1, 0.01).

rho.S

The value in the interval [0, 1] that the spatial dependence parameter rho.S is fixed at if it should not be estimated. If this arugment is NULL then rho.S is estimated in the model.

rho.T

The value in the interval [0, 1] that the temporal dependence parameter rho.T is fixed at if it should not be estimated. If this arugment is NULL then rho.T is estimated in the model.

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

NULL, for compatability with the other models.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

mcmc.info

A vector giving details of the numbers of MCMC samples generated.

X

The design matrix of covariates.

Author(s)

Duncan Lee

References

Knorr-Held, L. (2000). Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine, 19, 2555-2567.

Leroux, B., X. Lei, and N. Breslow (2000). Estimation of disease rates in small areas: A new mixed model for spatial dependence, Chapter Statistical Models in Epidemiology, the Environment and Clinical Trials, Halloran, M and Berry, D (eds), pp. 135-178. Springer-Verlag, New York.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 10
N.all <- N * K


#### set up spatial (W) and temporal (D) neighbourhood matrices
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	
	
D <-array(0, c(N,N))
for(i in 1:N)
{
    for(j in 1:N)
    {
        if(abs((i-j))==1)  D[i,j] <- 1 
    }	
}


#### Simulate the elements in the linear predictor and the data
gamma <- rnorm(n=N.all, mean=0, sd=0.001)
x <- rnorm(n=N.all, mean=0, sd=1)
beta <- 0.1

Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
phi <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.01 * Q.W.inv))

Q.D <- 0.99 * (diag(apply(D, 2, sum)) - D) + 0.01 * diag(rep(1,N))
Q.D.inv <- solve(Q.D)
delta <- mvrnorm(n=1, mu=rep(0,N), Sigma=(0.01 * Q.D.inv))

phi.long <- rep(phi, N)
delta.long <- kronecker(delta, rep(1,K))
LP <- 4 +  x * beta + phi.long +  delta.long + gamma
mean <- exp(LP)
Y <- rpois(n=N.all, lambda=mean)


#### Run the model
## Not run: model <- ST.CARanova(formula=Y~x, family="poisson", interaction=TRUE, 
W=W, burnin=10000, n.sample=50000)
## End(Not run)


#### Toy example for checking
model <- ST.CARanova(formula=Y~x, family="poisson", interaction=TRUE, 
W=W, burnin=10, n.sample=50)

Fit a spatio-temporal generalised linear mixed model to data, with a spatio-temporal autoregressive process.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial, Gaussian or Poisson. The linear predictor is modelled by known covariates and a vector of random effects. The latter follows either a multivariate first order autoregressive time series process or a multivariate second order autoregressive time series process. In both cases the spatial autocorrelation is modelled via the precision matrix corresponding to the conditional autoregressive (CAR) prior proposed by Leroux et al. (2000), and the initial AR(1) model was proposed by Rushworth et al. (2014). Missing values are allowed in the response in this model, and are sampled from in the model using data augmentation. Further details are given in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

ST.CARar(formula, family, data=NULL,  trials=NULL, W, burnin, n.sample, thin=1, 
n.chains=1,  n.cores=1, prior.mean.beta=NULL, prior.var.beta=NULL, prior.nu2=NULL, 
prior.tau2=NULL, AR=NULL, rho.S=NULL, rho.T=NULL, MALA=TRUE, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and each covariate should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The response can contain missing (NA) values.

family

One of either "binomial", "gaussian" or "poisson", which respectively specify a binomial likelihood model with a logistic link function, a Gaussian likelihood model with an identity link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

n.chains

The number of MCMC chains to run when fitting the model. Defaults to 1.

n.cores

The number of computer cores to run the MCMC chains on. Must be less than or equal to n.chains. Defaults to 1.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.nu2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for nu2. Defaults to c(1, 0.01) and only used if family="Gaussian".

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for tau2. Defaults to c(1, 0.01).

AR

The order of the autoregressive time series process that must be either 1 or 2.

rho.S

The value in the interval [0, 1] that the spatial dependence parameter rho.S is fixed at if it should not be estimated. If this arugment is NULL then rho.S is estimated in the model.

rho.T

Whether to fix or estimate the temporal dependence parameter(s) rho.T in the model. If this arugment is NULL then they are estimated in the model. If you want to fix them and AR=1 then it must be a single value. If AR=2 then it must be a vector of length two with the first and second order autoregressive coefficients.

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

NULL, for compatability with the other models.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

mcmc.info

A vector giving details of the numbers of MCMC samples generated.

X

The design matrix of covariates.

Author(s)

Duncan Lee

References

Rushworth, A., D. Lee, and R. Mitchell (2014). A spatio-temporal model for estimating the long-term effects of air pollution on respiratory hospital admissions in Greater London. Spatial and Spatio-temporal Epidemiology 10, 29-38.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 10
N.all <- N * K

    
#### set up spatial neighbourhood matrix W
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Simulate the elements in the linear predictor and the data
gamma <- rnorm(n=N.all, mean=0, sd=0.001)
x <- rnorm(n=N.all, mean=0, sd=1)
beta <- 0.1
    
Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
phi.temp <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.1 * Q.W.inv))
phi <- phi.temp
    for(i in 2:N)
    {
    phi.temp2 <- mvrnorm(n=1, mu=(0.8 * phi.temp), Sigma=(0.1 * Q.W.inv))
    phi.temp <- phi.temp2
    phi <- c(phi, phi.temp)
    }
    
LP <- 3 + x * beta  + phi
mean <- exp(LP)
Y <- rpois(n=N.all, lambda=mean)
    

#### Run the model
## Not run: model <- ST.CARar(formula=Y~x, family="poisson",  W=W, burnin=10000,
    n.sample=50000, AR=1)
## End(Not run)
    
    
#### Toy example for checking  
model <- ST.CARar(formula=Y~x, family="poisson",  W=W, burnin=10,
    n.sample=50, AR=1)

Fit a spatio-temporal generalised linear mixed model to data, with a clustering of temporal trend functions and a temporally common spatial surface.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial or Poisson. The linear predictor is modelled by known covariates, a temoporally common spatial surface, and a mixture of temporal trend functions. The spatial component is modelled by the conditional autoregressive (CAR) prior proposed by Leroux et al. (2000). The temporal trend functions are user-specified and are fixed parametric forms (e.g. linear, step-change) or constrained shapes (e.g. monotonically increasing). Further details are given in Napier et al. (2018) and in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Metropolis coupled Markov chain Monte Carlo (MCMCMC) simulation.

Usage

    ST.CARclustrends(formula, family, data=NULL, trials=NULL, W, burnin, n.sample,
    thin=1, trends=NULL, changepoint=NULL, knots=NULL, prior.mean.beta=NULL,
    prior.var.beta=NULL, prior.mean.gamma=NULL, prior.var.gamma=NULL,
    prior.lambda=NULL, prior.tau2=NULL, Nchains=4, verbose=TRUE)

Arguments

formula

A formula using the syntax of the lm() function. However, due to identifiability issues covariates are not allowed. So the only elements allowed on the right side of the formula are an intercept term and an offset term using the offset() function. The response variable and the offset (if specified) should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on.

family

One of either "binomial" or"poisson", which respectively specify a binomial likelihood model with a logistic link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

trends

A vector containing the temporal trend functions to include in the model, which include: constant ("Constant""); linear decreasing ("LD"); linear increasing ("LI"); Known change point, where the trend can increase towards the change point before subsequently decreasing ("CP"); or decrease towards the change point before subsequently increasing ("CT"); and monotonic cubic splines which are decreasing ("MD") or increasing ("MI"). At least two trends have to be selected, with the constant trend always included. To avoid identifiability problems only one of "LI" or "MI" can be included at a given time (similarily for "LD" and "MD").

changepoint

A scalar indicating the position of the change point should one of the change point trend functions be included in the trends vector, i.e. if "CP" or "CT" is specified.

knots

A scalar indicating the number of knots to use should one of the monotonic cubic splines trend functions be included in the trends vector, i.e. if "MD" or "MI" is specified.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.mean.gamma

A vector of prior means for the temporal trend parameters (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.gamma

A vector of prior variances for the temporal trend parameters (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.lambda

A vector of prior samples sizes for the Dirichlet prior controlling the probabilities that each trend function is chosen. The vector should be the same length as the trends vector and defaults to a vector of ones.

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for the random effect variances tau2. Defaults to c(1, 0.01).

Nchains

The number of parallel Markov chains to be used in the Metropolis coupled Markov chain Monte Carlo (MCMCMC) simulations. Defaults to 4.

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

A list containing two elements. The first is "trends", which is a vector the same length and in the same order as the number of areas. The kth element specifies which trend area k has been allocated to based on the posterior mode. The second element is "trend.probs", which is a matrix containing the probabilities associated with each trend for each areal unit.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

X

The design matrix of covariates.

Author(s)

Gary Napier

References

Leroux, B., Lei, X., and Breslow, N. (2000). Estimation of disease rates in small areas: A new mixed model for spatial dependence, Chapter Statistical Models in Epidemiology, the Environment and Clinical Trials, Halloran, M and Berry, D (eds), pp. 135-178. Springer-Verlag, New York.

Napier, G., Lee, D., Robertson, C., and Lawson, A. (2019). A Bayesian space-time model for clustering areal units based on their disease trends, Biostatistics, 20, 681-697.

Examples

##################################################
#### Run the model on simulated data on a lattice
##################################################
#### Load libraries
library(truncnorm)
library(gtools)


#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 10
N.all <- N * K

#### set up spatial neighbourhood matrix W
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Create the spatial covariance matrix
Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
  
           
#### Simulate the elements in the linear predictor and the data
beta <- 0.01
gamma <- 0.7
phi <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.01 * Q.W.inv))

lambda <- rep(1/2, 2)
w <- t(rmultinom(K, 1, lambda))

Y <- matrix(NA, nrow = K, ncol = N)
for (i in 1:N)
{
  LP <- beta + w[, 2] * (gamma * i) + phi
  mean <- exp(LP)
  Y[, i] <- rpois(n=K, lambda=mean)
 }
Y <- as.numeric(Y)


#### Run the model
## Not run: model <- ST.CARclustrends(formula=Y~1, family="poisson", W=W, burnin=10000, 
n.sample=50000, trends=c("Constant", "LI"))
## End(Not run)


#### Toy example for checking
model <- ST.CARclustrends(formula=Y~1, family="poisson", W=W, burnin=10, 
n.sample=50, trends=c("Constant", "LI"))

Fit a spatio-temporal generalised linear mixed model to data, where the spatial units have linear time trends with spatially varying intercepts and slopes.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial, Gaussian or Poisson. The linear predictor is modelled by known covariates and an area specific linear time trend. The area specific intercepts and slopes are spatially autocorrelated and modelled by the conditional autoregressive (CAR) prior proposed by Leroux et al. (2000). The model is similar to that proposed by Bernardinelli et al. (1995) and further details are given in the vignette accompanying this package. Missing values are allowed in the response in this model, and are sampled from in the model using data augmentation. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

ST.CARlinear(formula, family, data=NULL,  trials=NULL, W, burnin, n.sample, thin=1, 
n.chains=1,  n.cores=1, prior.mean.beta=NULL, prior.var.beta=NULL, 
prior.mean.alpha=NULL, prior.var.alpha=NULL, prior.nu2=NULL, prior.tau2=NULL, 
rho.slo=NULL, rho.int=NULL, MALA=TRUE, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and each covariate should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The response can contain missing (NA) values.

family

One of either "binomial", "gaussian" or "poisson", which respectively specify a binomial likelihood model with a logistic link function, a Gaussian likelihood model with an identity link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

n.chains

The number of MCMC chains to run when fitting the model. Defaults to 1.

n.cores

The number of computer cores to run the MCMC chains on. Must be less than or equal to n.chains. Defaults to 1.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.mean.alpha

The prior mean for the average slope of the linear time trend alpha (a Gaussian prior is assumed). Defaults to zero.

prior.var.alpha

The prior variance for the average slope of the linear time trend alpha (a Gaussian prior is assumed). Defaults to 100,000.

prior.nu2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for nu2. Defaults to c(1, 0.01) and only used if family="Gaussian".

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for tau2. Defaults to c(1, 0.01).

rho.slo

The value in the interval [0, 1] that the spatial dependence parameter for the slope of the linear time trend, rho.slo, is fixed at if it should not be estimated. If this arugment is NULL then rho.slo is estimated in the model.

rho.int

The value in the interval [0, 1] that the spatial dependence parameter for the intercept of the linear time trend, rho.int, is fixed at if it should not be estimated. If this arugment is NULL then rho.int is estimated in the model.

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

NULL, for compatability with the other models.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

mcmc.info

A vector giving details of the numbers of MCMC samples generated.

X

The design matrix of covariates.

Author(s)

Duncan Lee

References

Bernardinelli, L., D. Clayton, C.Pascuto, C.Montomoli, M.Ghislandi, and M. Songini (1995). Bayesian analysis of space-time variation in disease risk. Statistics in Medicine, 14, 2433-2443.

Leroux, B., X. Lei, and N. Breslow (2000). Estimation of disease rates in small areas: A new mixed model for spatial dependence, Chapter Statistical Models in Epidemiology, the Environment and Clinical Trials, Halloran, M and Berry, D (eds), pp. 135-178. Springer-Verlag, New York.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 10
N.all <- K * N


#### set up spatial neighbourhood matrix W
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Simulate the elements in the linear predictor and the data
x <- rnorm(n=N.all, mean=0, sd=1)
beta <- 0.1
Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
phi <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.1 * Q.W.inv))
delta <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.1 * Q.W.inv))
trend <- array(NA, c(K, N))
time <-(1:N - mean(1:N))/N
    for(i in 1:K)
    {
    trend[i, ] <- phi[i] + delta[i] * time        
    }
trend.vec <- as.numeric(trend)
LP <- 4 + x * beta + trend.vec
mean <- exp(LP)
Y <- rpois(n=N.all, lambda=mean)


#### Run the model
## Not run: model <- ST.CARlinear(formula=Y~x, family="poisson", W=W, burnin=10000, 
n.sample=50000)
## End(Not run)


#### Toy example for checking 
model <- ST.CARlinear(formula=Y~x, family="poisson", W=W, burnin=10, 
n.sample=50)

Fit a spatio-temporal generalised linear mixed model to data, with a spatio-temporal autoregressive process and a piecewise constant intercept term.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial or Poisson. The linear predictor is modelled by known covariates, a vector of random effects and a piecewise constant intercept process. The random effects follow the multivariate first order autoregressive time series process proposed by Rushworth et al.(2014), which is the same as that used in the ST.CARar() function. The piecewise constant intercept component allows neighbouring areal units to have very different values if they are assigned to a different intercept component. This model allows for localised smoothness, because some pairs of neighbouring areas or time periods can have similar values (same intercept) while other neighbouring pairs have very different values (different intercepts). Furter details are given in Lee and Lawson (2016) and in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

ST.CARlocalised(formula, family, data=NULL,  G, trials=NULL, W, burnin, n.sample, 
thin=1, n.chains=1,  n.cores=1, prior.mean.beta=NULL, prior.var.beta=NULL, 
prior.delta=NULL, prior.tau2=NULL, MALA=TRUE, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and each covariate should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The response must NOT contain missing (NA) values.

family

One of either "binomial", or "poisson", which respectively specify a binomial likelihood model with a logistic link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

G

The maximum number of distinct intercept terms (clusters) to allow in the model.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

n.chains

The number of MCMC chains to run when fitting the model. Defaults to 1.

n.cores

The number of computer cores to run the MCMC chains on. Must be less than or equal to n.chains. Defaults to 1.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.delta

The prior maximum M, in a Uniform(0,M) prior, for the intercept process smoothing parameter delta. Defaults to 10.

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for tau2. Defaults to c(1, 0.01).

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

A vector giving the posterior mean of which intercept component (cluster) each data point is in.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

mcmc.info

A vector giving details of the numbers of MCMC samples generated.

X

The design matrix of covariates.

Author(s)

Duncan Lee

References

Lee, D and Lawson, C (2016). Quantifying the spatial inequality and temporal trends in maternal smoking rates in Glasgow, Annals of Applied Statistics, 10, 1427-1446.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 10
N.all <- N * K
   
    
#### set up spatial neighbourhood matrix W
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Simulate the elements in the linear predictor and the data
Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
phi.temp <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.1 * Q.W.inv))
phi <- phi.temp
    for(i in 2:N)
    {
    phi.temp2 <- mvrnorm(n=1, mu=(0.8 * phi.temp), Sigma=(0.1 * Q.W.inv))
    phi.temp <- phi.temp2
    phi <- c(phi, phi.temp)
    }
jump <- rep(c(rep(2, 70), rep(4, 30)),N)
LP <- jump + phi
fitted <- exp(LP)
Y <- rpois(n=N.all, lambda=fitted)


#### Run the model     
## Not run: model <- ST.CARlocalised(formula=Y~1, family="poisson", G=3, W=W, burnin=10000,
n.sample=50000)
## End(Not run)


#### Toy example for checking
model <- ST.CARlocalised(formula=Y~1, family="poisson", G=3, W=W, burnin=10,
n.sample=50)

Fit a spatio-temporal generalised linear mixed model to data, with a common temporal main effect and separate spatial surfaces with individual variances.

Description

Fit a spatio-temporal generalised linear mixed model to areal unit data, where the response variable can be binomial or Poisson. The linear predictor is modelled by known covariates and two sets of random effects. These include a common temporal main effect, and separate time period specific spatial effects with a common spatial dependence parameter but separate variance parameters. Each component is modelled by the conditional autoregressive (CAR) prior proposed by Leroux et al. (2000). Further details are given in Napier et al. (2016) and in the vignette accompanying this package. Inference is conducted in a Bayesian setting using Markov chain Monte Carlo (MCMC) simulation.

Usage

ST.CARsepspatial(formula, family, data=NULL,  trials=NULL, W, burnin, n.sample,
thin=1, n.chains=1,  n.cores=1, prior.mean.beta=NULL, prior.var.beta=NULL, 
prior.tau2=NULL, rho.S=NULL, rho.T=NULL, MALA=TRUE, verbose=TRUE)

Arguments

formula

A formula for the covariate part of the model using the syntax of the lm() function. Offsets can be included here using the offset() function. The response and each covariate should be vectors of length (KN)*1, where K is the number of spatial units and N is the number of time periods. Each vector should be ordered so that the first K data points are the set of all K spatial locations at time 1, the next K are the set of spatial locations for time 2 and so on. The response must NOT contain missing (NA) values.

family

One of either "binomial" or "poisson", which respectively specify a binomial likelihood model with a logistic link function, or a Poisson likelihood model with a log link function.

data

An optional data.frame containing the variables in the formula.

trials

A vector the same length and in the same order as the response containing the total number of trials for each area and time period. Only used if family="binomial".

W

A non-negative K by K neighbourhood matrix (where K is the number of spatial units). Typically a binary specification is used, where the jkth element equals one if areas (j, k) are spatially close (e.g. share a common border) and is zero otherwise. The matrix can be non-binary, but each row must contain at least one non-zero entry.

burnin

The number of MCMC samples to discard as the burn-in period.

n.sample

The number of MCMC samples to generate.

thin

The level of thinning to apply to the MCMC samples to reduce their temporal autocorrelation. Defaults to 1 (no thinning).

n.chains

The number of MCMC chains to run when fitting the model. Defaults to 1.

n.cores

The number of computer cores to run the MCMC chains on. Must be less than or equal to n.chains. Defaults to 1.

prior.mean.beta

A vector of prior means for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector of zeros.

prior.var.beta

A vector of prior variances for the regression parameters beta (Gaussian priors are assumed). Defaults to a vector with values 100,000.

prior.tau2

The prior shape and scale in the form of c(shape, scale) for an Inverse-Gamma(shape, scale) prior for tau2. Defaults to c(1, 0.01).

rho.S

The value in the interval [0, 1] that the spatial dependence parameter rho.S is fixed at if it should not be estimated. If this arugment is NULL then rho.S is estimated in the model.

rho.T

The value in the interval [0, 1] that the temporal dependence parameter rho.T is fixed at if it should not be estimated. If this arugment is NULL then rho.T is estimated in the model.

MALA

Logical, should the function use Metropolis adjusted Langevin algorithm (MALA) updates (TRUE, default) or simple random walk (FALSE) updates for the regression parameters. Not applicable if family="gaussian".

verbose

Logical, should the function update the user on its progress.

Value

summary.results

A summary table of the parameters.

samples

A list containing the MCMC samples from the model.

fitted.values

A vector of fitted values for each area and time period.

residuals

A matrix with 2 columns where each column is a type of residual and each row relates to an area and time period. The types are "response" (raw), and "pearson".

modelfit

Model fit criteria including the Deviance Information Criterion (DIC) and its corresponding estimated effective number of parameters (p.d), the Log Marginal Predictive Likelihood (LMPL), the Watanabe-Akaike Information Criterion (WAIC) and its corresponding estimated number of effective parameters (p.w), and the loglikelihood.

accept

The acceptance probabilities for the parameters.

localised.structure

NULL, for compatability with the other models.

formula

The formula (as a text string) for the response, covariate and offset parts of the model.

model

A text string describing the model fit.

mcmc.info

A vector giving details of the numbers of MCMC samples generated.

X

The design matrix of covariates.

Author(s)

Gary Napier

References

Napier, G, D. Lee, C. Robertson, A. Lawson, and K. Pollock (2016). A model to estimate the impact of changes in MMR vaccination uptake on inequalities in measles susceptibility in Scotland, Statistical Methods in Medical Research, 25, 1185-1200.

Examples

#################################################
#### Run the model on simulated data on a lattice
#################################################
#### set up the regular lattice    
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)
N <- 5
N.all <- N * K
  
        
#### set up spatial neighbourhood matrix W
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	


#### Create the spatial covariance matrix
Q.W <- 0.99 * (diag(apply(W, 2, sum)) - W) + 0.01 * diag(rep(1,K))
Q.W.inv <- solve(Q.W)
  
           
#### Simulate the elements in the linear predictor and the data
x <- rnorm(n=N.all, mean=0, sd=1)
beta <- 0.1

phi1 <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.01 * Q.W.inv))
phi2 <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.01 * Q.W.inv))
phi3 <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.01 * Q.W.inv))
phi4 <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.05 * Q.W.inv))
phi5 <- mvrnorm(n=1, mu=rep(0,K), Sigma=(0.05 * Q.W.inv))
  
delta <- c(0, 0.5, 0, 0.5, 0)
phi.long <- c(phi1, phi2, phi3, phi4, phi5)
delta.long <- kronecker(delta, rep(1,K))
LP <- 4 +  x * beta + phi.long +  delta.long
mean <- exp(LP)
Y <- rpois(n=N.all, lambda=mean)
  
                
#### Run the model
## Not run: model <- ST.CARsepspatial(formula=Y~x, family="poisson", W=W, burnin=10000, 
n.sample=50000)
## End(Not run)


#### Toy example for checking
model <- ST.CARsepspatial(formula=Y~x, family="poisson", W=W, burnin=10, 
n.sample=50)

Estimate an appropriate neighbourhood matrix for a set of spatial data using a baseline neighbourhood matrix and a graph based optimisation algorithm.

Description

Estimate an appropriate neighbourhood matrix (W.est) for a given set of spatial data (spdata) from a baseline neighbourhood matrix (W) using the graph-based optimisation algorithm proposed by Lee, Meeks and Pettersson (2021). The baseline neighbourhood matrix W should be binary and based on a simple geographical specification such as the border sharing rule. The estimated neighbourhood matrix is constructed by removing neighbour relations (i.e. setting w_ij = w_ji = 0) if they are not appropriate for the data. Note, new edges not in the initial W matrix cannot be added when creating W.est.

Usage

W.estimate(W, spdata, add=FALSE, remove=TRUE, remove_first=FALSE)

Arguments

W

A binary K * K neighbourhood matrix, where K is the number of spatial units. A typical specification would have the jkth element equalling one if areas (j, k) are spatially close (e.g. share a common border) and zero otherwise. Each row must contain at least one non-zero entry.

spdata

A K * 1 vector of spatial data that you wish to optimise the neighbourhood matrix for. The kth element of this vector must correspond to the area represented by the kth row of the W matrix.

add

Allow the optimiser to add edges back in to the graph if they have previously been removed. Defaults to FALSE.

remove

Allow the optimiser to remove edges from the graph if they have previously been added in. Defaults to TRUE.

remove_first

If only one of add or remove are TRUE, then this option has no effect. If both add and remove are TRUE, then this option determines whether the optimiser first tries to add or remove edges (before alternating between the two phases). Defaults to FALSE, meaning that the optimiser tries to add edges first.

Value

W.est

An optimised K by K neighbourhood matrix.

Author(s)

William Pettersson (william.pettersson@glasgow.ac.uk) and Kitty Meeks

References

Lee, D and Meeks, K (2020). Improved inference for areal unit count data using graph-based optimisation. arXiv:2010.10893.

Examples

####################################################
#### Run the optmiser on simulated data on a lattice
####################################################
#### Load other libraries required
library(MASS)

#### Set up a square lattice region
x.easting <- 1:10
x.northing <- 1:10
Grid <- expand.grid(x.easting, x.northing)
K <- nrow(Grid)

#### Split the area into two groups between which there will be a boundary.
groups <-rep(1, K) 
groups[Grid$Var1>5] <- 2

#### set up distance and neighbourhood (W, based on sharing a common border) matrices
distance <- as.matrix(dist(Grid))
W <-array(0, c(K,K))
W[distance==1] <-1 	

#### Generate the response data
phi <- mvrnorm(n=1, mu=groups, Sigma=0.05 * exp(-0.1 * distance))
lp <- 4 + phi
Y <- rpois(n=K, lambda =exp(lp))


# Compute the spdata object on the linear predictor scale
spdata <- log(Y) - mean(log(Y))

#### Run W matrix optmiser
W.est <- W.estimate(W, spdata)

Extract the regression coefficients from a model.

Description

This function takes a CARBayesST object and returns the vector of estimated regression coefficients (posterior means).

Usage

    ## S3 method for class 'CARBayesST'
coef(object, ...)

Arguments

object

A CARBayesST fitted model object.

...

Ignored.

Author(s)

Duncan Lee


Extract the fitted values from a model.

Description

This function takes a CARBayesST object and returns the vector of fitted values (posterior means).

Usage

    ## S3 method for class 'CARBayesST'
fitted(object, ...)

Arguments

object

A CARBayesST fitted model object.

...

Ignored.

Author(s)

Duncan Lee


Extract the estimated loglikelihood from a fitted model.

Description

This function takes a CARBayesST object and returns the estimated loglikelihood (posterior means).

Usage

    ## S3 method for class 'CARBayesST'
logLik(object, ...)

Arguments

object

A CARBayesST fitted model object.

...

Ignored.

Author(s)

Duncan Lee


Extract the model (design) matrix from a model.

Description

This function takes a CARBayesST object and returns the design matrix.

Usage

    ## S3 method for class 'CARBayesST'
model.matrix(object, ...)

Arguments

object

A CARBayesST fitted model object.

...

Ignored.

Author(s)

Duncan Lee


Print a summary of the fitted model to the screen.

Description

This function takes a CARBayesST object and returns a summary of the fitted model. The summary includes, for selected parameters, posterior means and 95 percent credible intervals, the effective number of independent samples and the Geweke convergence diagnostic in the form of a Z-score.

Usage

## S3 method for class 'CARBayesST'
print(x, ...)

Arguments

x

A CARBayesST fitted model object.

...

Ignored.

Author(s)

Duncan Lee


Extract the residuals from a model.

Description

This function takes a CARBayesST object and returns a set of residuals. The allowable types of residual are "response" (raw), and "pearson" (the default). In each case the fitted values are based on posterior means.

Usage

    ## S3 method for class 'CARBayesST'
residuals(object, type, ...)

Arguments

object

A CARBayesST fitted model object.

type

A text string and one of c("response", "pearson"). If this argument is omitted the default is "pearson".

...

Ignored.

Author(s)

Duncan Lee