Type: | Package |
Title: | Scalable Spike-and-Slab |
Version: | 1.0 |
Date: | 2022-05-13 |
Description: | A scalable Gibbs sampling implementation for high dimensional Bayesian regression with the continuous spike-and-slab prior. Niloy Biswas, Lester Mackey and Xiao-Li Meng, "Scalable Spike-and-Slab" (2022) <doi:10.48550/arXiv.2204.01668>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Imports: | Rcpp, stats, TruncatedNormal |
LinkingTo: | Rcpp, RcppEigen |
RoxygenNote: | 7.1.2 |
NeedsCompilation: | yes |
Packaged: | 2022-05-17 20:40:23 UTC; niloybiswas |
Author: | Niloy Biswas |
Maintainer: | Niloy Biswas <niloy_biswas@g.harvard.edu> |
Depends: | R (≥ 3.5.0) |
Repository: | CRAN |
Date/Publication: | 2022-05-18 17:00:07 UTC |
Riboflavin GWAS dataset
Description
Dataset of riboflavin production by Bacillus subtilis containing n = 71 observations of a one-dimensional response (riboflavin production) and p = 4088 predictors (gene expressions).
Usage
data(riboflavin)
Format
A data frame containing a vector y of length 71 (responses) and a matrix x of dimension 71 by 4088 (gene expressions).
Details
The processed dataset is the same as in the R packages qut and hdi.
References
Bühlmann, P., Kalisch, M. and Meier, L. (2014) High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and Its Application 1, 255–278.
Examples
data(riboflavin)
y <- as.vector(riboflavin$y)
X <- as.matrix(riboflavin$x)
spike_slab_linear
Description
Generates a Markov chain targeting the posterior of Bayesian linear regression with spike-and-slab priors.
Usage
spike_slab_linear(
chain_length,
X,
y,
tau0,
tau1,
q,
a0 = 1,
b0 = 1,
rinit = NULL,
verbose = FALSE,
burnin = 0,
store = TRUE,
Xt = NULL,
XXt = NULL,
tau0_inverse = NULL,
tau1_inverse = NULL
)
Arguments
chain_length | Markov chain length |
X | design matrix of dimension n by p |
y | response vector of length n |
tau0 | prior hyperparameter (non-negative real) |
tau1 | prior hyperparameter (non-negative real) |
q | prior hyperparameter (strictly between 0 and 1) |
a0 | prior hyperparameter (non-negative real) |
b0 | prior hyperparameter (non-negative real) |
rinit | initial distribution of the Markov chain (default samples from the prior) |
verbose | print the iteration number of the Markov chain (boolean) |
burnin | chain burn-in (non-negative integer) |
store | store the chain trajectory (boolean) |
Xt | pre-calculated transpose of X (p by n matrix) |
XXt | pre-calculated matrix X*transpose(X) (n by n matrix) |
tau0_inverse | pre-calculated matrix inverse(I + tau0^2*XXt) (n by n matrix) |
tau1_inverse | pre-calculated matrix inverse(I + tau1^2*XXt) (n by n matrix) |
Value
Output from Markov chain targeting the posterior corresponding to Bayesian linear regression with spike and slab priors
Examples
# Synthetic dataset
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2,type='linear')
X <- syn_data$X
y <- syn_data$y
# Hyperparameters
params <- spike_slab_params(n=nrow(X),p=ncol(X))
# Run S^3
sss_chain <- spike_slab_linear(chain_length=4e3,burnin=1e3,X=X,y=y,
tau0=params$tau0,tau1=params$tau1,q=params$q,a0=params$a0,b0=params$b0,
verbose=FALSE,store=FALSE)
# Use posterior probabilities for variable selection
sss_chain$z_ergodic_avg[1:10]
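The Xt, XXt, tau0_inverse and tau1_inverse arguments accept pre-calculated matrices. A minimal base-R sketch of how they could be formed from the definitions in the Arguments table is given here. The toy design matrix and the values of tau0 and tau1 are hypothetical, and the Cholesky-based inverse is an assumption made for this sketch, not necessarily what the package does internally.

```r
set.seed(1)
n <- 20; p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # hypothetical design matrix
tau0 <- 0.1; tau1 <- 1                          # hypothetical hyperparameter values

Xt  <- t(X)      # p by n transpose of X
XXt <- X %*% Xt  # n by n matrix X*transpose(X)

# I + tau^2 * XXt is symmetric positive definite, so a Cholesky-based
# inverse is applicable (an assumed choice for this sketch)
tau0_inverse <- chol2inv(chol(diag(n) + tau0^2 * XXt))
tau1_inverse <- chol2inv(chol(diag(n) + tau1^2 * XXt))

# Sanity check: the product with the original matrix should be close to I
max(abs(tau0_inverse %*% (diag(n) + tau0^2 * XXt) - diag(n)))
```

These objects could then be supplied as Xt=Xt, XXt=XXt, tau0_inverse=tau0_inverse, tau1_inverse=tau1_inverse in a call to spike_slab_linear, provided the same tau0 and tau1 are passed.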
spike_slab_logistic
Description
Generates a Markov chain targeting the posterior of Bayesian logistic regression with spike-and-slab priors.
Usage
spike_slab_logistic(
chain_length,
X,
y,
tau0,
tau1,
q,
rinit = NULL,
verbose = FALSE,
burnin = 0,
store = TRUE,
Xt = NULL,
XXt = NULL
)
Arguments
chain_length | Markov chain length |
X | design matrix of dimension n by p |
y | response vector of length n |
tau0 | prior hyperparameter (non-negative real) |
tau1 | prior hyperparameter (non-negative real) |
q | prior hyperparameter (strictly between 0 and 1) |
rinit | initial distribution of the Markov chain (default samples from the prior) |
verbose | print the iteration number of the Markov chain (boolean) |
burnin | chain burn-in (non-negative integer) |
store | store the chain trajectory (boolean) |
Xt | pre-calculated transpose of X (p by n matrix) |
XXt | pre-calculated matrix X*transpose(X) (n by n matrix) |
Value
Output from Markov chain targeting the posterior corresponding to Bayesian logistic regression with spike and slab priors
Examples
# Synthetic dataset
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2,type='logistic')
X <- syn_data$X
y <- syn_data$y
# Hyperparameters
params <- spike_slab_params(n=nrow(X),p=ncol(X))
# Run S^3
sss_chain <- spike_slab_logistic(chain_length=4e3,burnin=1e3,X=X,y=y,
tau0=params$tau0,tau1=params$tau1,q=params$q,verbose=FALSE,store=FALSE)
# Use posterior probabilities for variable selection
sss_chain$z_ergodic_avg[1:10]
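The examples end by inspecting z_ergodic_avg for variable selection. One possible way to turn such a vector into a selected set is sketched here in base R. The probability vector is a hypothetical stand-in for sss_chain$z_ergodic_avg, and the 0.5 cut-off is an assumption for illustration, not a rule prescribed by the package.

```r
# Hypothetical stand-in for sss_chain$z_ergodic_avg (one value per covariate)
z_ergodic_avg <- c(0.98, 0.91, 0.07, 0.55, 0.02, 0.01, 0.49, 0.03)

threshold <- 0.5  # assumed cut-off; other thresholds are possible
selected <- which(z_ergodic_avg > threshold)  # indices of selected covariates
selected

# Rank covariates by estimated inclusion probability, largest first
ranking <- order(z_ergodic_avg, decreasing = TRUE)
head(ranking, 3)
```

With a real chain, the same two lines would apply to sss_chain$z_ergodic_avg directly.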
spike_slab_params
Description
Generates hyperparameters for spike-and-slab
Usage
spike_slab_params(n, p)
Arguments
n | number of observations |
p | number of covariates |
Value
spike-and-slab hyperparameters q, tau0, tau1, a0, b0
Examples
hyper_params <- spike_slab_params(n=100,p=200)
print(hyper_params)
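To show how the hyperparameters q, tau0, tau1, a0 and b0 might enter the model, here is a base-R sketch that draws once from a continuous spike-and-slab prior. The prior form used (an inverse-gamma variance, Bernoulli(q) indicators, and Gaussian coefficients whose scale switches between tau0 and tau1) is an assumption based on the paper cited in the package description, and the numeric values are hypothetical rather than output of spike_slab_params.

```r
set.seed(1)
p <- 200
# Hypothetical hyperparameter values, chosen only for illustration
q <- 0.05; tau0 <- 0.01; tau1 <- 1; a0 <- 1; b0 <- 1

# Assumed prior on the noise variance: sigma^2 ~ InvGamma(a0/2, b0/2)
sigma2 <- 1 / rgamma(1, shape = a0 / 2, rate = b0 / 2)

# z_j ~ Bernoulli(q): 1 marks a "slab" component, 0 a "spike" component
z <- rbinom(p, size = 1, prob = q)

# beta_j | z_j, sigma^2 ~ N(0, sigma^2 * tau_{z_j}^2)
tau <- ifelse(z == 1, tau1, tau0)
beta <- rnorm(p, mean = 0, sd = sqrt(sigma2) * tau)

# Number of slab components in this draw
sum(z)
```

Under this assumed form, a small tau0 keeps spike coefficients close to zero while a larger tau1 lets slab coefficients vary freely.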
spike_slab_probit
Description
Generates a Markov chain targeting the posterior of Bayesian probit regression with spike-and-slab priors.
Usage
spike_slab_probit(
chain_length,
X,
y,
tau0,
tau1,
q,
rinit = NULL,
verbose = FALSE,
burnin = 0,
store = TRUE,
Xt = NULL,
XXt = NULL,
tau0_inverse = NULL,
tau1_inverse = NULL
)
Arguments
chain_length | Markov chain length |
X | design matrix of dimension n by p |
y | response vector of length n |
tau0 | prior hyperparameter (non-negative real) |
tau1 | prior hyperparameter (non-negative real) |
q | prior hyperparameter (strictly between 0 and 1) |
rinit | initial distribution of the Markov chain (default samples from the prior) |
verbose | print the iteration number of the Markov chain (boolean) |
burnin | chain burn-in (non-negative integer) |
store | store the chain trajectory (boolean) |
Xt | pre-calculated transpose of X (p by n matrix) |
XXt | pre-calculated matrix X*transpose(X) (n by n matrix) |
tau0_inverse | pre-calculated matrix inverse(I + tau0^2*XXt) (n by n matrix) |
tau1_inverse | pre-calculated matrix inverse(I + tau1^2*XXt) (n by n matrix) |
Value
Output from Markov chain targeting the posterior corresponding to Bayesian probit regression with spike and slab priors
Examples
# Synthetic dataset
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2,type='probit')
X <- syn_data$X
Xt <- t(X)
y <- syn_data$y
# Hyperparameters
params <- spike_slab_params(n=nrow(X),p=ncol(X))
# Run S^3
sss_chain <- spike_slab_probit(chain_length=4e3,burnin=1e3,X=X,y=y,
tau0=params$tau0,tau1=params$tau1,q=params$q,verbose=FALSE,store=FALSE)
# Use posterior probabilities for variable selection
sss_chain$z_ergodic_avg[1:10]
synthetic_data
Description
Generates synthetic linear, logistic and probit regression data
Usage
synthetic_data(
n,
p,
s0,
error_std,
type = "linear",
scale = TRUE,
signal = "constant"
)
Arguments
n | number of observations |
p | number of covariates |
s0 | sparsity (number of non-zero components of the true signal) |
error_std | standard deviation of the Gaussian noise (linear regression only) |
type | dataset type ('linear', 'logistic' or 'probit') |
scale | whether the columns of the design matrix X are scaled to mean zero and standard deviation 1 (TRUE or FALSE) |
signal | non-zero components of the true signal ('constant' or 'decay') |
Value
Design matrix, response and true signal vector for linear, logistic and probit regression
Examples
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2)
# syn_data$X is an n by p design matrix
dim(syn_data$X)
# syn_data$y is a length n response vector
length(syn_data$y)
# syn_data$true_beta is a length p signal vector with only the first s0 entries non-zero
all(syn_data$true_beta[1:5]!=0)
all(syn_data$true_beta[-c(1:5)]==0)
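For readers who want to see what such a dataset looks like without calling the package, here is a base-R sketch of a sparse linear-regression dataset with the same shape as the output described above. It is not the package's implementation: the standard normal design and the constant signal value of 2 are assumptions for illustration.

```r
set.seed(1)
n <- 100; p <- 200; s0 <- 5; error_std <- 2

# Design matrix with columns scaled to mean zero and standard deviation 1
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))

# Sparse signal: only the first s0 entries are non-zero
# (the constant value 2 is an assumption for this sketch)
true_beta <- c(rep(2, s0), rep(0, p - s0))

# Linear response with Gaussian noise of standard deviation error_std
y <- as.vector(X %*% true_beta + error_std * rnorm(n))

dim(X)
length(y)
sum(true_beta != 0)
```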