Title: Gaussian Processes for Estimating Causal Exposure Response Curves
Version: 0.2.4
Maintainer: Boyu Ren <bren@mgb.org>
Description: Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F. (2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>.
License: GPL (≥ 3)
Language: en-US
URL: https://github.com/NSAPH-Software/GPCERF
BugReports: https://github.com/NSAPH-Software/GPCERF/issues
Copyright: Harvard University
Imports: parallel, xgboost, stats, MASS, spatstat.geom, logger, Rcpp, RcppArmadillo, ggplot2, cowplot, rlang, Rfast, SuperLearner, wCorr
Encoding: UTF-8
RoxygenNote: 7.2.3
Depends: R (≥ 3.5.0)
Suggests: rmarkdown, knitr, testthat (≥ 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
LinkingTo: RcppArmadillo, Rcpp
NeedsCompilation: yes
Packaged: 2024-04-15 13:52:18 UTC; Boyu
Author: Naeem Khoshnevis [aut] (AFFILIATION: HUIT), Boyu Ren [aut, cre] (AFFILIATION: McLean Hospital), Tanujit Dey [ctb] (AFFILIATION: HMS), Danielle Braun [aut] (AFFILIATION: HSPH)
Repository: CRAN
Date/Publication: 2024-04-15 14:30:02 UTC

The 'GPCERF' package.

Description

Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data.

Author(s)

Naeem Khoshnevis

Boyu Ren

Danielle Braun

References

Ren, B., Wu, X., Braun, D., Pillai, N. and Dominici, F., 2021. Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes. arXiv preprint arXiv:2105.03454.


A helper function for cerf_gp object

Description

A helper function to plot cerf_gp object using ggplot2 package.

Usage

## S3 method for class 'cerf_gp'
autoplot(object, ...)

Arguments

object

A cerf_gp object.

...

Additional arguments passed to customize the plot.

Value

Returns a ggplot object.


A helper function for cerf_nngp object

Description

A helper function to plot cerf_nngp object using ggplot2 package.

Usage

## S3 method for class 'cerf_nngp'
autoplot(object, ...)

Arguments

object

A cerf_nngp object.

...

Additional arguments passed to customize the plot.

Value

Returns a ggplot object.


Calculate derivatives of CERF for nnGP

Description

Calculates the posterior mean of the derivative of CERF at a given exposure level with nnGP.

Usage

compute_deriv_nn(
  w,
  w_obs,
  gps_m,
  y_obs,
  hyperparam,
  n_neighbor,
  block_size,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)

Arguments

w

A scalar of exposure level of interest.

w_obs

A vector of observed exposure levels of all samples.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

y_obs

A vector of observed outcome values.

hyperparam

A vector of hyper-parameters in the GP model.

n_neighbor

The number of nearest neighbors on one side.

block_size

The number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

kernel_fn

The covariance function. The input is the square of Euclidean distance.

kernel_deriv_fn

The partial derivative of the covariance function. The input is the square of Euclidean distance.

Value

A scalar of estimated derivative of CERF at w in nnGP.
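
The defaults for kernel_fn and kernel_deriv_fn form a matched pair: -exp(-x) is the derivative of exp(-x) with respect to its input. When supplying a custom kernel, the same relationship presumably has to hold. The following base R sketch checks a pair numerically with a central finite difference; the grid x, the step h, and the second (custom) pair are illustrative assumptions, not part of the package.

```r
# Default pair from the Usage section above.
kernel_fn <- function(x) exp(-x)
kernel_deriv_fn <- function(x) -exp(-x)

# Hypothetical grid of squared distances and finite-difference step.
x <- seq(0, 4, by = 0.5)
h <- 1e-6

# Central finite-difference approximation of d kernel_fn / dx.
fd <- (kernel_fn(x + h) - kernel_fn(x - h)) / (2 * h)
max_err <- max(abs(fd - kernel_deriv_fn(x)))

# A hypothetical custom pair, checked the same way.
custom_fn <- function(x) exp(-x / 2)
custom_deriv_fn <- function(x) -0.5 * exp(-x / 2)
fd_custom <- (custom_fn(x + h) - custom_fn(x - h)) / (2 * h)
max_err_custom <- max(abs(fd_custom - custom_deriv_fn(x)))
```

A small max_err indicates that the derivative function is consistent with the kernel on the tested grid.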


Calculate derivatives of CERF

Description

Calculates the weights assigned to each observed outcome when deriving the posterior mean of the first derivative of CERF at a given exposure level.

Usage

compute_deriv_weights_gp(
  w,
  w_obs,
  gps_m,
  hyperparam,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)

Arguments

w

A scalar of exposure level of interest.

w_obs

A vector of observed exposure levels of all samples.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

hyperparam

A vector of hyper-parameters in the GP model.

kernel_fn

The covariance function.

kernel_deriv_fn

The partial derivative of the covariance function.

Value

A vector of weights for all samples, based on which the posterior mean of the derivative of CERF at the exposure level of interest is calculated.


Compute matrix inverse for a covariance matrix

Description

Computes the inverse of a covariance matrix using Cholesky decomposition.

Usage

compute_inverse(mtrx)

Arguments

mtrx

An n * n covariance matrix.

Value

A matrix that represents the inverse of the input matrix.
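
As a minimal sketch of the technique named above, base R can invert a symmetric positive-definite matrix through its Cholesky factor with chol() and chol2inv(). The helper name inverse_via_chol and the example matrix are hypothetical; whether compute_inverse is implemented exactly this way is an assumption.

```r
# Hypothetical helper; the name and the input checks are illustrative only.
inverse_via_chol <- function(mtrx) {
  stopifnot(is.matrix(mtrx), nrow(mtrx) == ncol(mtrx))
  # chol() returns the upper-triangular factor R with t(R) %*% R == mtrx;
  # chol2inv() converts that factor into the inverse of mtrx.
  chol2inv(chol(mtrx))
}

# A small symmetric positive-definite matrix.
m <- matrix(c(4.0, 2.0, 0.6,
              2.0, 3.0, 0.4,
              0.6, 0.4, 2.0), nrow = 3, byrow = TRUE)

m_inv <- inverse_via_chol(m)

# Largest deviation of m_inv %*% m from the identity matrix.
max_dev <- max(abs(m_inv %*% m - diag(3)))
```

The Cholesky route only applies to symmetric positive-definite input; chol() stops with an error otherwise.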


Compute mean, credible interval, and covariate balance in standard Gaussian process (GP)

Description

Calculates the induced covariate balance associated with one hyper-parameter configuration in standard GP.

Usage

compute_m_sigma(
  hyperparam,
  outcome_data,
  treatment_data,
  covariates_data,
  w,
  gps_m,
  tuning,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

hyperparam

A vector of values of hyper-parameters.

  • First element: alpha

  • Second element: beta

  • Third element: g_sigma (gamma / sigma)

outcome_data

A vector of outcome data.

treatment_data

A vector of treatment data.

covariates_data

A data frame of covariates data.

w

A vector of exposure levels at which the CERF is estimated.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

tuning

Logical; whether the function is used for parameter tuning (TRUE) or estimation (FALSE).

kernel_fn

The covariance function of GP.

Value

A list containing two elements:


Calculate posterior means for nnGP model

Description

Calculates the posterior mean of a point on the CERF based on the nnGP model. This function also returns the weights assigned to all nearest neighbors when calculating the posterior mean.

Usage

compute_posterior_m_nn(
  hyperparam,
  w,
  gps_w,
  obs_ord,
  y_obs_ord,
  kernel_fn = function(x) exp(-x^2),
  n_neighbor = 10,
  block_size = 10000
)

Arguments

hyperparam

A set of hyperparameters in the GP model.

w

A scalar representing the exposure level for the point of interest on the CERF.

gps_w

The GPS for all samples when their exposure levels are set at w.

obs_ord

A matrix of two columns. First column is the observed exposure levels of all samples; second is the GPS at the observed exposure levels. The rows are in ascending order for the first column.

y_obs_ord

A vector of observed outcome values. The vector is ordered as obs_ord.

kernel_fn

The covariance function of the GP.

n_neighbor

The number of nearest neighbors on one side.

block_size

Number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

Value

A two-column matrix. The first column is the weights assigned to each nearest neighbor. The second column is the corresponding observed outcome value. The weight in the last row of this matrix is NA and the observed outcome value is the estimated posterior mean of the CERF at point w, which is the weighted sum of all observed outcome values of the neighbors.

TODO: The first column is the selected index and the second column is weight.
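
The argument n_neighbor counts neighbors on one side of w, and obs_ord is sorted by exposure. One plausible reading is that up to n_neighbor observations below w and up to n_neighbor above it are used. The base R sketch below selects such indices; the function select_neighbors is hypothetical and is not taken from the package.

```r
# Hypothetical helper: indices of up to n_neighbor observations on each
# side of w, given exposure levels sorted in ascending order.
select_neighbors <- function(w, w_obs_sorted, n_neighbor) {
  n <- length(w_obs_sorted)
  # Number of observed exposure levels that are <= w.
  pos <- findInterval(w, w_obs_sorted)
  left <- max(1, pos - n_neighbor + 1)
  right <- min(n, pos + n_neighbor)
  seq(left, right)
}

w_obs_sorted <- 1:10

# Interior point: two neighbors on each side of 5.5.
idx_mid <- select_neighbors(5.5, w_obs_sorted, n_neighbor = 2)

# Point below the observed range: only right-hand neighbors exist.
idx_low <- select_neighbors(0.5, w_obs_sorted, n_neighbor = 2)
```

Near the boundaries of the observed exposure range fewer than 2 * n_neighbor observations are available, so the selected window is truncated.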


Calculate posterior standard deviations for nnGP model

Description

Calculates the posterior standard deviation of a point on the CERF based on the nnGP model.

Usage

compute_posterior_sd_nn(
  hyperparam,
  w,
  gps_w,
  obs_ord,
  sigma2,
  kernel_fn = function(x) exp(-x^2),
  n_neighbor = 10,
  block_size = 10000
)

Arguments

hyperparam

The values of hyperparameters in the GP model.

w

The exposure level for the point of interest on the CERF.

gps_w

The GPS for all samples when their exposure levels are set at w.

obs_ord

A matrix of two columns. The first column is the observed exposure levels of all samples; the second is the GPS at the observed exposure levels. The rows are in ascending order for the first column.

sigma2

A scalar representing sigma^2.

kernel_fn

The covariance function of the GP.

n_neighbor

Number of nearest neighbors on one side.

block_size

Number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

Value

The posterior standard deviation of the estimated CERF at w.


Detect change-point in standard GP

Description

Calculates the posterior mean of the difference between left- and right-derivatives at an exposure level for the detection of change points.

Usage

compute_rl_deriv_gp(
  w,
  w_obs,
  y_obs,
  gps_m,
  hyperparam,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)

Arguments

w

A scalar of exposure level of interest.

w_obs

A vector of observed exposure levels of all samples.

y_obs

A vector of observed outcome values of all samples.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

hyperparam

A vector of hyper-parameters in the GP model.

kernel_fn

The covariance function.

kernel_deriv_fn

The partial derivative of the covariance function.

Value

A numeric value of the posterior mean of the difference between two one-sided derivatives.

Examples


set.seed(847)
data <- generate_synthetic_data(sample_size = 100)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

wi <- 8.6

val <- compute_rl_deriv_gp(w = wi,
                           w_obs = data$treat,
                           y_obs = data$Y,
                           gps_m = gps_m,
                           hyperparam = c(1,1,2))
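
To illustrate why the difference between the right- and left-derivatives signals a change point, the following base R sketch evaluates that difference by finite differences on a known piecewise-linear curve. It does not use the GP posterior; the helper rl_deriv_fd, the test curve hinge, and the step size h are assumptions made for illustration.

```r
# Hypothetical helper: right-derivative minus left-derivative of f at w,
# both approximated by one-sided finite differences.
rl_deriv_fd <- function(f, w, h = 1e-4) {
  right <- (f(w + h) - f(w)) / h
  left <- (f(w) - f(w - h)) / h
  right - left
}

# A curve that is flat below 5 and has slope 1 above 5 (kink at w = 5).
hinge <- function(w) pmax(w - 5, 0)

at_kink <- rl_deriv_fd(hinge, 5)  # close to 1: the slope jumps at the kink
below <- rl_deriv_fd(hinge, 3)    # flat on both sides of w = 3
above <- rl_deriv_fd(hinge, 7)    # close to 0: slope 1 on both sides
```

Scanning such a statistic over a grid of exposure levels and looking for values far from zero is one way to locate candidate change points.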


Calculate right minus left derivatives for change-point detection in nnGP

Description

Calculates the posterior mean of the difference between left- and right-derivatives at an exposure level for the detection of change points. nnGP approximation is used.

Usage

compute_rl_deriv_nn(
  w,
  w_obs,
  gps_m,
  y_obs,
  hyperparam,
  n_neighbor,
  block_size,
  kernel_fn = function(x) exp(-x),
  kernel_deriv_fn = function(x) -exp(-x)
)

Arguments

w

A scalar of exposure level of interest.

w_obs

A vector of observed exposure levels of all samples.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

y_obs

A vector of observed outcome values.

hyperparam

A vector of hyper-parameters in the GP model.

n_neighbor

The number of nearest neighbors on one side.

block_size

The number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

kernel_fn

The covariance function. The input is the square of Euclidean distance.

kernel_deriv_fn

The partial derivative of the covariance function. The input is the square of Euclidean distance.

Value

A numeric value of the posterior mean of the difference between two one-sided derivatives.

Examples


set.seed(325)
data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

wi <- 12.2

deriv_val <- compute_rl_deriv_nn(w = wi,
                                 w_obs = data$treat,
                                 gps_m = gps_m,
                                 y_obs = data$Y,
                                 hyperparam = c(0.2,0.4,1.2),
                                 n_neighbor = 20,
                                 block_size = 10)


Compute posterior credible interval

Description

Computes posterior credible interval for requested exposure level.

Usage

compute_sd_gp(
  w,
  scaled_obs,
  hyperparam,
  sigma,
  gps_m,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

w

A scalar of exposure level of interest.

scaled_obs

A matrix of two columns.

  • First column is the scaled GPS value of all samples (GPS * 1/sqrt(alpha))

  • Second column is the scaled exposure value of all samples (w * 1/sqrt(beta))

hyperparam

A vector of hyper-parameters for the GP.

  • First element: alpha

  • Second element: beta

  • Third element: gamma/sigma

sigma

A scalar that represents noise.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

kernel_fn

The covariance function of GP.

Value

Posterior credible interval (scalar) for the requested exposure level (w).
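
The sketch below shows how a scaled_obs matrix with the two columns described above could be assembled, and how a textbook GP posterior standard deviation follows from it. It is a generic GP calculation under stated assumptions, not the package's implementation: the simulated GPS and exposure values, the single new point new_pt, the unit nugget added to the diagonal, and the use of Euclidean distance as the kernel input are all assumptions.

```r
set.seed(42)
n <- 20
gps <- runif(n, 0.05, 0.4)  # hypothetical GPS values
w_obs <- runif(n, 0, 10)    # hypothetical observed exposure levels

# Hyper-parameters (alpha, beta, gamma/sigma) and noise; values are arbitrary.
alpha <- 0.1
beta <- 0.2
g_sigma <- 1
sigma <- 2
kernel_fn <- function(x) exp(-x^2)

# Column 1: GPS / sqrt(alpha); column 2: exposure / sqrt(beta).
scaled_obs <- cbind(gps / sqrt(alpha), w_obs / sqrt(beta))

# Covariance between observed samples, in units of sigma^2 (assumed form).
sigma_obs <- g_sigma * kernel_fn(as.matrix(dist(scaled_obs))) + diag(n)

# One hypothetical new point (GPS = 0.2, exposure = 5) on the same scale.
new_pt <- c(0.2 / sqrt(alpha), 5 / sqrt(beta))
d_new <- sqrt(rowSums(sweep(scaled_obs, 2, new_pt)^2))
k_new <- g_sigma * kernel_fn(d_new)

# Standard GP posterior variance of the latent function at the new point.
post_var <- g_sigma * kernel_fn(0) - drop(t(k_new) %*% solve(sigma_obs, k_new))
post_sd <- sigma * sqrt(post_var)
```

The posterior variance can never exceed the prior variance g_sigma * kernel_fn(0), and it shrinks as the new point moves closer to observed samples.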


Compute weighted covariate balance

Description

Computes weighted covariate balance for given data sets.

Usage

compute_w_corr(w, covariate, weight)

Arguments

w

A vector of observed continuous exposure variable.

covariate

A data frame of observed covariates.

weight

A vector of weights.

Value

A list of measures related to covariate balance:

  • absolute_corr: the absolute correlations for each pre-exposure covariate.

  • mean_absolute_corr: the average absolute correlation over all pre-exposure covariates.

Examples

set.seed(639)
n <- 100
mydata <- generate_synthetic_data(sample_size = n)
# Add two categorical covariates.
year <- sample(x = c("2001", "2002", "2003", "2004", "2005"), size = n,
               replace = TRUE)
region <- sample(x = c("North", "South", "East", "West"), size = n,
                 replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
# Column 2 is passed as the exposure; columns 3 onward are the covariates.
cor_val <- compute_w_corr(mydata[, 2],
                          mydata[, 3:length(mydata)],
                          runif(n))

print(cor_val$mean_absolute_corr)
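
For intuition about what a weighted covariate balance measures, the following base R sketch computes the absolute weighted Pearson correlation between the exposure and each numeric covariate with stats::cov.wt(). The helper weighted_abs_corr is hypothetical and handles numeric covariates only; how compute_w_corr treats factor covariates is not covered here.

```r
# Hypothetical helper: absolute weighted correlation between the exposure w
# and each column of a data frame of numeric covariates.
weighted_abs_corr <- function(w, covariate, weight) {
  vapply(covariate, function(x) {
    # cov.wt() normalizes the weights internally; cor = TRUE adds $cor.
    abs(stats::cov.wt(cbind(w, x), wt = weight, cor = TRUE)$cor[1, 2])
  }, numeric(1))
}

set.seed(7)
n <- 50
w <- rnorm(n)
covariate <- data.frame(a = w + rnorm(n), b = rnorm(n))
weight <- runif(n)

abs_corr <- weighted_abs_corr(w, covariate, weight)
mean_abs_corr <- mean(abs_corr)
```

With equal weights the result reduces to the ordinary absolute Pearson correlation, which gives a quick sanity check for any weighting scheme.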


Calculate weights for estimation of a point on CERF

Description

Calculates the weights of observed outcomes, which are then used to estimate the posterior mean of CERF at a given exposure level.

Usage

compute_weight_gp(
  w,
  w_obs,
  scaled_obs,
  hyperparam,
  inv_sigma_obs,
  gps_m,
  est_sd = FALSE,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

w

A scalar of exposure level of interest.

w_obs

A vector of observed exposure levels of all samples.

scaled_obs

A matrix of two columns.

  • First column is the scaled GPS value of all samples (GPS * 1 / sqrt(alpha))

  • Second column is the scaled exposure value of all samples (w * 1/sqrt(beta))

hyperparam

A vector of hyper-parameters for the GP.

  • First element: alpha

  • Second element: beta

  • Third element: gamma/sigma

inv_sigma_obs

Inverse of the covariance matrix between observed samples.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

est_sd

Logical; whether the posterior standard deviation should be computed (default = FALSE).

kernel_fn

The covariance function of GP.

Value

A list of two elements, weight and standard deviation.
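
In standard GP regression, the posterior mean at a new point is a weighted sum of the observed outcomes, with weights given by the cross-covariance vector multiplied by the inverse covariance matrix of the observations. The base R sketch below shows that generic calculation on simulated one-dimensional data. It is not the package's implementation, which also involves the GPS; the data, the kernel input, and the unit nugget are assumptions.

```r
set.seed(1)
n <- 30
x <- sort(runif(n, 0, 10))        # hypothetical inputs
y <- sin(x) + rnorm(n, sd = 0.1)  # hypothetical outcomes

kernel_fn <- function(d) exp(-d^2)
g_sigma <- 1

# Covariance between observed samples with a unit nugget (assumed form).
sigma_obs <- g_sigma * kernel_fn(as.matrix(dist(x))) + diag(n)
inv_sigma_obs <- chol2inv(chol(sigma_obs))

# Cross-covariance between a new point and the observations.
x_new <- 5
k_new <- g_sigma * kernel_fn(abs(x_new - x))

# One weight per observed outcome; the posterior mean is their weighted sum.
weight <- drop(k_new %*% inv_sigma_obs)
post_mean <- sum(weight * y)
```

Because the weights depend only on the inputs and hyper-parameters, they can be reused for any outcome vector observed at the same points.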


Estimate the conditional exposure response function using Gaussian process

Description

Estimates the conditional exposure response function (cerf) using Gaussian Process (gp). The function tunes the hyperparameters by selecting, from the provided set, the combination with the lowest covariate balance.

Usage

estimate_cerf_gp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  nthread = 1,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

data

A data.frame of observation data.

w

A vector of exposure levels at which the CERF is computed (see the Note section).

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

params

A list of parameters that is required to run the process. These parameters include:

  • alpha: A scaling factor for the GPS value.

  • beta: A scaling factor for the exposure value.

  • g_sigma: A scaling factor for kernel function (gamma/sigma).

  • tune_app: A tuning approach. Available approaches:

    • all: try all combinations of hyperparameters.

alpha, beta, and g_sigma can each be a vector of candidate values.

outcome_col

An outcome column name in data.

treatment_col

A treatment column name in data.

covariates_col

Covariate column names in data.

nthread

An integer value that represents the number of threads to be used by internal packages.

kernel_fn

A kernel function. The default is a Gaussian kernel.

Value

A cerf_gp object that includes the following values:

Note

Please note that w is a vector representing a grid of exposure levels at which the CERF is to be estimated. This grid can include both observed and hypothetical values of the exposure variable. The purpose of defining this grid is to provide a structured set of points across the exposure spectrum for estimating the CERF. This approach is essential in nonparametric models like Gaussian Processes (GPs), where the CERF is evaluated at specific points to understand the relationship between the exposure and outcome variables across a continuum. It facilitates a comprehensive analysis by allowing practitioners to examine the effect of varying exposure levels, including those not directly observed in the dataset.
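
A minimal base R sketch of building such a grid is given here, assuming a small vector of observed exposure levels. The vector w_obs, the number of grid points, and the extra hypothetical levels are arbitrary illustrative choices.

```r
# Hypothetical observed exposure levels.
w_obs <- c(2.1, 3.4, 5.0, 7.8, 9.3)

# An evenly spaced grid spanning the observed exposure range.
w_grid <- seq(from = min(w_obs), to = max(w_obs), length.out = 9)

# The grid may also include hypothetical levels outside the observed range.
w_grid_ext <- sort(unique(c(w_grid, 10, 11)))
```

Estimates at levels far outside the observed range rely on extrapolation, so a grid that stays close to the observed exposure values is usually easier to interpret.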

Examples


set.seed(129)
data <- generate_synthetic_data(sample_size = 100, gps_spec = 3)


# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)

# exposure values
w_all <- seq(0,10,1)


cerf_gp_obj <- estimate_cerf_gp(data,
                                w_all,
                                gps_m,
                                params = list(alpha = c(0.1),
                                              beta=0.2,
                                              g_sigma = 1,
                                              tune_app = "all"),
                                outcome_col = "Y",
                                treatment_col = "treat",
                                covariates_col = paste0("cf", seq(1,6)),
                                nthread = 1)



Estimate the conditional exposure response function using nearest neighbor Gaussian process

Description

Estimates the conditional exposure response function (cerf) using the nearest neighbor (nn) Gaussian Process (gp). The function tunes the hyperparameters by selecting, from the provided set, the combination with the lowest covariate balance.

Usage

estimate_cerf_nngp(
  data,
  w,
  gps_m,
  params,
  outcome_col,
  treatment_col,
  covariates_col,
  kernel_fn = function(x) exp(-x^2),
  nthread = 1
)

Arguments

data

A data.frame of observation data.

w

A vector of exposure levels at which the CERF is computed (see the Note section).

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

params

A list of parameters that is required to run the process. These parameters include:

  • alpha: A scaling factor for the GPS value.

  • beta: A scaling factor for the exposure value.

  • g_sigma: A scaling factor for kernel function (gamma/sigma).

  • tune_app: A tuning approach. Available approaches:

    • all: try all combinations of hyperparameters.

  • n_neighbor: Number of nearest neighbors on one side.

  • block_size: Number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

alpha, beta, and g_sigma can each be a vector of candidate values.

outcome_col

An outcome column name in data.

treatment_col

A treatment column name in data.

covariates_col

Covariate column names in data.

kernel_fn

A kernel function. The default is a Gaussian kernel.

nthread

An integer value that represents the number of threads to be used by internal packages.

Value

A cerf_nngp object that includes the following values:

Note

Please note that w is a vector representing a grid of exposure levels at which the CERF is to be estimated. This grid can include both observed and hypothetical values of the exposure variable. The purpose of defining this grid is to provide a structured set of points across the exposure spectrum for estimating the CERF. This approach is essential in nonparametric models like Gaussian Processes (GPs), where the CERF is evaluated at specific points to understand the relationship between the exposure and outcome variables across a continuum. It facilitates a comprehensive analysis by allowing practitioners to examine the effect of varying exposure levels, including those not directly observed in the dataset.

Examples



set.seed(19)
data <- generate_synthetic_data(sample_size = 120, gps_spec = 3)
# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
# exposure values
w_all <- seq(0,20,2)
cerf_nngp_obj <- estimate_cerf_nngp(data,
                                    w_all,
                                    gps_m,
                                    params = list(alpha = c(0.1),
                                                  beta = 0.2,
                                                  g_sigma = 1,
                                                  tune_app = "all",
                                                  n_neighbor = 20,
                                                  block_size = 1e4),
                                    outcome_col = "Y",
                                    treatment_col = "treat",
                                    covariates_col = paste0("cf", seq(1,6)),
                                    nthread = 1)



Estimate a model for generalized propensity score

Description

Estimates a model for the generalized propensity score (GPS) using a parametric approach.

Usage

estimate_gps(cov_mt, w_all, sl_lib, dnorm_log)

Arguments

cov_mt

A covariate matrix containing all covariates. Each row is a data sample and each column is a covariate.

w_all

A vector of observed exposure levels.

sl_lib

A vector of SuperLearner's package libraries.

dnorm_log

Logical, if TRUE, probabilities p are given as log(p).

Value

A data.frame that includes:

Examples


data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[,-(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
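
The three columns of the gps data.frame (GPS, e_gps_pred, e_gps_std) and the dnorm_log flag suggest that the GPS is a normal density of the observed exposure around its predicted value. The base R sketch below reproduces that idea with lm() as a stand-in for the SuperLearner model. The simulated data, the use of lm(), and the single residual-based standard deviation are assumptions, not the package's method.

```r
set.seed(11)
n <- 100
cov_mt <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))  # hypothetical covariates
w_all <- 1 + 2 * cov_mt$cf1 - cov_mt$cf2 + rnorm(n)   # hypothetical exposure

# Stand-in exposure model (estimate_gps relies on SuperLearner instead).
dat <- data.frame(w_all = w_all, cov_mt)
fit <- lm(w_all ~ ., data = dat)

e_gps_pred <- unname(fitted(fit))     # predicted exposure per sample
e_gps_std <- sd(w_all - e_gps_pred)   # residual standard deviation

# GPS: normal density of the observed exposure given its prediction.
gps <- dnorm(w_all, mean = e_gps_pred, sd = e_gps_std, log = FALSE)
gps_df <- data.frame(gps = gps, e_gps_pred = e_gps_pred, e_gps_std = e_gps_std)

# GPS for all samples when their exposure level is set to a common value.
gps_w <- dnorm(5, mean = e_gps_pred, sd = e_gps_std)
```

Setting log = TRUE in dnorm() returns the log-density instead, which corresponds to the meaning documented for dnorm_log.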


Estimate the CERF with the nnGP model

Description

Estimates the posterior mean of the conditional exposure response function at specified exposure levels with nnGP.

Usage

estimate_mean_sd_nn(
  hyperparam,
  sigma2,
  w_obs,
  w,
  y_obs,
  gps_m,
  kernel_fn = function(x) exp(-x^2),
  n_neighbor = 50,
  block_size = 2000,
  nthread = 1
)

Arguments

hyperparam

A set of hyperparameters for the nnGP.

sigma2

A scalar representing sigma^2.

w_obs

A vector of observed exposure levels.

w

A vector of exposure levels at which the CERF is estimated.

y_obs

A vector of observed outcome values.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

kernel_fn

The covariance function of the GP.

n_neighbor

The number of nearest neighbors on one side.

block_size

The number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

nthread

An integer value that represents the number of threads to be used by internal packages.

Value

A vector of values returned by compute_posterior_sd_nn.


Estimate the standard deviation of the nugget term in standard Gaussian process

Description

Estimates the standard deviation of the nugget term in standard GP by calculating the standard deviation of the residuals.

Usage

estimate_noise_gp(data, sigma_obs, inv_sigma_obs)

Arguments

data

A vector of outcome data.

sigma_obs

Covariance matrix between observed covariates.

inv_sigma_obs

Inverse of the covariance matrix between observed covariates.

Value

A scalar of estimated standard deviation of the nugget term in standard GP.
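
One way to read "standard deviation of the residuals" is to subtract the GP fitted values from the outcomes and take the sample standard deviation of what remains. The base R sketch below does this under the assumption that sigma_obs equals the kernel matrix plus an identity nugget; the helper estimate_noise_sketch and the simulated data are hypothetical.

```r
# Hypothetical helper; assumes sigma_obs = kernel matrix + identity nugget.
estimate_noise_sketch <- function(y, sigma_obs, inv_sigma_obs) {
  # Remove the nugget to recover the kernel part of the covariance.
  k_obs <- sigma_obs - diag(nrow(sigma_obs))
  # GP fitted values at the observed points.
  fitted_vals <- drop(k_obs %*% inv_sigma_obs %*% y)
  # Standard deviation of the residuals.
  stats::sd(y - fitted_vals)
}

set.seed(3)
n <- 40
x <- sort(runif(n, 0, 10))
y <- sin(x) + rnorm(n, sd = 0.3)

sigma_obs <- exp(-as.matrix(dist(x))^2) + diag(n)
inv_sigma_obs <- chol2inv(chol(sigma_obs))

noise_sd <- estimate_noise_sketch(y, sigma_obs, inv_sigma_obs)
```

The result is a single positive number on the scale of the outcome.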


Estimate the standard deviation (noise) of the nugget term in nnGP

Description

Estimates the standard deviation of the nugget term (noise) in nnGP by calculating the standard deviation of the residuals.

Usage

estimate_noise_nn(
  hyperparam,
  w_obs,
  GPS_obs,
  y_obs,
  n_neighbor,
  nthread,
  kernel_fn = function(x) exp(-x^2)
)

Arguments

hyperparam

A vector of hyper-parameter values.

w_obs

A vector of observed exposure levels.

GPS_obs

A vector of estimated GPS evaluated at the observed exposure levels.

y_obs

A vector of observed outcomes.

n_neighbor

The number of nearest neighbors on one side.

nthread

The number of cores used in the estimation.

kernel_fn

The covariance function of the GP.

Value

A scalar of estimated standard deviation of the nugget term in nnGP.


Find the optimal hyper-parameter for the nearest neighbor Gaussian process

Description

Computes covariate balance for each combination of provided hyper-parameters and selects the hyper-parameter values that minimize the covariate balance.

Usage

find_optimal_nn(
  w_obs,
  w,
  y_obs,
  gps_m,
  design_mt,
  hyperparams = expand.grid(seq(0.5, 4.5, 1), seq(0.5, 4.5, 1), seq(0.5, 4.5, 1)),
  kernel_fn = function(x) exp(-x^2),
  n_neighbor = 50,
  block_size = 2000,
  nthread = 1
)

Arguments

w_obs

A vector of the observed exposure levels.

w

A vector of exposure levels at which CERF will be estimated.

y_obs

A vector of observed outcomes.

gps_m

An S3 gps object including:

  • gps: A data.frame of GPS vectors.

    • Column 1: GPS

    • Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred).

    • Column 3: Standard deviation of e_gps (e_gps_std)

  • used_params:

    • dnorm_log: TRUE or FALSE

design_mt

The covariate matrix of all samples (intercept excluded).

hyperparams

A matrix of candidate values of the hyper-parameters, each row contains a set of values of all hyper-parameters.

kernel_fn

The covariance function of the GP.

n_neighbor

The number of nearest neighbors on one side.

block_size

The number of samples included in a computation block. Mainly used to balance the speed and memory requirement. Larger block_size is faster, but requires more memory.

nthread

An integer value that represents the number of threads to be used by internal packages.

Value

Estimated covariate balance scores for the grid of hyper-parameter values considered in hyperparams.
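
The default hyperparams grid in the Usage section is a full factorial over three sequences. The base R sketch below builds that grid and picks the row with the smallest score. The column names and the scoring function toy_balance are hypothetical placeholders for the covariate balance computed by the package.

```r
# Default candidate grid from the Usage section: 5 * 5 * 5 combinations.
hyperparams <- expand.grid(seq(0.5, 4.5, 1), seq(0.5, 4.5, 1), seq(0.5, 4.5, 1))

# Column names assumed to follow the (alpha, beta, g_sigma) ordering.
names(hyperparams) <- c("alpha", "beta", "g_sigma")

# Hypothetical stand-in for a covariate balance score (lower is better).
toy_balance <- function(hp) {
  abs(hp[["alpha"]] - 1.5) + abs(hp[["beta"]] - 2.5) + abs(hp[["g_sigma"]] - 0.5)
}

# Score every row of the grid, then keep the row with the smallest score.
scores <- apply(hyperparams, 1, toy_balance)
best <- hyperparams[which.min(scores), ]
```

The cost of an exhaustive search grows with the product of the candidate vector lengths, so coarse sequences keep tuning affordable.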


Generate synthetic data for the GPCERF package

Description

Generates synthetic data set based on different GPS models and covariates.

Usage

generate_synthetic_data(
  sample_size = 1000,
  outcome_sd = 10,
  gps_spec = 1,
  cova_spec = 1
)

Arguments

sample_size

The number of data samples.

outcome_sd

Standard deviation used to generate the outcome in the synthetic data set.

gps_spec

A numeric value (1-6) that indicates the GPS model used to generate the continuous exposure.

cova_spec

A numeric value (1-2) to modify the covariates.

Value

A data frame of the synthetic data. Outcome is labeled as Y, exposure as treat, and covariates as cf1-6.

Examples


set.seed(351)
data <- generate_synthetic_data(sample_size = 200)


Get logger settings

Description

Returns current logger settings.

Usage

get_logger()

Value

Returns a list that includes logger_file_path and logger_level.

Examples


set_logger("mylogger.log", "INFO")
log_meta <- get_logger()


Log system information

Description

Logs system related information into the log file.

Usage

log_system_info()

Value

No return value. This function is called for side effects.


Extend generic plot functions for cerf_gp class

Description

A wrapper function to extend generic plot functions for cerf_gp class.

Usage

## S3 method for class 'cerf_gp'
plot(x, ...)

Arguments

x

A cerf_gp object.

...

Additional arguments passed to customize the plot.

Value

Returns a ggplot2 object, invisibly. This function is called for side effects.


Extend generic plot functions for cerf_nngp class

Description

A wrapper function to extend generic plot functions for cerf_nngp class.

Usage

## S3 method for class 'cerf_nngp'
plot(x, ...)

Arguments

x

A cerf_nngp object.

...

Additional arguments passed to customize the plot.

Value

Returns a ggplot2 object, invisibly. This function is called for side effects.


Extend print function for cerf_gp object

Description

Extend print function for cerf_gp object

Usage

## S3 method for class 'cerf_gp'
print(x, ...)

Arguments

x

A cerf_gp object.

...

Additional arguments passed to customize the results.

Value

No return value. This function is called for side effects.


Extend print function for cerf_nngp object

Description

Extend print function for cerf_nngp object

Usage

## S3 method for class 'cerf_nngp'
print(x, ...)

Arguments

x

A cerf_nngp object.

...

Additional arguments passed to customize the results.

Value

No return value. This function is called for side effects.


Set logger settings

Description

Updates logger settings, including log level and location of the file.

Usage

set_logger(logger_file_path = "GPCERF.log", logger_level = "INFO")

Arguments

logger_file_path

A path (including file name) to log the messages. (Default: GPCERF.log)

logger_level

The log level. Available levels include:

  • TRACE

  • DEBUG

  • INFO (Default)

  • SUCCESS

  • WARN

  • ERROR

  • FATAL

Value

No return value. This function is called for side effects.

Examples


set_logger("mylogger.log", "INFO")


Print summary of cerf_gp object

Description

Prints a summary of a cerf_gp object.

Usage

## S3 method for class 'cerf_gp'
summary(object, ...)

Arguments

object

A cerf_gp object.

...

Additional arguments passed to customize the results.

Value

Returns summary of data.


Print summary of cerf_nngp object

Description

Prints a summary of a cerf_nngp object.

Usage

## S3 method for class 'cerf_nngp'
summary(object, ...)

Arguments

object

A cerf_nngp object.

...

Additional arguments passed to customize the results.

Value

Returns summary of data.