Title: | Gaussian Processes for Estimating Causal Exposure Response Curves |
Version: | 0.2.4 |
Maintainer: | Boyu Ren <bren@mgb.org> |
Description: | Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F. (2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>. |
License: | GPL (≥ 3) |
Language: | en-US |
URL: | https://github.com/NSAPH-Software/GPCERF |
BugReports: | https://github.com/NSAPH-Software/GPCERF/issues |
Copyright: | Harvard University |
Imports: | parallel, xgboost, stats, MASS, spatstat.geom, logger, Rcpp, RcppArmadillo, ggplot2, cowplot, rlang, Rfast, SuperLearner, wCorr |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 3.5.0) |
Suggests: | rmarkdown, knitr, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
LinkingTo: | RcppArmadillo, Rcpp |
NeedsCompilation: | yes |
Packaged: | 2024-04-15 13:52:18 UTC; Boyu |
Author: | Naeem Khoshnevis |
Repository: | CRAN |
Date/Publication: | 2024-04-15 14:30:02 UTC |
The 'GPCERF' package.
Description
Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data.
Author(s)
Naeem Khoshnevis
Boyu Ren
Danielle Braun
References
Ren, B., Wu, X., Braun, D., Pillai, N. and Dominici, F., 2021. Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes. arXiv preprint arXiv:2105.03454.
A helper function for cerf_gp object
Description
A helper function to plot cerf_gp object using ggplot2 package.
Usage
## S3 method for class 'cerf_gp'
autoplot(object, ...)
Arguments
object |
A cerf_gp object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot object.
A helper function for cerf_nngp object
Description
A helper function to plot cerf_nngp object using ggplot2 package.
Usage
## S3 method for class 'cerf_nngp'
autoplot(object, ...)
Arguments
object |
A cerf_nngp object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot object.
Calculate derivatives of CERF for nnGP
Description
Calculates the posterior mean of the derivative of CERF at a given exposure level with nnGP.
Usage
compute_deriv_nn(
w,
w_obs,
gps_m,
y_obs,
hyperparam,
n_neighbor,
block_size,
kernel_fn = function(x) exp(-x),
kernel_deriv_fn = function(x) -exp(-x)
)
Arguments
w |
A scalar of exposure level of interest. |
w_obs |
A vector of observed exposure levels of all samples. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
y_obs |
A vector of observed outcome values. |
hyperparam |
A vector of hyper-parameters in the GP model. |
n_neighbor |
The number of nearest neighbors on one side. |
block_size |
The number of samples included in a computation block.
Mainly used to balance the speed and memory requirement. Larger
|
kernel_fn |
The covariance function. The input is the square of Euclidean distance. |
kernel_deriv_fn |
The partial derivative of the covariance function. The input is the square of Euclidean distance. |
Value
A scalar of estimated derivative of CERF at w in nnGP.
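The default kernel_fn and kernel_deriv_fn listed above form a pair: both take the squared Euclidean distance as input, and the second is the derivative of the first. The following is a minimal base-R sketch, not the package's internal code, that checks this pairing numerically with a central finite difference. The names num_deriv, d2, and max_abs_err are hypothetical and introduced only for illustration.

```r
# Default covariance function and its derivative, as listed in the usage above.
# Both take the squared Euclidean distance as input.
kernel_fn <- function(x) exp(-x)
kernel_deriv_fn <- function(x) -exp(-x)

# Hypothetical helper: central finite-difference derivative of f at x.
num_deriv <- function(f, x, h = 1e-6) (f(x + h) - f(x - h)) / (2 * h)

d2 <- c(0.1, 0.5, 1, 2)  # illustrative squared distances
max_abs_err <- max(abs(num_deriv(kernel_fn, d2) - kernel_deriv_fn(d2)))
print(max_abs_err)
```

When a custom kernel_fn is supplied, a check of this kind may help catch a sign or scaling mistake in the matching kernel_deriv_fn.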
Calculate derivatives of CERF
Description
Calculates the weights assigned to each observed outcome when deriving the posterior mean of the first derivative of CERF at a given exposure level.
Usage
compute_deriv_weights_gp(
w,
w_obs,
gps_m,
hyperparam,
kernel_fn = function(x) exp(-x),
kernel_deriv_fn = function(x) -exp(-x)
)
Arguments
w |
A scalar of exposure level of interest. |
w_obs |
A vector of observed exposure levels of all samples. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
hyperparam |
A vector of hyper-parameters in the GP model. |
kernel_fn |
The covariance function. |
kernel_deriv_fn |
The partial derivative of the covariance function. |
Value
A vector of weights for all samples, based on which the posterior mean of the derivative of CERF at the exposure level of interest is calculated.
Compute matrix inverse for a covariate matrix
Description
Computes the inverse of a covariate matrix using Cholesky decomposition.
Usage
compute_inverse(mtrx)
Arguments
mtrx |
An |
Value
A matrix that represents the inverse of the input matrix.
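As an illustration of inversion through a Cholesky factor, the sketch below uses the base-R functions chol() and chol2inv() on a small symmetric positive-definite matrix. It is an assumption-laden stand-in: the package's own implementation may differ, and chol_inverse is a hypothetical name.

```r
# Build a small symmetric positive-definite matrix for illustration.
set.seed(1)
a <- matrix(rnorm(16), nrow = 4)
mtrx <- crossprod(a) + diag(4)

# chol() returns the upper-triangular factor R with t(R) %*% R == mtrx;
# chol2inv() computes the inverse of mtrx from that factor.
# chol_inverse is a hypothetical helper, not a GPCERF function.
chol_inverse <- function(m) chol2inv(chol(m))

inv_mtrx <- chol_inverse(mtrx)
max_abs_err <- max(abs(inv_mtrx %*% mtrx - diag(4)))
print(max_abs_err)
```

This route requires a symmetric positive-definite input; chol() stops with an error otherwise.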
Compute mean, credible interval, and covariate balance in standard Gaussian process (GP)
Description
Calculates the induced covariate balance associated with one hyper-parameter configuration in standard GP.
Usage
compute_m_sigma(
hyperparam,
outcome_data,
treatment_data,
covariates_data,
w,
gps_m,
tuning,
kernel_fn = function(x) exp(-x^2)
)
Arguments
hyperparam |
A vector of values of hyper-parameters.
|
outcome_data |
A vector of outcome data. |
treatment_data |
A vector of treatment data. |
covariates_data |
A data frame of covariates data. |
w |
A vector of exposure levels at which the CERF is estimated. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
tuning |
The function is used for parameter tuning (default = TRUE) or estimation (FALSE) |
kernel_fn |
The covariance function of GP. |
Value
A list containing two elements:
A vector of absolute weighted correlation of each covariate to the exposure, which is the metric for covariate balance.
An estimated CERF at w_all based on the hyper-parameter values in param.
Calculate posterior means for nnGP model
Description
Calculates the posterior mean of a point on the CERF based on the nnGP model. This function also returns the weights assigned to all nearest neighbors when calculating the posterior mean.
Usage
compute_posterior_m_nn(
hyperparam,
w,
gps_w,
obs_ord,
y_obs_ord,
kernel_fn = function(x) exp(-x^2),
n_neighbor = 10,
block_size = 10000
)
Arguments
hyperparam |
A set of hyperparameters in the GP model. |
w |
A scalar representing the exposure level for the point of interest on the CERF. |
gps_w |
The GPS for all samples when their exposure levels are set
at |
obs_ord |
A matrix of two columns. First column is the observed exposure levels of all samples; second is the GPS at the observed exposure levels. The rows are in ascending order for the first column. |
y_obs_ord |
A vector of observed outcome values. The vector is ordered
as |
kernel_fn |
The covariance function of the GP. |
n_neighbor |
The number of nearest neighbors on one side. |
block_size |
Number of samples included in a computation block.
Mainly used to balance the speed and memory requirement.
Larger |
Value
TODO: The first column is the selected index and the second column is weight.
A two-column matrix. The first column is the weights assigned to each
nearest neighbor. The second column is the corresponding observed outcome
value. The weight in the last row of this matrix is NA and the observed
outcome value is the estimated posterior mean of the CERF at point w,
which is the weighted sum of all observed outcome values of the neighbors.
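The n_neighbor argument counts nearest neighbors on one side of w. One possible reading of that rule, sketched in base R under the assumption that observations are sorted by exposure, is to take up to n_neighbor observations at or below w and up to n_neighbor above it. The helper select_neighbors is hypothetical; the package's handling of boundaries and of block_size is not reproduced here.

```r
# Hypothetical helper: indices of up to n_neighbor observations on each side
# of w in a vector of exposures sorted in ascending order.
select_neighbors <- function(w, w_obs_sorted, n_neighbor) {
  n <- length(w_obs_sorted)
  pos <- findInterval(w, w_obs_sorted)  # number of observations <= w
  left <- utils::tail(seq_len(pos), n_neighbor)
  right <- if (pos < n) utils::head(seq(pos + 1, n), n_neighbor) else integer(0)
  c(left, right)
}

w_obs_sorted <- sort(c(3.1, 0.4, 7.9, 5.5, 2.2, 9.0, 6.3, 1.8))
idx <- select_neighbors(5.0, w_obs_sorted, n_neighbor = 2)
print(w_obs_sorted[idx])
```

Near either end of the exposure range fewer than 2 * n_neighbor indices are returned, because one side runs out of observations.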
Calculate posterior standard deviations for nnGP model
Description
Calculates the posterior standard deviation of a point on the CERF based on the nnGP model.
Usage
compute_posterior_sd_nn(
hyperparam,
w,
gps_w,
obs_ord,
sigma2,
kernel_fn = function(x) exp(-x^2),
n_neighbor = 10,
block_size = 10000
)
Arguments
hyperparam |
The values of hyperparameters in the GP model. |
w |
The exposure level for the point of interest on the CERF. |
gps_w |
The GPS for all samples when their exposure levels are set
at |
obs_ord |
A matrix of two columns. The first column is the observed exposure levels of all samples; the second is the GPS at the observed exposure levels. The rows are in ascending order for the first column. |
sigma2 |
A scalar representing |
kernel_fn |
The covariance function of the GP. |
n_neighbor |
Number of nearest neighbors on one side. |
block_size |
Number of samples included in a computation block.
Mainly used to balance the speed and memory requirement.
Larger |
Value
The posterior standard deviation of the estimated CERF at w.
Detect change-point in standard GP
Description
Calculates the posterior mean of the difference between left- and right-derivatives at an exposure level for the detection of change points.
Usage
compute_rl_deriv_gp(
w,
w_obs,
y_obs,
gps_m,
hyperparam,
kernel_fn = function(x) exp(-x),
kernel_deriv_fn = function(x) -exp(-x)
)
Arguments
w |
A scalar of exposure level of interest. |
w_obs |
A vector of observed exposure levels of all samples. |
y_obs |
A vector of observed outcome values of all samples. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
hyperparam |
A vector of hyper-parameters in the GP model. |
kernel_fn |
The covariance function. |
kernel_deriv_fn |
The partial derivative of the covariance function. |
Value
A numeric value of the posterior mean of the difference between two one-sided derivatives.
Examples
set.seed(847)
data <- generate_synthetic_data(sample_size = 100)
gps_m <- estimate_gps(cov_mt = data[, -(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
wi <- 8.6
val <- compute_rl_deriv_gp(w = wi,
                           w_obs = data$treat,
                           y_obs = data$Y,
                           gps_m = gps_m,
                           hyperparam = c(1, 1, 2))
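The idea of comparing one-sided derivatives can be illustrated with a simplified one-dimensional GP in base R. This sketch is not the package's algorithm: it ignores the GPS dimension and the hyperparam vector, assumes a zero-mean GP with a fixed nugget, and fits the left and right sides separately. The functions gp_mean, gp_mean_deriv, and rl_deriv are hypothetical names. The derivative uses the chain rule on the squared distance, which is why kernel_deriv_fn is multiplied by 2 * (w - x).

```r
kernel_fn <- function(x) exp(-x)         # input: squared distance
kernel_deriv_fn <- function(x) -exp(-x)  # derivative of kernel_fn

# Posterior mean of a zero-mean GP at w (assumed nugget variance 0.01).
gp_mean <- function(w, x, y, nugget = 0.01) {
  k_obs <- kernel_fn(outer(x, x, "-")^2) + nugget * diag(length(x))
  k_star <- kernel_fn((w - x)^2)
  sum(k_star * solve(k_obs, y))
}

# Derivative of the posterior mean with respect to w, via the chain rule:
# d/dw k((w - x)^2) = kernel_deriv_fn((w - x)^2) * 2 * (w - x).
gp_mean_deriv <- function(w, x, y, nugget = 0.01) {
  k_obs <- kernel_fn(outer(x, x, "-")^2) + nugget * diag(length(x))
  dk_star <- kernel_deriv_fn((w - x)^2) * 2 * (w - x)
  sum(dk_star * solve(k_obs, y))
}

# Right-minus-left derivative: fit separately on each side of w.
rl_deriv <- function(w, x, y) {
  right <- x >= w
  gp_mean_deriv(w, x[right], y[right]) - gp_mean_deriv(w, x[!right], y[!right])
}

x <- seq(0, 10, by = 0.5)
y <- abs(x - 5)  # toy outcome with a kink at 5
rl_at_kink <- rl_deriv(5, x, y)
print(rl_at_kink)
```

A value far from zero suggests that the slope changes at w, which is the signal used when scanning a grid of exposure levels for change points.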
Calculate right minus left derivatives for change-point detection in nnGP
Description
Calculates the posterior mean of the difference between left- and right-derivatives at an exposure level for the detection of change points. nnGP approximation is used.
Usage
compute_rl_deriv_nn(
w,
w_obs,
gps_m,
y_obs,
hyperparam,
n_neighbor,
block_size,
kernel_fn = function(x) exp(-x),
kernel_deriv_fn = function(x) -exp(-x)
)
Arguments
w |
A scalar of exposure level of interest. |
w_obs |
A vector of observed exposure levels of all samples. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
y_obs |
A vector of observed outcome values. |
hyperparam |
A vector of hyper-parameters in the GP model. |
n_neighbor |
The number of nearest neighbors on one side. |
block_size |
The number of samples included in a computation block.
Mainly used to balance the speed and memory requirement. Larger
|
kernel_fn |
The covariance function. The input is the square of Euclidean distance. |
kernel_deriv_fn |
The partial derivative of the covariance function. The input is the square of Euclidean distance. |
Value
A numeric value of the posterior mean of the difference between two one-sided derivatives.
Examples
set.seed(325)
data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[, -(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
wi <- 12.2
deriv_val <- compute_rl_deriv_nn(w = wi,
                                 w_obs = data$treat,
                                 gps_m = gps_m,
                                 y_obs = data$Y,
                                 hyperparam = c(0.2, 0.4, 1.2),
                                 n_neighbor = 20,
                                 block_size = 10)
Compute posterior credible interval
Description
Computes posterior credible interval for requested exposure level.
Usage
compute_sd_gp(
w,
scaled_obs,
hyperparam,
sigma,
gps_m,
kernel_fn = function(x) exp(-x^2)
)
Arguments
w |
A scalar of exposure level of interest. |
scaled_obs |
A matrix of two columns.
|
hyperparam |
A vector of hyper-parameters for the GP.
|
sigma |
A scalar that represents noise. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
kernel_fn |
The covariance function of GP. |
Value
Posterior credible interval (scalar) for the requested exposure level (w).
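For orientation, the standard GP identity for posterior uncertainty can be sketched in base R. The example is one-dimensional and assumes that kernel_fn receives the (unsquared) distance and that sigma is the noise standard deviation. Neither assumption is confirmed by this manual, and gp_posterior_sd is a hypothetical name rather than a GPCERF function.

```r
kernel_fn <- function(x) exp(-x^2)  # default kernel from the usage above

# Posterior standard deviation of the latent GP at w (standard GP identity):
# var = k(w, w) - k_star' (K + sigma^2 I)^{-1} k_star
gp_posterior_sd <- function(w, x_obs, sigma, kernel_fn) {
  d_obs <- abs(outer(x_obs, x_obs, "-"))
  k_obs <- kernel_fn(d_obs) + sigma^2 * diag(length(x_obs))
  k_star <- kernel_fn(abs(w - x_obs))
  post_var <- kernel_fn(0) - sum(k_star * solve(k_obs, k_star))
  sqrt(max(post_var, 0))  # guard against tiny negative values from rounding
}

x_obs <- seq(0, 4, by = 1)
sd_near <- gp_posterior_sd(2, x_obs, sigma = 0.1, kernel_fn = kernel_fn)
sd_far <- gp_posterior_sd(100, x_obs, sigma = 0.1, kernel_fn = kernel_fn)
print(c(near = sd_near, far = sd_far))
```

The uncertainty shrinks near observed exposure levels and returns to the prior standard deviation far from them. Under a normal approximation, an interval can be formed as the posterior mean plus or minus a multiple of this value.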
Compute weighted covariate balance
Description
Computes weighted covariate balance for given data sets.
Usage
compute_w_corr(w, covariate, weight)
Arguments
w |
A vector of observed continuous exposure variable. |
covariate |
A data frame of observed covariates variable. |
weight |
A vector of weights. |
Value
The function returns a list that stores the measures related to covariate balance:
absolute_corr: the absolute correlations for each pre-exposure covariate;
mean_absolute_corr: the average absolute correlation across all pre-exposure covariates.
Examples
set.seed(639)
n <- 100
mydata <- generate_synthetic_data(sample_size = 100)
year <- sample(x = c("2001", "2002", "2003", "2004", "2005"), size = n,
               replace = TRUE)
region <- sample(x = c("North", "South", "East", "West"), size = n,
                 replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
cor_val <- compute_w_corr(mydata[, 2],
                          mydata[, 3:length(mydata)],
                          runif(n))
print(cor_val$mean_absolute_corr)
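To make the balance metric concrete, the sketch below computes a weighted absolute Pearson correlation between the exposure and each numeric covariate in base R. It is a simplified stand-in: it covers numeric covariates only, whereas the example above also passes factors, and weighted_abs_corr is a hypothetical helper rather than the routine the package calls.

```r
# Hypothetical helper: absolute weighted Pearson correlation of w and x.
weighted_abs_corr <- function(w, x, weight) {
  weight <- weight / sum(weight)  # normalize weights to sum to 1
  w_c <- w - sum(weight * w)      # center by the weighted means
  x_c <- x - sum(weight * x)
  abs(sum(weight * w_c * x_c) /
        sqrt(sum(weight * w_c^2) * sum(weight * x_c^2)))
}

set.seed(42)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
w <- 0.8 * x1 + rnorm(n)  # exposure depends on x1 only
weight <- runif(n)

absolute_corr <- c(
  x1 = weighted_abs_corr(w, x1, weight),
  x2 = weighted_abs_corr(w, x2, weight)
)
mean_absolute_corr <- mean(absolute_corr)
print(absolute_corr)
print(mean_absolute_corr)
```

Smaller values indicate better balance: the weighted sample shows less association between the exposure and the covariate.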
Calculate weights for estimation of a point on CERF
Description
Calculates the weights of observed outcomes, which are then used to estimate the posterior mean of CERF at a given exposure level.
Usage
compute_weight_gp(
w,
w_obs,
scaled_obs,
hyperparam,
inv_sigma_obs,
gps_m,
est_sd = FALSE,
kernel_fn = function(x) exp(-x^2)
)
Arguments
w |
A scalar of exposure level of interest. |
w_obs |
A vector of observed exposure levels of all samples. |
scaled_obs |
A matrix of two columns.
|
hyperparam |
A vector of hyper-parameters for the GP.
|
inv_sigma_obs |
Inverse of the covariance matrix between observed samples. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
est_sd |
Logical; whether the posterior standard deviation should be computed (default = FALSE). |
kernel_fn |
The covariance function of GP. |
Value
A list of two elements, weight and standard deviation.
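The role of the weights can be seen in a one-dimensional GP regression sketch in base R: the posterior mean at w is a weighted sum of the observed outcomes, with weights given by the cross-covariance vector times the inverse covariance of the observations. This is a simplified illustration under stated assumptions (a single input dimension, kernel_fn applied to the unsquared distance, a fixed nugget), and gp_weights is a hypothetical name.

```r
kernel_fn <- function(x) exp(-x^2)  # default kernel from the usage above

# Hypothetical helper: weights of the observed outcomes for the posterior
# mean at w, computed as k_star' (K + nugget * I)^{-1}.
gp_weights <- function(w, x_obs, nugget, kernel_fn) {
  k_obs <- kernel_fn(abs(outer(x_obs, x_obs, "-"))) +
    nugget * diag(length(x_obs))
  inv_sigma_obs <- solve(k_obs)  # inverse covariance of the observations
  k_star <- kernel_fn(abs(w - x_obs))
  as.vector(k_star %*% inv_sigma_obs)
}

x_obs <- c(0, 1, 2, 3, 4)
y_obs <- sin(x_obs)
weights <- gp_weights(2.5, x_obs, nugget = 0.1, kernel_fn = kernel_fn)
post_mean <- sum(weights * y_obs)
print(weights)
print(post_mean)
```

Because the weights do not depend on the outcomes, the same vector can be reused for other quantities, such as a weighted covariate balance measure.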
Estimate the conditional exposure response function using Gaussian process
Description
Estimates the conditional exposure response function (cerf) using Gaussian Process (gp). The function tunes the best match (the lowest covariate balance) for the provided set of hyperparameters.
Usage
estimate_cerf_gp(
data,
w,
gps_m,
params,
outcome_col,
treatment_col,
covariates_col,
nthread = 1,
kernel_fn = function(x) exp(-x^2)
)
Arguments
data |
A data.frame of observation data. |
w |
A vector of exposure levels at which to compute the CERF (please also see the Note). |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
params |
A list of parameters that is required to run the process. These parameters include:
|
outcome_col |
An outcome column name in |
treatment_col |
A treatment column name in |
covariates_col |
Covariates columns name in |
nthread |
An integer value that represents the number of threads to be used by internal packages. |
kernel_fn |
A kernel function. A default value is a Gaussian Kernel. |
Value
A cerf_gp object that includes the following values:
w, the vector of exposure levels.
pst_mean, Computed mean for the w vector.
pst_sd, Computed credible interval for the w vector.
Note
Please note that w is a vector representing a grid of exposure levels at
which the CERF is to be estimated. This grid can include both observed and
hypothetical values of the exposure variable. The purpose of defining this
grid is to provide a structured set of points across the exposure spectrum
for estimating the CERF. This approach is essential in nonparametric models
like Gaussian Processes (GPs), where the CERF is evaluated at specific points
to understand the relationship between the exposure and outcome variables
across a continuum. It facilitates a comprehensive analysis by allowing
practitioners to examine the effect of varying exposure levels, including
those not directly observed in the dataset.
Examples
set.seed(129)
data <- generate_synthetic_data(sample_size = 100, gps_spec = 3)
# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[, -(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
# exposure values
w_all <- seq(0, 10, 1)
cerf_gp_obj <- estimate_cerf_gp(data,
                                w_all,
                                gps_m,
                                params = list(alpha = c(0.1),
                                              beta = 0.2,
                                              g_sigma = 1,
                                              tune_app = "all"),
                                outcome_col = "Y",
                                treatment_col = "treat",
                                covariates_col = paste0("cf", seq(1, 6)),
                                nthread = 1)
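Since the grid w may contain hypothetical exposure levels, it can help to flag which grid points fall outside the observed exposure range, where the estimate is an extrapolation. The base-R sketch below uses a simulated vector treat as a stand-in for data$treat; the variable names are illustrative only.

```r
set.seed(7)
treat <- rgamma(100, shape = 4, rate = 0.5)  # stand-in for data$treat

# Grid of exposure levels at which the CERF would be estimated
w_grid <- seq(0, 20, by = 2)

# Flag grid points outside the observed exposure range
outside_observed <- w_grid < min(treat) | w_grid > max(treat)
print(data.frame(w = w_grid, outside_observed = outside_observed))
```

Grid points flagged TRUE rely on the model alone rather than nearby observations, so their estimates are best read with extra caution.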
Estimate the conditional exposure response function using nearest neighbor Gaussian process
Description
Estimates the conditional exposure response function (cerf) using the nearest neighbor (nn) Gaussian Process (gp). The function tunes the best match (the lowest covariate balance) for the provided set of hyperparameters.
Usage
estimate_cerf_nngp(
data,
w,
gps_m,
params,
outcome_col,
treatment_col,
covariates_col,
kernel_fn = function(x) exp(-x^2),
nthread = 1
)
Arguments
data |
A data.frame of observation data. |
w |
A vector of exposure levels at which to compute the CERF (please also see the Note). |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
params |
A list of parameters that is required to run the process. These parameters include:
|
outcome_col |
An outcome column name in |
treatment_col |
A treatment column name in |
covariates_col |
Covariates columns name in |
kernel_fn |
A kernel function. A default value is a Gaussian Kernel. |
nthread |
An integer value that represents the number of threads to be used by internal packages. |
Value
A cerf_nngp object that includes the following values:
w, the vector of exposure levels.
pst_mean, the computed mean for the w vector.
pst_sd, the computed credible interval for the w vector.
Note
Please note that w is a vector representing a grid of exposure levels at
which the CERF is to be estimated. This grid can include both observed and
hypothetical values of the exposure variable. The purpose of defining this
grid is to provide a structured set of points across the exposure spectrum
for estimating the CERF. This approach is essential in nonparametric models
like Gaussian Processes (GPs), where the CERF is evaluated at specific points
to understand the relationship between the exposure and outcome variables
across a continuum. It facilitates a comprehensive analysis by allowing
practitioners to examine the effect of varying exposure levels, including
those not directly observed in the dataset.
Examples
set.seed(19)
data <- generate_synthetic_data(sample_size = 120, gps_spec = 3)
# Estimate GPS function
gps_m <- estimate_gps(cov_mt = data[, -(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
# exposure values
w_all <- seq(0, 20, 2)
cerf_nngp_obj <- estimate_cerf_nngp(data,
                                    w_all,
                                    gps_m,
                                    params = list(alpha = c(0.1),
                                                  beta = 0.2,
                                                  g_sigma = 1,
                                                  tune_app = "all",
                                                  n_neighbor = 20,
                                                  block_size = 1e4),
                                    outcome_col = "Y",
                                    treatment_col = "treat",
                                    covariates_col = paste0("cf", seq(1, 6)),
                                    nthread = 1)
Estimate a model for generalized propensity score
Description
Estimates a model for the generalized propensity score (GPS) using a parametric approach.
Usage
estimate_gps(cov_mt, w_all, sl_lib, dnorm_log)
Arguments
cov_mt |
A covariate matrix containing all covariates. Each row is a data sample and each column is a covariate. |
w_all |
A vector of observed exposure levels. |
sl_lib |
A vector of SuperLearner's package libraries. |
dnorm_log |
Logical, if TRUE, probabilities p are given as log(p). |
Value
A data.frame that includes:
a vector of estimated GPS at the observed exposure levels;
a vector of estimated conditional means of exposure levels when the covariates are fixed at the observed values;
estimated standard deviation of exposure levels;
a vector of observed exposure levels.
Examples
data <- generate_synthetic_data(sample_size = 200)
gps_m <- estimate_gps(cov_mt = data[, -(1:2)],
                      w_all = data$treat,
                      sl_lib = c("SL.xgboost"),
                      dnorm_log = FALSE)
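The structure of the returned values can be illustrated with a base-R sketch in which a linear model stands in for the SuperLearner fit. This is a deliberate substitution, not the package's method. It assumes a normal model for the exposure given the covariates with a single residual standard deviation. The function estimate_gps_sketch and the column names gps and w are hypothetical; e_gps_pred and e_gps_std follow the naming used in this manual.

```r
set.seed(11)
n <- 300
cov_mt <- data.frame(cf1 = rnorm(n), cf2 = rnorm(n))
w_all <- 2 + 1.5 * cov_mt$cf1 - 0.5 * cov_mt$cf2 + rnorm(n)

# Stand-in for the SuperLearner fit: predict exposure from covariates.
dat <- data.frame(w_all = w_all, cov_mt)
fit <- lm(w_all ~ ., data = dat)
e_gps_pred <- as.numeric(fitted(fit))
e_gps_std <- sd(w_all - e_gps_pred)  # assumed constant across samples

# Hypothetical helper: normal density of the observed exposure given the
# predicted mean; dnorm_log = TRUE returns log densities instead.
estimate_gps_sketch <- function(dnorm_log) {
  data.frame(
    gps = dnorm(w_all, mean = e_gps_pred, sd = e_gps_std, log = dnorm_log),
    e_gps_pred = e_gps_pred,
    e_gps_std = e_gps_std,
    w = w_all
  )
}

gps_df <- estimate_gps_sketch(dnorm_log = FALSE)
gps_log_df <- estimate_gps_sketch(dnorm_log = TRUE)
print(head(gps_df, 3))
```

The log scale can be numerically safer when densities are very small, which is the usual motivation for an option such as dnorm_log.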
Estimate the CERF with the nnGP model
Description
Estimates the posterior mean of the conditional exposure response function at specified exposure levels with nnGP.
Usage
estimate_mean_sd_nn(
hyperparam,
sigma2,
w_obs,
w,
y_obs,
gps_m,
kernel_fn = function(x) exp(-x^2),
n_neighbor = 50,
block_size = 2000,
nthread = 1
)
Arguments
hyperparam |
A set of hyperparameters for the nnGP. |
sigma2 |
A scalar representing |
w_obs |
A vector of observed exposure levels. |
w |
A vector of exposure levels at which the CERF is estimated. |
y_obs |
A vector of observed outcome values. |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
kernel_fn |
The covariance function of the GP. |
n_neighbor |
The number of nearest neighbors on one side. |
block_size |
The number of samples included in a computation block.
Mainly used to balance the speed and memory requirement. Larger
|
nthread |
An integer value that represents the number of threads to be used by internal packages. |
Value
A vector of values returned by compute_posterior_sd_nn.
Estimate the standard deviation of the nugget term in standard Gaussian process
Description
Estimates the standard deviations of the nugget term in standard GP by calculating the standard deviations of the residuals.
Usage
estimate_noise_gp(data, sigma_obs, inv_sigma_obs)
Arguments
data |
A vector of outcome data. |
sigma_obs |
Covariance matrix between observed covariates. |
inv_sigma_obs |
Inverse of the covariance matrix between observed covariates. |
Value
A scalar of estimated standard deviation of the nugget term in standard GP.
Estimate the standard deviation (noise) of the nugget term in nnGP
Description
Estimates the standard deviations of the nugget term (noise) in nnGP by calculating the standard deviations of the residuals.
Usage
estimate_noise_nn(
hyperparam,
w_obs,
GPS_obs,
y_obs,
n_neighbor,
nthread,
kernel_fn = function(x) exp(-x^2)
)
Arguments
hyperparam |
A vector of hyper-parameter values. |
w_obs |
A vector of observed exposure levels. |
GPS_obs |
A vector of estimated GPS evaluated at the observed exposure levels. |
y_obs |
A vector of observed outcomes. |
n_neighbor |
The number of nearest neighbors on one side. |
nthread |
The number of cores used in the estimation. |
kernel_fn |
The covariance function of the GP. |
Value
A scalar of estimated standard deviation of the nugget term in nnGP.
Find the optimal hyper-parameter for the nearest neighbor Gaussian process
Description
Computes covariate balance for each combination of provided hyper-parameters and selects the hyper-parameter values that minimize the covariate balance.
Usage
find_optimal_nn(
w_obs,
w,
y_obs,
gps_m,
design_mt,
hyperparams = expand.grid(seq(0.5, 4.5, 1), seq(0.5, 4.5, 1), seq(0.5, 4.5, 1)),
kernel_fn = function(x) exp(-x^2),
n_neighbor = 50,
block_size = 2000,
nthread = 1
)
Arguments
w_obs |
A vector of the observed exposure levels. |
w |
A vector of exposure levels at which CERF will be estimated. |
y_obs |
A vector of observed outcomes |
gps_m |
An S3 gps object including: gps: A data.frame of GPS vectors. - Column 1: GPS - Column 2: Prediction of exposure for covariate of each data sample (e_gps_pred). - Column 3: Standard deviation of e_gps (e_gps_std) used_params: - dnorm_log: TRUE or FALSE |
design_mt |
The covariate matrix of all samples (intercept excluded). |
hyperparams |
A matrix of candidate values of the hyper-parameters, each row contains a set of values of all hyper-parameters. |
kernel_fn |
The covariance function of the GP. |
n_neighbor |
The number of nearest neighbors on one side. |
block_size |
The number of samples included in a computation block.
Mainly used to balance the speed and memory requirement. Larger
|
nthread |
An integer value that represents the number of threads to be used by internal packages. |
Value
Estimated covariate balance scores for the grid of hyper-parameter values
considered in hyperparams.
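The grid search can be sketched in base R: build the candidate grid with expand.grid(), score each row, and keep the row with the smallest score. The scoring function toy_balance below is a hypothetical stand-in for the covariate balance computation, chosen only so that the example runs without the package.

```r
# Default candidate grid from the usage above: 5 x 5 x 5 = 125 rows.
hyperparams <- expand.grid(seq(0.5, 4.5, 1),
                           seq(0.5, 4.5, 1),
                           seq(0.5, 4.5, 1))

# Hypothetical stand-in for the covariate balance score of one configuration.
toy_balance <- function(h) (h[1] - 1.5)^2 + (h[2] - 2.5)^2 + (h[3] - 3.5)^2

# Score every row and select the configuration with the smallest score.
scores <- apply(hyperparams, 1, toy_balance)
best <- unlist(hyperparams[which.min(scores), ])
print(best)
```

The cost grows with the number of rows in hyperparams, so a coarse grid followed by a finer one around the best row is one way to keep the search affordable.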
Generate synthetic data for the GPCERF package
Description
Generates synthetic data set based on different GPS models and covariates.
Usage
generate_synthetic_data(
sample_size = 1000,
outcome_sd = 10,
gps_spec = 1,
cova_spec = 1
)
Arguments
sample_size |
The number of data samples. |
outcome_sd |
Standard deviation used to generate the outcome in the synthetic data set. |
gps_spec |
A numeric value (1-6) that indicates the GPS model used to generate the continuous exposure. |
cova_spec |
A numeric value (1-2) to modify the covariates. |
Value
A data frame of the synthetic data. Outcome is labeled as Y, exposure as treat, and covariates as cf1-6.
Examples
set.seed(351)
data <- generate_synthetic_data(sample_size = 200)
Get logger settings
Description
Returns current logger settings.
Usage
get_logger()
Value
Returns a list that includes logger_file_path and logger_level.
Examples
set_logger("mylogger.log", "INFO")
log_meta <- get_logger()
Log system information
Description
Logs system related information into the log file.
Usage
log_system_info()
Value
No return value. This function is called for side effects.
Extend generic plot functions for cerf_gp class
Description
A wrapper function to extend generic plot functions for cerf_gp class.
Usage
## S3 method for class 'cerf_gp'
plot(x, ...)
Arguments
x |
A cerf_gp object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot2 object, invisibly. This function is called for side effects.
Extend generic plot functions for cerf_nngp class
Description
A wrapper function to extend generic plot functions for cerf_nngp class.
Usage
## S3 method for class 'cerf_nngp'
plot(x, ...)
Arguments
x |
A cerf_nngp object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot2 object, invisibly. This function is called for side effects.
Extend print function for cerf_gp object
Description
Extend print function for cerf_gp object
Usage
## S3 method for class 'cerf_gp'
print(x, ...)
Arguments
x |
A cerf_gp object. |
... |
Additional arguments passed to customize the results. |
Value
No return value. This function is called for side effects.
Extend print function for cerf_nngp object
Description
Extend print function for cerf_nngp object
Usage
## S3 method for class 'cerf_nngp'
print(x, ...)
Arguments
x |
A cerf_nngp object. |
... |
Additional arguments passed to customize the results. |
Value
No return value. This function is called for side effects.
Set logger settings
Description
Updates logger settings, including log level and location of the file.
Usage
set_logger(logger_file_path = "GPCERF.log", logger_level = "INFO")
Arguments
logger_file_path |
A path (including file name) to log the messages. (Default: GPCERF.log) |
logger_level |
The log level. Available levels include:
|
Value
No return value. This function is called for side effects.
Examples
set_logger("mylogger.log", "INFO")
Print summary of cerf_gp object
Description
Print summary of cerf_gp object
Usage
## S3 method for class 'cerf_gp'
summary(object, ...)
Arguments
object |
A cerf_gp object. |
... |
Additional arguments passed to customize the results. |
Value
Returns summary of data.
Print summary of cerf_nngp object
Description
Print summary of cerf_nngp object
Usage
## S3 method for class 'cerf_nngp'
summary(object, ...)
Arguments
object |
A cerf_nngp object. |
... |
Additional arguments passed to customize the results. |
Value
Returns summary of data.