Type: Package
Title: Monte Carlo Standard Errors
Version: 0.1.1
Author: Dennis Boos, Kevin Matthew, Jason Osborne
Maintainer: Dennis Boos <boos@ncsu.edu>
Description: Computes Monte Carlo standard errors for summaries of Monte Carlo output. Summaries and their standard errors are based on columns of Monte Carlo simulation output. Dennis D. Boos and Jason A. Osborne (2015) <doi:10.1111/insr.12087>.
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.2.3
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2023-04-06 00:20:56 UTC; boos
Repository: CRAN
Date/Publication: 2023-04-06 11:00:10 UTC

Monte.Carlo.se: A package for computing standard errors of Monte Carlo estimates

Description

The Monte.Carlo.se package has two main R functions mc.se.vector and mc.se.matrix that are built on two basic functions jack.se and boot.se for computing jackknife and bootstrap standard errors, respectively.

Details

A Monte Carlo Study often results in an N by k matrix X of estimates based on N independent samples of simulation data. X might contain parameter estimates (like the mean or median) or test results (say 0 for "accept the null hypothesis" or 1 for "reject the null"). Summary calculations from X, such as means or variances, are then displayed in a table or plot.

The purpose of the Monte.Carlo.se package is to compute estimates of the standard deviation (called standard errors) of these table entries.

Examples are included in the documentation for mc.se.vector and mc.se.matrix. However, more extended examples may be found in the three vignettes found by clicking "index" at the bottom of this page and then on "User guides, package vignettes and other documentation."

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne


Auxiliary Functions

Description

Auxiliary functions to be used in the Monte.Carlo.se package, mainly with mc.se.matrix. Scroll down to the Examples Section to see the actual code.

Usage

ratio.var(index, xdata)

ratio.sd(index, xdata)

ratio.mse(index, xdata, true)

ratio.mean.vhat.var(index, xdata)

ratio.mean.sdhat.sd(index, xdata)

corr(index, xdata)

cv(x)

varn(x, n)

jack.var(x, theta, ...)

boot.var(x, B, theta, ...)

Arguments

index

index = usually of the form 1:N

xdata

actual data

true

true parameter value when computing mean squared error

x

Input vector in calls to jack.var and boot.var

n

sample size

theta

theta = function in calls to jack.var and boot.var

...

Additional arguments to be passed

B

Bootstrap reps in calls to boot.var

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

Examples

# These are extra functions included in the MCse package
# The following functions are to be used with mc.se.matrix

ratio.var <- function(index,xdata)         # ratio of variances
 {var(xdata[index,1])/var(xdata[index,2])}

# The above function is for the ratio of the sample variance of column 1 to
# the sample variance  of column 2 of xdata.
# Note that the actual data goes into xdata, the second argument of ratio.var.
# Example call for 10,000 means and medians:
# ratio.var(1:10000,xdata=cbind(out.m.15,out.med.15))

ratio.sd<-function(index,xdata){           # ratio of standard deviations
 sd(xdata[index,1])/sd(xdata[index,2])}

ratio.mse<-function(index,xdata,true){     # ratio of mean squared errors
 mean((xdata[index,1]-true)^2)/mean((xdata[index,2]-true)^2)}

ratio.mean.vhat.var<-function(index,xdata){# estimates in col 1, vhats in col. 2
 mean(xdata[index,2])/var(xdata[index,1])}

ratio.mean.sdhat.sd<-function(index,xdata){# estimates in col 1, SEs in col. 2
 mean(xdata[index,2])/sd(xdata[index,1])}

corr<-function(index,xdata){               # simple correlation
 cor(xdata[index,1],xdata[index,2])}

# These next two functions correspond to jack.se and boot.se.
# x is a data vector, and theta is a function applied to x.
# Each returns a variance estimate for theta(x).

jack.var <- function(x, theta, ...){  # jackknife estimate of variance
 n <- length(x)
 u <- rep(0, n)
 for(i in 1:n){u[i] <- theta(x[ - i], ...)}
 jack.var <-((n-1)/n)* sum((u-mean(u))^2)
 return(jack.var)}

boot.var <- function(x,B,theta, ...){ # bootstrap estimate of variance
 n <- length(x)
 bootsam <- matrix(sample(x,size = n*B,replace=T), nrow=B)
 thetastar <- apply(bootsam,1,theta,...)
 boot.var <- var(thetastar)
 return(boot.var)}


Bootstrap Standard Error

Description

boot.se – gives a bootstrap standard error (SE=estimated standard deviation)

Usage

boot.se(x, B, theta, ...)

Arguments

x

vector of data

B

number of bootstrap resamples (replications)

theta

function (statistic) applied to the data (e.g., mean, median, var)

...

Additional arguments to be passed

Details

The code was modified from code associated with the appendix of Efron and Tibshirani (1993)

Value

Returns the bootstrap SE of theta(x)

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

References

Efron and Tibshirani (1993), _An Introduction to the Bootstrap_.

Boos, D. D., and Osborne, J. A. (2015), "Assessing Variability of Complex Descriptive Statistics in Monte Carlo Studies using Resampling Methods," International Statistical Review, 25, 775-792.

See Also

jack.semc.se.vectormc.se.matrix

Examples


# simple example, data from Boos and Osborne (2105, Table 3)
# using theta=coefficient of variation mean/sd

x=c(1,2,79,5,17,11,2,15,85)
cv=function(x){sd(x)/mean(x)}

cv(x)
# [1] 1.383577

# bootstrap SE using B=1000

set.seed(384)
boot.se(x,B=1000,theta=cv)
# [1] 0.3416897

# More complex example using two samples, se for ratio of means
# data from Higgins (2003, problem 4.4, p. 142), LDH readings on 7 patients

before=c(89,90,87,98,120,85,97)
after=c(76,101,84,86,105,84,93)

# requires function using row index as "data", real data is extra parameter xdata
ratio.means <- function(index,xdata)
 {mean(xdata[index,1])/mean(xdata[index,2])}

ratio.means(index=1:7,xdata=data.frame(before,after))
# [1] 1.058824

# boostrap SE for ratio of means

set.seed(2917)
boot.se(x=1:7,B=1000,theta=ratio.means,xdata=data.frame(before,after))
# [1] 0.03576659

# To illustrate use with Monte Carlo output, first create some sample data
# 10,000 samples of size 15 from the Laplace (double exp) distribution

N<-10000
set.seed(450)
z1 <- matrix(rexp(N*15),nrow=N)
z2 <- matrix(rexp(N*15),nrow=N)
z<-(z1-z2)/sqrt(2)              # subtract standard exponentials
out.m.15   <- apply(z,1,mean)
out.t20.15 <- apply(z,1,mean,trim=0.20)
out.med.15 <- apply(z,1,median)


# The three datasets (out.m.15,out.t20.15,out.med.15) each contain 10000 values.
# If we want use the variance of each column in a table, then to get
# the Monte Carlo standard error of those 3 variances,

set.seed(250)
boot.se(out.m.15,B=1000,theta = var)
# [1] 0.0009373835

boot.se(out.t20.15,B=1000,theta = var)
# [1] 0.0007086057

boot.se(out.med.15,B=1000,theta = var)
# [1] 0.0008307258
 # ends donttest
# Function Code

boot.se<-function(x, B, theta, ...){
  call <- match.call()
  n <- length(x)
  bootsam <- matrix(sample(x, size = n * B, replace = T), nrow = B)
  thetastar <- apply(bootsam, 1, theta, ...)
  se <- sd(thetastar)
  return(se)
  }


Jackknife Standard Error

Description

jack.se — gives jackknife standard error (SE=estimated standard deviation)

Usage

jack.se(x, theta, ...)

Arguments

x

vector of data

theta

— function (statistic) applied to the data (e.g., mean, median, var)

...

Additional arguments to be passed

Details

The code was modified from code associated with the appendix of Efron and Tibshirani (1993)

Value

Returns the jackknife standard error of theta(x)

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

References

Efron and Tibshirani (1993), *An Introduction to the Bootstrap*.

Boos, D. D., and Osborne, J. A. (2015), "Assessing Variability of Complex Descriptive Statistics in Monte Carlo Studies using Resampling Methods," International Statistical Review, 25, 775-792.

See Also

boot.semc.se.vectormc.se.matrix

Examples


# simple example, data from Boos and Osborne (2015, Table 3)
# using theta = coefficient of variation = mean/sd

x=c(1,2,79,5,17,11,2,15,85)
cv=function(x){sd(x)/mean(x)}
cv(x)
# [1] 1.383577

jack.se(x,theta=cv)
# [1] 0.3435321

# More complex example using two samples, se for ratio of means
# data from Higgins (2003, problem 4.4, p. 142), LDH readings on 7 patients

before=c(89,90,87,98,120,85,97)
after=c(76,101,84,86,105,84,93)

# requires function using row index as "data,"
# real data is extra parameter xdata

ratio.means <- function(index,xdata){
 mean(xdata[index,1])/mean(xdata[index,2])}

# ratio of means for before-after data

ratio.means(index=1:7,xdata=data.frame(before,after))
# [1] 1.058824

# jackknife SE for ratio of means

jack.se(x=1:7,theta=ratio.means,xdata=data.frame(before,after))
# [1] 0.03913484

# To illustrate use with Monte Carlo output, first create some sample data
# 10,000 samples of size 15 from the Laplace (double exp) distribution

N<-10000
set.seed(450)
z1 <- matrix(rexp(N*15),nrow=N)
z2 <- matrix(rexp(N*15),nrow=N)
z<-(z1-z2)/sqrt(2)              # subtract standard exponentials
out.m.15   <- apply(z,1,mean)
out.t20.15 <- apply(z,1,mean,trim=0.20)
out.med.15 <- apply(z,1,median)

# The three datasets (out.m.15,out.t20.15,out.med.15) each contain 10,000 values.
# If we want use the variance of each column in a table, then to get
# the Monte Carlo standard error of those 3 variances,

jack.se(out.m.15,theta = var)
# [1] 0.0009612314

jack.se(out.t20.15,theta = var)
# [1] 0.0007008859

jack.se(out.med.15,theta = var)
# [1] 0.0008130531

# Function Code

jack.se=function(x, theta, ...){
 call <- match.call()
 n <- length(x)
 u <- rep(0, n)
 for(i in 1:n) {u[i] <- theta(x[ - i], ...)}
 jack.se <- sqrt(((n - 1)/n) * sum((u - mean(u))^2))
 return(jack.se)}


Standard Errors for Summaries Based on 2 or More Vectors of Monte Carlo Output

Description

mc.se.matrix — jackknife and bootstrap SEs for statistics based on k correlated samples

Usage

mc.se.matrix(x, xcol, B = 0, seed = NULL, summary.f, ...)

Arguments

x

N by k matrix or data frame of MC output, k>1

xcol

vector specifying columns of x to use

B

B=0 (default) means use jackknife, B>0 means use bootstrap with B resamples, If B>0, then a seed must be given to start the bootstrap resampling

seed

seed=NULL (default) used with jackknife, otherwise needs positive integer

summary.f

summary function with arguments (index,xdata), x goes into xdata

...

Additional arguments to be passed

Details

Suppose that k columns of Monte Carlo output are produced from N independent Monte Carlo samples, and summary statistics involving 2 or more columns are to be reported in tables. mc.se.matrix gives Monte Carlo standard errors (SEs) for these summary statistics (e.g., the ratio of 2 column means or variances). The vignette vignette("Example2", package = "Monte.Carlo.se") is a detailed account of using mc.se.vector.

Value

Returns data frame of summary.f, Monte Carlo (MC) SE of summary.f, MC sample size N, method (jackknife or bootstrap), B and seed if bootstrap is used

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

References

Boos, D. D., and Osborne, J. A. (2015), "Assessing Variability of Complex Descriptive Statistics in Monte Carlo Studies using Resampling Methods," International Statistical Review, 25, 775-792.

Examples


# First create MC output used in Table 9.1, p. 367, of Boos and Stefanski (2013).
# norm15 holds 10,000 sample means, 20% trimmed means, and medians
# for normal samples of size 15

N <- 10000
set.seed(346)                   # sets the random number seed
z <- matrix(rnorm(N*15),nrow=N) # N rows of N(0,1) samples, n=15
out.m.15 <- apply(z,1,mean)     # mean for each sample
out.t20.15 <- apply(z,1,mean,trim=0.20)   # 20% trimmed mean for each sample
out.med.15 <- apply(z,1,median)           # median for each sample

# Save all 1000 blocks of 3 estimators in a data frame
norm15 <- data.frame(mean=out.m.15,trim20=out.t20.15,median=out.med.15)

# Pearson correlation based on 2 vectors of MC output
# Note that the 2 vectors are in xdata, index is for the rows of xdata
corr<-function(index,xdata){cor(xdata[index,1],xdata[index,2])}

# Compute jackknife SE for summary.f=corr

mc.se.matrix(norm15,xcol=c(1,2),summary.f=corr)
#      summary          se     N    method
#  1 0.9367602 0.001256079 10000 Jackknife

# Compute bootstrap SE for summary.f=corr

mc.se.matrix(norm15,xcol=c(1,2),summary.f=corr,B=1000,seed=3928)
#      summary          se     N    method    B seed
#  1 0.9367602 0.001287065 10000 Bootstrap 1000 3928

# Rerun with B=5000

mc.se.matrix(norm15,xcol=c(1,2),summary.f=corr,B=5000,seed=3928)
#      summary          se     N    method    B seed
#  1 0.9367602 0.001266177 10000 Bootstrap 5000 3928

# Compute jackknife SE for summary.f=ratio.var
# = ratio of variances of the two columns
# A ratio of 2 variances facilitates comparison of the variances

ratio.var <- function(index,xdata)
{var(xdata[index,1])/var(xdata[index,2])}

# ratio of column 1 variance to column 2 variance

mc.se.matrix(norm15,xcol=c(1,2),summary.f=ratio.var)
#     summary          se     N    method
# 1 0.8895367 0.006263652 10000 Jackknife
# Coupled with SE=0.006, the ratio=0.89 shows the second variance is larger than the fitst

# ratio of column 2 variance to column 1 variance
# Same conclusion as for the previous ratio

mc.se.matrix(norm15,xcol=c(2,1),summary.f=ratio.var)
#    summary          se     N    method
# 1 1.124181 0.007915667 10000 Jackknife


# Function Code

mc.se.matrix <- function(x,xcol,B=0,seed=NULL,summary.f,...){
 # x is an n by k matrix or data frame of MC output, k>1
 # xcol is a vector specifying columns of x to use
 # summary.f is the summary function with arguments (index,xdata)
 # ... is for additional arguments to summary.f
 # B=0 means use jackkife, B>0 means use bootstrap with B resamples
 # If B>0, then a seed must be given to start the bootstrap resampling
 N=nrow(x)
 x=x[,xcol]   # use columns selected
 if(B>0){
   if(is.null(seed))stop('If B>0, then seed for the bootstrap must be given in the call')
   se=boot.se(1:N,B=B,theta = summary.f,xdata=x,...)}
 else{se=jack.se(1:N,theta = summary.f,xdata=x,...)}
 summ = summary.f(1:N,xdata=x,...)
 if(B>0) {out=data.frame(summary=summ,se,N,method="Bootstrap",B,seed)}
 else{out=data.frame(summary=summ,se,N,method="Jackknife")}
 return(out)}
 

Standard Errors for Monte Carlo Output Summaries

Description

mc.se.vector — gives jackknife and bootstrap SEs for a vector of MC output (say N estimates)

Usage

mc.se.vector(x, summary.f, B = 0, seed = NULL, ...)

Arguments

x

vector from N independent Monte Carlo replications

summary.f

summary function computed from x (e.g., mean, median, var)

B

B=0 means use jackknife (default), B>0 means use bootstrap with B resamples, If B>0, then a seed must be given to start the bootstrap resampling

seed

seed=NULL (default) used with jackknife, otherwise needs a positive integer

...

Additional arguments to be passed

Details

Suppose that an N-vector of Monte Carlo output (thus, a sample of size N) is produced from N independent Monte Carlo samples, and a summary statistic like the mean or variance is to be reported in a table. mc.se.vector gives Monte Carlo standard errors (SEs) for these summary statistics. The vignette vignette("Example1", package = "Monte.Carlo.se") is a detailed account of using mc.se.vector.

Value

Returns data frame of summary.f , MC standard error of summary.f, MC sample size N, method (jackknife or bootstrap), B and seed if bootstrap is used

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

References

Boos, D. D., and Osborne, J. A. (2015), "Assessing Variability of Complex Descriptive Statistics in Monte Carlo Studies using Resampling Methods," International Statistical Review, 25, 775-792.

Examples


# First create MC output used for Table 9.1, p. 367, of Boos and Stefanski (2013)
# norm15 holds 10,000 sample means, 20% trimmed means, and medians
# computed from normal samples of size 15

N<-10000
set.seed(346)                   # sets the random number seed
z <- matrix(rnorm(N*15),nrow=N) # N rows of N(0,1) samples, n=15
out.m.15 <- apply(z,1,mean)     # mean for each sample
out.t20.15 <- apply(z,1,mean,trim=0.20)   # 20% trimmed mean for each sample
out.med.15 <- apply(z,1,median)           # median for each sample

# Save all 1000 blocks of 3 estimators in a data frame

norm15 <- data.frame(mean=out.m.15,trim20=out.t20.15,median=out.med.15)

# Compute the jackknife Standard Error (SE) for summary.f=mean
# This summary is useful for estimating the bias of estimators.

mc.se.vector(norm15[,1],summary.f=mean)
#        summary         se     N    method
# 1 -0.001920642 0.00256714 10000 Jackknife

# Compute a bootstrap SE for summary.f=mean

mc.se.vector(norm15[,1],B=1000,seed=4822,summary.f=mean)
#        summary          se     n    method    B seed
# 1 -0.001920642 0.002573516 10000 Bootstrap 1000 4822

# compare to basic R

mean(norm15[,1])
# [1] -0.001920642

sd(norm15[,1])/sqrt(10000)
# [1] 0.00256714

# Illustrate use with summaries having additional parameters.
# Compute the jackknife SE for summary.f=varn
# Multiplying the variance by sample size n allows comparison for different n

varn=function(x,n){n*var(x)}

# The additional parameter n replaces the ... in the mc.se.vector definition

mc.se.vector(norm15[,1],summary.f=varn,n=15)
#     summary         se     N    method
# 1 0.9885314 0.01407249 10000 Jackknife

# Compute a bootstrap SE for summary.f=varn

mc.se.vector(norm15[,1],B=1000,seed=3029,summary.f=varn,n=15)
#     summary         se     n    method    B seed
# 1 0.9885314 0.01408173 10000 Bootstrap 1000 3029


# Function Code

mc.se.vector <- function(x,summary.f,B=0,seed=NULL,...){
# x is a vector from N independent Monte Carlo replications
# summary.f is a summary function computed from MC output
# ... is for additional arguments to summary.f
# B=0 means use jackkife, B>0 means use bootstrap with B resamples
 if (B>0){
  if(is.null(seed))stop('If B>0, then seed for the bootstrap must be given in the call')
  set.seed(seed);se=boot.se(x,B=B,theta = summary.f,...)}
 else{se = jack.se(x,theta = summary.f,...)}
summ = summary.f(x,...)
if(B>0){out=data.frame(summary=summ,se,n=length(x),method="Bootstrap",B,seed)}
 else{out=data.frame(summary=summ,se,N=length(x),method="Jackknife")}
return(out)}


Standard Errors for Paired Comparisons of Monte Carlo Output Summaries

Description

pairwise.se — gives jackknife and bootstrap SEs for all the pairwise difference or ratios of Monte Carlo summaries.

Usage

pairwise.se(
  x,
  xcol,
  diff = TRUE,
  digits = 4,
  B = 0,
  seed = NULL,
  summary.f,
  ...
)

Arguments

x

vector from N independent Monte Carlo replications

xcol

columns of x to be used

diff

If TRUE (default), uses differences; if diff=F, uses ratios

digits

number of digits to retain in output data frame

B

B=0 means use jackknife (default), B>0 means use bootstrap with B resamples, If B>0, then a seed must be given to start the bootstrap resampling

seed

seed=NULL (default) used with jackknife, otherwise needs a positive integer

summary.f

summary function computed from x (e.g., mean, median, var)

...

Additional arguments to be passed

Details

Suppose that an N-vector of Monte Carlo output (thus, a sample of size N) is produced from N independent Monte Carlo samples, and a summary statistic like the mean or variance is to be reported in a table. pairwise.se gives Monte Carlo standard errors (SEs) for all pairwise differences or ratios of these summary statistics. The vignette vignette("Example3", package = "Monte.Carlo.se") is a detailed account of using pairwise.se.

Value

Returns a data frame of the indiviual ith and jth column summaries (summi and summj), the differences or ratios of those summaries (summary), MC standard error of the difference or ratio, MC sample size N, method (jackknife or bootstrap), B and seed if bootstrap is used

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

References

Boos, D. D., and Osborne, J. A. (2015), "Assessing Variability of Complex Descriptive Statistics in Monte Carlo Studies using Resampling Methods," International Statistical Review, 25, 775-792.

Examples


# Using the output data matrix hold generated in vignette Example3, 
# calculate jackknife and bootstrap standard errors
# for the differences and ratios of the CV estimates.

# Jackknife SE of Differences of CVs

# pairwise.se(hold,xcol=10:12,summary.f=cv)
# elem  summi  summj summary     se  t.stat    N    method
# 1 10 11 0.6884 0.7030 -0.0146 0.0299 -0.4877 1000 Jackknife
# 2 10 12 0.6884 0.6489  0.0395 0.0195  2.0274 1000 Jackknife
# 3 11 12 0.7030 0.6489  0.0541 0.0311  1.7374 1000 Jackknife

# Jackknife SE of Ratios of CVs

# pairwise.se(hold,xcol=10:12,diff=FALSE,summary.f=cv)
# elem  summi  summj summary     se  t.stat    N    method
# 1 10 11 0.6884 0.7030  0.9792 0.0429 -0.4833 1000 Jackknife
# 2 10 12 0.6884 0.6489  1.0608 0.0321  1.8972 1000 Jackknife
# 3 11 12 0.7030 0.6489  1.0833 0.0475  1.7531 1000 Jackknife

# Bootstrap SE of Differences of CVs

# pairwise.se(hold,xcol=10:12,B=1000,seed=770,summary.f=cv)
# elem  summi  summj summary     se  t.stat    B seed    N    method
# 1 10 11 0.6884 0.7030 -0.0146 0.0278 -0.5250 1000  770 1000 Bootstrap
# 2 10 12 0.6884 0.6489  0.0395 0.0182  2.1671 1000  770 1000 Bootstrap
# 3 11 12 0.7030 0.6489  0.0541 0.0303  1.7844 1000  770 1000 Bootstrap

# Bootstrap SE of Ratios of CVs

# pairwise.se(hold,xcol=10:12,diff=FALSE,B=1000,seed=770,summary.f=cv)
# elem  summi  summj summary     se  t.stat    B seed    N    method
# 1 10 11 0.6884 0.7030  0.9792 0.0390 -0.5316 1000  770 1000 Bootstrap
# 2 10 12 0.6884 0.6489  1.0608 0.0292  2.0797 1000  770 1000 Bootstrap
# 3 11 12 0.7030 0.6489  1.0833 0.0430  1.9372 1000  770 1000 Bootstrap



Code for Generating Random Samples

Description

sim.samp — generates N samples of size n from distribution DIST

Usage

sim.samp(nrep, n, DIST, ...)

Arguments

nrep

= number of random data sets

n

= sample size of each data set

DIST

= distribution to generate from, e.g., runif, rnorm, additional parameters in ...

...

Additional arguments to be passed

Value

N by n matrix, each row is a data set of size n

Author(s)

Dennis Boos, Kevin Matthew, Jason Osborne

Examples

N<-100
 set.seed(346)            # sets the random number seed
 sim.samp(N,15,rnorm)->z  # 100 N(0,1) samples of size n=15
 sim.samp(N,40,rnorm,mean=10,sd=5)->z  # 100 N(10,25) samples of size n=40

# Function Code

sim.samp <- function(nrep,n,DIST,...){
 data <- matrix(DIST(n * nrep, ...), ncol = n, nrow = nrep)}