Type: | Package |
Title: | Calculate Confidence Intervals |
Version: | 0.1.0 |
Description: | Calculates a variety of confidence intervals (CIs) for proportions and differences in proportions that are commonly used in the pharmaceutical industry: Wald, Wilson, Clopper-Pearson, Agresti-Coull and Jeffreys for proportions, and Miettinen-Nurminen (1985) <doi:10.1002/sim.4780040211>, Wald, Haldane, and Mee https://www.lexjansen.com/wuss/2016/127_Final_Paper_PDF.pdf for differences in proportions. |
License: | Apache License (≥ 2) |
URL: | https://gsk-biostatistics.github.io/cicalc/ |
Depends: | R (≥ 4.1.0) |
Imports: | broom, cli, dplyr, forcats, glue, purrr, rlang, tidyr |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-07-17 09:28:42 UTC; christinafillmore |
Author: | Christina Fillmore |
Maintainer: | Christina Fillmore <christina.e.fillmore@gsk.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-21 08:50:06 UTC |
Agresti-Coull CI
Description
Calculates the Agresti-Coull interval (created by Alan Agresti and Brent Coull) by (for a 95% CI) adding two successes and two failures to the data and then using the Wald formula to construct a CI.
Usage
ci_prop_agresti_coull(x, conf.level = 0.95, data = NULL)
Arguments
x |
( |
conf.level |
( |
data |
( |
Details
\left( \tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}} \right)
where \tilde{n} = n + z^2_{\alpha/2} and \tilde{p} = \frac{k + z^2_{\alpha/2}/2}{\tilde{n}}, with k the number of responses and n the total number.
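The construction in the Description (add successes and failures, then apply the Wald formula) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `agresti_coull_sketch()` is a hypothetical helper, and it assumes the data are already summarised as `k` responses out of `n`.

```r
# Hypothetical helper (not exported by the package): Agresti-Coull interval
# from k responses out of n, using base R only.
agresti_coull_sketch <- function(k, n, conf.level = 0.95) {
  z <- stats::qnorm(1 - (1 - conf.level) / 2)  # z_{alpha/2}
  n_tilde <- n + z^2                           # adjusted total
  p_tilde <- (k + z^2 / 2) / n_tilde           # adjusted proportion
  half_width <- z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
  c(conf.low = p_tilde - half_width, conf.high = p_tilde + half_width)
}

agresti_coull_sketch(k = 9, n = 10)
```

At the 95% level z^2 is about 3.84, which is where the "two successes and two failures" shorthand comes from. The sketch does not truncate limits that fall outside [0, 1].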
Value
An object containing the following components:
n |
Number of responses |
N |
Total number |
estimate |
The point estimate of the proportion |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Type of method used |
Clopper-Pearson CI
Description
Calculates the Clopper-Pearson interval by calling stats::binom.test(). Also referred to as the exact method.
Usage
ci_prop_clopper_pearson(x, conf.level = 0.95, data = NULL)
Arguments
x |
( |
conf.level |
( |
data |
( |
Details
\left( \text{Beta}\left(k, n - k + 1\right)_{\alpha/2}, \text{Beta}\left(k + 1, n - k\right)_{1-\alpha/2} \right)
where \text{Beta}(a, b)_q denotes the q-th quantile of the Beta(a, b) distribution, k is the number of responses and n is the total number.
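Because the Description states that the interval comes from stats::binom.test(), a short base-R sketch can show the same limits obtained from Beta quantiles. `clopper_pearson_sketch()` is a hypothetical helper for illustration, and it assumes 0 < k < n so that both Beta shape parameters are positive.

```r
# Hypothetical helper: exact (Clopper-Pearson) limits via Beta quantiles.
# Assumes 0 < k < n.
clopper_pearson_sketch <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  c(
    conf.low  = stats::qbeta(alpha / 2, k, n - k + 1),
    conf.high = stats::qbeta(1 - alpha / 2, k + 1, n - k)
  )
}

ci <- clopper_pearson_sketch(k = 9, n = 10)
ref <- stats::binom.test(x = 9, n = 10, conf.level = 0.95)$conf.int

# The two routes should agree up to numerical tolerance
all.equal(unname(ci), as.numeric(ref))
```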
Value
An object containing the following components:
n |
Number of responses |
N |
Total number |
estimate |
The point estimate of the proportion |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Type of method used |
Haldane Confidence Interval for Difference in Proportions
Description
Haldane Confidence Interval for Difference in Proportions
Usage
ci_prop_diff_haldane(x, by, conf.level = 0.95, data = NULL)
Arguments
x |
( |
by |
( |
conf.level |
( |
data |
( |
Details
The confidence interval is calculated by \theta^* \pm w, where:
\theta^* = \frac{(\hat{p}_1 - \hat{p}_2) + z^2v(1-2\hat{\psi})}{1+z^2u}
w = \frac{z}{1+z^2u}\sqrt{u\{4\hat{\psi}(1-\hat{\psi})-(\hat{p}_1 - \hat{p}_2)^2\}+2v(1-2\hat{\psi})(\hat{p}_1-\hat{p}_2)+4z^2v^2(1-2\hat{\psi})^2}
\hat{\psi} = \frac{\hat{p}_1 + \hat{p}_2}{2}
u = \frac{1/n_1 + 1/n_2}{4}
v = \frac{1/n_1 - 1/n_2}{4}
Value
An object containing the following components:
n |
The number of responses for each group |
N |
The total number in each group |
estimate |
The point estimate of the difference in proportions (theta*) |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Haldane Confidence Interval |
References
Constructing Confidence Intervals for the Differences of Binomial Proportions in SAS
Examples
responses <- expand(c(9, 3), c(10, 10))
arm <- rep(c("treat", "control"), times = c(10, 10))
# Calculate 95% confidence interval for difference in proportions
ci_prop_diff_haldane(x = responses, by = arm)
Jeffreys-Perks Confidence Interval for Difference in Proportions
Description
Jeffreys-Perks Confidence Interval for Difference in Proportions
Usage
ci_prop_diff_jp(x, by, conf.level = 0.95, data = NULL)
Arguments
x |
( |
by |
( |
conf.level |
( |
data |
( |
Details
The confidence interval is calculated by \theta^* \pm w, where:
\theta^* = \frac{(\hat{p}_1 - \hat{p}_2) + z^2v(1-2\hat{\psi})}{1+z^2u}
w = \frac{z}{1+z^2u}\sqrt{u\{4\hat{\psi}(1-\hat{\psi})-(\hat{p}_1 - \hat{p}_2)^2\}+2v(1-2\hat{\psi})(\hat{p}_1-\hat{p}_2)+4z^2v^2(1-2\hat{\psi})^2}
\hat{\psi} = \frac{1}{2}\left(\frac{x_1 + 1/2}{n_1+1}+\frac{x_2 + 1/2}{n_2+1}\right)
u = \frac{1/n_1 + 1/n_2}{4}
v = \frac{1/n_1 - 1/n_2}{4}
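The Jeffreys-Perks and Haldane intervals share the same expression for the centre \theta^* and differ only in how \hat{\psi} is defined. The base-R sketch below computes the centre under both definitions. `theta_star_sketch()` and its `psi` argument are hypothetical names used for illustration, and the half-width w is deliberately left out.

```r
# Hypothetical helper: centre theta* of the Haldane / Jeffreys-Perks interval
# from x1/n1 and x2/n2 responses. Only the definition of psi_hat differs.
theta_star_sketch <- function(x1, n1, x2, n2, conf.level = 0.95,
                              psi = c("haldane", "jp")) {
  psi <- match.arg(psi)
  z <- stats::qnorm(1 - (1 - conf.level) / 2)
  p1 <- x1 / n1
  p2 <- x2 / n2
  psi_hat <- switch(psi,
    haldane = (p1 + p2) / 2,
    jp      = ((x1 + 0.5) / (n1 + 1) + (x2 + 0.5) / (n2 + 1)) / 2
  )
  u <- (1 / n1 + 1 / n2) / 4
  v <- (1 / n1 - 1 / n2) / 4
  ((p1 - p2) + z^2 * v * (1 - 2 * psi_hat)) / (1 + z^2 * u)
}

# Unequal group sizes, so v != 0 and the two centres differ
theta_star_sketch(9, 10, 3, 15, psi = "haldane")
theta_star_sketch(9, 10, 3, 15, psi = "jp")
```

With equal group sizes v is zero, the \hat{\psi} term drops out of \theta^*, and both methods share the centre (\hat{p}_1 - \hat{p}_2) / (1 + z^2 u).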
Value
An object containing the following components:
n |
The number of responses for each group |
N |
The total number in each group |
estimate |
The point estimate of the difference in proportions (theta*) |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Jeffreys-Perks Confidence Interval |
References
Constructing Confidence Intervals for the Differences of Binomial Proportions in SAS
Examples
responses <- expand(c(9, 3), c(10, 10))
arm <- rep(c("treat", "control"), times = c(10, 10))
# Calculate 95% confidence interval for difference in proportions
ci_prop_diff_jp(x = responses, by = arm)
Mee Confidence Interval for Difference in Proportions
Description
Mee Confidence Interval for Difference in Proportions
Usage
ci_prop_diff_mee(x, by, conf.level = 0.95, delta = NULL, data = NULL)
Arguments
x |
( |
by |
( |
conf.level |
( |
delta |
( |
data |
( |
Details
The confidence interval is calculated by \theta^* \pm w, where:
\theta^* = \frac{(\hat{p}_1 - \hat{p}_2) + z^2v(1-2\hat{\psi})}{1+z^2u}
w = \frac{z}{1+z^2u}\sqrt{u\{4\hat{\psi}(1-\hat{\psi})-(\hat{p}_1 - \hat{p}_2)^2\}+2v(1-2\hat{\psi})(\hat{p}_1-\hat{p}_2)+4z^2v^2(1-2\hat{\psi})^2}
\hat{\psi} = \frac{1}{2}\left(\frac{x_1 + 1/2}{n_1+1}+\frac{x_2 + 1/2}{n_2+1}\right)
u = \frac{1/n_1 + 1/n_2}{4}
v = \frac{1/n_1 - 1/n_2}{4}
Value
An object containing the following components:
n |
The number of responses for each group |
N |
The total number in each group |
estimate |
The point estimate of the difference in proportions (p1-p2) |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Mee Confidence Interval |
References
Constructing Confidence Intervals for the Differences of Binomial Proportions in SAS
Examples
responses <- expand(c(9, 3), c(10, 10))
arm <- rep(c("treat", "control"), times = c(10, 10))
# Calculate 95% confidence interval for difference in proportions
ci_prop_diff_mee(x = responses, by = arm)
Miettinen-Nurminen Confidence Interval for Difference in Proportions
Description
Calculates the Miettinen-Nurminen (MN) confidence interval for the difference between two proportions. This method can be more accurate than traditional methods, especially with small sample sizes or proportions close to 0 or 1.
Usage
ci_prop_diff_mn(x, by, conf.level = 0.95, delta = NULL, data = NULL)
Arguments
x |
( |
by |
( |
conf.level |
( |
delta |
( |
data |
( |
Details
The function implements the Miettinen-Nurminen method to compute confidence intervals for the difference between two proportions. This approach:
- Calculates the Miettinen-Nurminen score test statistic for different possible values of the proportion difference (delta)
- Identifies the delta values where the test statistic equals the critical value corresponding to the desired confidence level
- Returns these boundary values as the confidence interval limits
The method uses a score test with a small-sample correction factor, making it more accurate than normal approximation methods, especially for small samples or extreme proportions. The equation for the test statistic is as follows:
H_0: \hat{d}-\delta \leq 0 \qquad \text{vs.} \qquad H_1: \hat{d}-\delta > 0
T_\delta = \frac{\hat{p_x} - \hat{p_y} - \delta}{\sigma_{mn}(\delta)}
where \hat{p_*} = s_*/n_* is the observed number of successes divided by the number of participants in that group. \sigma_{mn}(\delta) is a function of \delta and is given by the following equation, in which \tilde{p_*} represents the MLE of the proportion in each group:
\sigma_{mn}(\delta) = \sqrt{\left[\frac{\tilde{p_x}(1-\tilde{p_x})}{n_x}+\frac{\tilde{p_y}(1-\tilde{p_y})}{n_y} \right]\left(\frac{N}{N-1}\right)}
\tilde{p_y} = 2p\cdot\cos(a) - \frac{L_2}{3L_3} and \tilde{p_x} = \tilde{p_y} + \delta
where:
- p = \pm \sqrt{\frac{L_2^2}{(3L_3)^2} - \frac{L_1}{3L_3}}
- a = 1/3[\pi + cos^{-1}(q/p^3)]
- q = \frac{L_2^3}{(3L_3)^3} - \frac{L_1L_2}{6L_3^2} + \frac{L_0}{2L_3}
- L_3 = n_x + n_y
- L_2 = (n_x + 2 n_y)\delta - N - (s_x + s_y)
- L_1 = (n_y\delta - L_3 - 2s_y)\delta + s_x + s_y
- L_0 = s_y\delta(1-\delta)
For more information about these equations see Miettinen and Nurminen (1985).
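The three steps listed in the Details (compute the score statistic over candidate delta values, find where it meets the critical value, return those boundaries) can be sketched in base R as below. This is an illustrative sketch under explicit assumptions, not the package's code: `mn_sketch()` is a hypothetical helper, the constrained MLE is found by numerical optimisation with stats::optimize() in place of the closed-form cubic solution, and the observed difference is assumed to lie strictly between -1 and 1 with a non-degenerate variance.

```r
# Hypothetical helper (not the package's code): Miettinen-Nurminen interval
# for p_x - p_y from s_x/n_x and s_y/n_y successes, using base R only.
mn_sketch <- function(s_x, n_x, s_y, n_y, conf.level = 0.95) {
  z <- stats::qnorm(1 - (1 - conf.level) / 2)
  N <- n_x + n_y
  d_hat <- s_x / n_x - s_y / n_y

  # Constrained MLE of p_y given p_x = p_y + delta. Assumption: obtained by
  # numerical optimisation rather than the closed-form cubic root.
  p_tilde_y <- function(delta) {
    neg_loglik <- function(p_y) {
      -(stats::dbinom(s_x, n_x, p_y + delta, log = TRUE) +
          stats::dbinom(s_y, n_y, p_y, log = TRUE))
    }
    stats::optimize(neg_loglik, c(max(0, -delta), min(1, 1 - delta)),
                    tol = 1e-10)$minimum
  }

  # Score statistic T_delta with the N / (N - 1) correction
  t_stat <- function(delta) {
    p_y <- p_tilde_y(delta)
    p_x <- p_y + delta
    sigma <- sqrt((p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y) * N / (N - 1))
    (d_hat - delta) / sigma
  }

  # The limits are the delta values where T_delta equals +z and -z
  eps <- 1e-6
  lower <- stats::uniroot(function(d) t_stat(d) - z,
                          c(-1 + eps, d_hat), tol = 1e-8)$root
  upper <- stats::uniroot(function(d) t_stat(d) + z,
                          c(d_hat, 1 - eps), tol = 1e-8)$root
  c(estimate = d_hat, conf.low = lower, conf.high = upper)
}

mn_sketch(s_x = 9, n_x = 10, s_y = 3, n_y = 10)
```

Root finding is used because the standard error itself depends on delta, so the limits have no simple closed form.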
Value
An object containing the following components:
estimate |
The point estimate of the difference in proportions (p_x - p_y) |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
delta |
delta value(s) used |
statistic |
Z-Statistic under the null hypothesis based on the given 'delta' |
p.value |
p-value under the null hypothesis based on the given 'delta' |
method |
Description of the method used ("Miettinen-Nurminen Confidence Interval") |
If delta is not provided, statistic and p.value will be NULL.
References
Miettinen, O. S., & Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4(2), 213-226.
Examples
# Generate binary samples
responses <- expand(c(9, 3), c(10, 10))
arm <- rep(c("treat", "control"), times = c(10, 10))
# Calculate 95% confidence interval for difference in proportions
ci_prop_diff_mn(x = responses, by = arm)
# Calculate 99% confidence interval
ci_prop_diff_mn(x = responses, by = arm, conf.level = 0.99)
# Calculate the p-value under the null hypothesis delta = -0.1
ci_prop_diff_mn(x = responses, by = arm, delta = -0.1)
# Calculate from a data.frame
data <- data.frame(responses, arm)
ci_prop_diff_mn(x = responses, by = arm, data = data)
Stratified Miettinen-Nurminen Confidence Interval for Difference in Proportions
Description
Calculates stratified Miettinen-Nurminen (MN) confidence intervals and corresponding point estimates for the difference between two proportions.
Usage
ci_prop_diff_mn_strata(
x,
by,
strata,
method = c("score", "summary score"),
conf.level = 0.95,
delta = NULL,
data = NULL
)
Arguments
x |
( |
by |
( |
strata |
( |
method |
( |
conf.level |
( |
delta |
( |
data |
( |
Details
The function implements the stratified Miettinen-Nurminen method to compute confidence intervals for the difference between two proportions across multiple strata.
H_0: \hat{d}-\delta \leq 0 \qquad \text{vs.} \qquad H_1: \hat{d}-\delta > 0
The "score" method is a weighted MN score first described in the original 1985 paper. The method:
- Calculates weights for each stratum as w_i = \frac{n_{xi} \cdot n_{yi}}{n_{xi} + n_{yi}}
- Computes the overall weighted difference \hat{d} = \frac{\sum w_i \hat{p}_{xi}}{\sum w_i} - \frac{\sum w_i \hat{p}_{yi}}{\sum w_i}
- Uses the stratified test statistic Z_{\delta} = \frac{\hat{d} - \delta} {\sqrt{\sum_{i=1}^k \left(\frac{w_i}{\sum w_i}\right)^2 \cdot \hat{\sigma}_{mn}^2(\hat{d})}}
- Finds the range of all values of \delta for which the stratified test statistic (Z_\delta) falls in the acceptance region \{ Z_\delta < z_{\alpha/2}\}
\hat{\sigma}_{mn}^2(\hat{d}) is the Miettinen-Nurminen variance estimate. See the details of ci_prop_diff_mn() for how \hat{\sigma}_{mn}^2(\delta) is calculated.
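The first two steps of the "score" method (stratum weights and the weighted difference) can be illustrated with a few lines of base R. The per-stratum counts below are hypothetical values chosen for illustration; they are not taken from the package or its examples.

```r
# Hypothetical per-stratum counts for two strata (illustration only)
s_x <- c(9, 7); n_x <- c(10, 10)  # responders and totals in group x
s_y <- c(3, 2); n_y <- c(10, 10)  # responders and totals in group y

# Stratum weights w_i = n_xi * n_yi / (n_xi + n_yi)
w <- n_x * n_y / (n_x + n_y)

# Overall weighted difference d_hat
d_hat <- sum(w * s_x / n_x) / sum(w) - sum(w * s_y / n_y) / sum(w)
d_hat
```

With equal stratum sizes the weights are equal, so d_hat reduces to the simple average of the per-stratum differences.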
The "summary score" method follows the meta-analyses proposed in Agresti (2013) and is consistent with the "Summary Score Confidence Limits" method used in SAS. The method:
- Takes the point estimate of the stratified risk difference as a weighted average of the midpoints \hat{d}_i of the within-stratum MN confidence intervals: \hat{d}_{\text{S}} = \sum_i \hat{d}_i w_i
- Defines s_i as the width of the CI for the i-th stratum divided by 2 \times z_{\alpha/2}, with stratum weights given by w_i = \left( \frac{1}{s_i^2} \right) \bigg/ \sum_i \left( \frac{1}{s_i^2} \right)
- Computes the variance of \hat{d}_{\text{S}} as \widehat{\text{Var}}(\hat{d}_{\text{S}}) = \frac{1}{\sum_i \left( \frac{1}{s_i^2} \right) }
- Gives confidence limits for the stratified risk difference estimate as \hat{d}_{\text{S}} \pm \left( z_{\alpha /2} \times \sqrt{\widehat{\text{Var}}(\hat{d}_{\text{S}})} \right)
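The "summary score" combination can be sketched in base R once the within-stratum limits are available. The limits below are hypothetical placeholders (in practice they would come from a per-stratum MN interval), and the final step uses the square root of the variance so that the limits are on the same scale as the estimate.

```r
# Hypothetical within-stratum MN limits (illustration only)
lower_i <- c(0.20, 0.10)
upper_i <- c(0.80, 0.70)
conf.level <- 0.95
z <- stats::qnorm(1 - (1 - conf.level) / 2)

d_i <- (lower_i + upper_i) / 2        # midpoints of the stratum intervals
s_i <- (upper_i - lower_i) / (2 * z)  # interval width divided by 2 * z
w_i <- (1 / s_i^2) / sum(1 / s_i^2)   # inverse-variance weights, sum to 1

d_s <- sum(w_i * d_i)                 # stratified point estimate
var_s <- 1 / sum(1 / s_i^2)           # variance of the stratified estimate
ci <- d_s + c(-1, 1) * z * sqrt(var_s)
ci
```

Narrower stratum intervals give smaller s_i and therefore larger weights, which is the usual inverse-variance weighting used in meta-analysis.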
Value
An object containing the following components:
estimate |
The point estimate of the difference in proportions (p_x - p_y) |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
delta |
delta value(s) used |
statistic |
Z-Statistic under the null hypothesis based on the given 'delta' |
p.value |
p-value under the null hypothesis based on the given 'delta' |
method |
Description of the method used ("Stratified {method} Miettinen-Nurminen Confidence Interval") |
If delta is not provided, statistic and p.value will be NULL.
References
Miettinen, O. S., & Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4(2), 213-226.
Common Risk Difference :: Base SAS(R) 9.4 Procedures Guide: Statistical Procedures, Third Edition
Agresti, A. (2013). Categorical Data Analysis. 3rd Edition. John Wiley & Sons, Hoboken, NJ
Examples
# Generate binary samples with strata
responses <- expand(c(9, 3, 7, 2), c(10, 10, 10, 10))
arm <- rep(c("treat", "control"), 20)
strata <- rep(c("stratum1", "stratum2"), times = c(20, 20))
# Calculate stratified confidence interval for difference in proportions
ci_prop_diff_mn_strata(x = responses, by = arm, strata = strata)
# Using the summary score method
ci_prop_diff_mn_strata(x = responses, by = arm, strata = strata,
method = "summary score")
# Calculate 99% confidence interval
ci_prop_diff_mn_strata(x = responses, by = arm, strata = strata,
conf.level = 0.99)
# Calculate p-value under null hypothesis delta = 0.2
ci_prop_diff_mn_strata(x = responses, by = arm, strata = strata,
delta = 0.2)
Wald Confidence Interval for Difference in Proportions
Description
Calculates the Wald interval by following the usual textbook definition for a difference in proportions confidence interval using the normal approximation.
Usage
ci_prop_diff_wald(x, by, conf.level = 0.95, correct = FALSE, data = NULL)
Arguments
x |
( |
by |
( |
conf.level |
( |
correct |
( |
data |
( |
Details
(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}+\frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
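As a rough illustration of the formula above, the following base-R sketch computes the uncorrected Wald interval from summary counts. `wald_diff_sketch()` is a hypothetical helper; it ignores the `correct` argument, whose behaviour is not described in this section.

```r
# Hypothetical helper: uncorrected Wald interval for p1 - p2
# from x1/n1 and x2/n2 responses.
wald_diff_sketch <- function(x1, n1, x2, n2, conf.level = 0.95) {
  z <- stats::qnorm(1 - (1 - conf.level) / 2)
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  c(estimate = p1 - p2,
    conf.low = (p1 - p2) - z * se,
    conf.high = (p1 - p2) + z * se)
}

wald_diff_sketch(x1 = 9, n1 = 10, x2 = 3, n2 = 10)
```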
Value
An object containing the following components:
n |
Number of responses in each by group |
N |
Total number in each by group |
estimate |
The point estimate of the difference in proportions (p_1 - p_2) |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Type of method used |
Examples
responses <- expand(c(9, 3), c(10, 10))
arm <- rep(c("treat", "control"), times = c(10, 10))
# Calculate 95% confidence interval for difference in proportions
ci_prop_diff_wald(x = responses, by = arm)
Jeffreys CI
Description
Calculates the Jeffreys interval, an equal-tailed interval based on the non-informative Jeffreys prior for a binomial proportion.
Usage
ci_prop_jeffreys(x, conf.level = 0.95, data = NULL)
Arguments
x |
( |
conf.level |
( |
data |
( |
Details
\left( \text{Beta}\left(k + \frac{1}{2}, n - k + \frac{1}{2}\right)_{\alpha/2}, \text{Beta}\left(k + \frac{1}{2}, n - k + \frac{1}{2}\right)_{1-\alpha/2} \right)
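The Jeffreys prior for a binomial proportion is Beta(1/2, 1/2), so after observing k responses out of n the posterior is Beta(k + 1/2, n - k + 1/2), and the equal-tailed interval takes its alpha/2 and 1 - alpha/2 quantiles. A minimal base-R sketch, with `jeffreys_sketch()` as a hypothetical helper name:

```r
# Hypothetical helper: equal-tailed Jeffreys interval from k responses in n,
# taken from the Beta(k + 1/2, n - k + 1/2) posterior.
jeffreys_sketch <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  c(
    conf.low  = stats::qbeta(alpha / 2, k + 0.5, n - k + 0.5),
    conf.high = stats::qbeta(1 - alpha / 2, k + 0.5, n - k + 0.5)
  )
}

jeffreys_sketch(k = 9, n = 10)
```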
Value
An object containing the following components:
n |
Number of responses |
N |
Total number |
estimate |
The point estimate of the proportion |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Type of method used |
Wald CI
Description
Calculates the Wald interval by following the usual textbook definition for a single proportion confidence interval using the normal approximation.
Usage
ci_prop_wald(x, conf.level = 0.95, correct = FALSE, data = NULL)
Arguments
x |
( |
conf.level |
( |
correct |
( |
data |
( |
Details
\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
Value
An object containing the following components:
n |
Number of responses |
N |
Total number |
estimate |
The point estimate of the proportion |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Type of method used |
Examples
# example code
x <- c(
TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, FALSE, FALSE, FALSE, FALSE
)
ci_prop_wald(x, conf.level = 0.9)
Wilson CI
Description
Calculates the Wilson interval by calling stats::prop.test(). Also referred to as the Wilson score interval.
Usage
ci_prop_wilson(x, conf.level = 0.95, correct = FALSE, data = NULL)
Arguments
x |
( |
conf.level |
( |
correct |
( |
data |
( |
Details
\frac{\hat{p} + \frac{z^2_{\alpha/2}}{2n} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n} + \frac{z^2_{\alpha/2}}{4n^2}}}{1 + \frac{z^2_{\alpha/2}}{n}}
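The formula above can be checked against stats::prop.test() with the continuity correction switched off. The sketch below is only an illustration: `wilson_sketch()` is a hypothetical helper, and the counts are arbitrary.

```r
# Hypothetical helper: Wilson score interval computed directly from the
# formula, for k responses out of n (no continuity correction).
wilson_sketch <- function(k, n, conf.level = 0.95) {
  z <- stats::qnorm(1 - (1 - conf.level) / 2)
  p_hat <- k / n
  centre <- p_hat + z^2 / (2 * n)
  half <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
  c(conf.low = centre - half, conf.high = centre + half) / (1 + z^2 / n)
}

ci <- wilson_sketch(k = 45, n = 100)
ref <- stats::prop.test(x = 45, n = 100, correct = FALSE)$conf.int

# The direct formula and prop.test() should agree up to numerical tolerance
all.equal(unname(ci), as.numeric(ref))
```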
Value
An object containing the following components:
n |
Number of responses |
N |
Total number |
estimate |
The point estimate of the proportion |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
method |
Type of method used |
Stratified Wilson CI
Description
Calculates the stratified Wilson confidence interval for unequal proportions as described in Yan X, Su XG. Stratified Wilson and Newcombe confidence intervals for multiple binomial proportions. Statistics in Biopharmaceutical Research. 2010;2(3).
Usage
ci_prop_wilson_strata(
x,
strata,
weights = NULL,
conf.level = 0.95,
max.iterations = 10L,
correct = FALSE,
data = NULL
)
Arguments
x |
( |
strata |
( |
weights |
( |
conf.level |
( |
max.iterations |
(positive |
correct |
(scalar |
data |
( |
Details
\frac{\hat{p}_j + \frac{z^2_{\alpha/2}}{2n_j} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_j(1 - \hat{p}_j)}{n_j} + \frac{z^2_{\alpha/2}}{4n_j^2}}}{1 + \frac{z^2_{\alpha/2}}{n_j}}
Value
An object containing the following components:
n |
Number of responses |
N |
Total number |
estimate |
The point estimate of the proportion |
conf.low |
Lower bound of the confidence interval |
conf.high |
Upper bound of the confidence interval |
conf.level |
The confidence level used |
weights |
Weights of each stratum. These are the same as the input weights; if no weights were supplied, they are the dynamically calculated weights. |
method |
Type of method used |
Examples
# Stratified Wilson confidence interval with unequal probabilities
set.seed(1)
rsp <- sample(c(TRUE, FALSE), 100, TRUE)
strata_data <- data.frame(
x = sample(c(TRUE, FALSE), 100, TRUE),
"f1" = sample(c("a", "b"), 100, TRUE),
"f2" = sample(c("x", "y", "z"), 100, TRUE),
stringsAsFactors = TRUE
)
strata <- interaction(strata_data)
n_strata <- ncol(table(rsp, strata)) # Number of strata
ci_prop_wilson_strata(
x = rsp, strata = strata,
conf.level = 0.90
)
# Manually specified weights instead of the automatically calculated ones
ci_prop_wilson_strata(
x = rsp, strata = strata,
weights = rep(1 / n_strata, n_strata),
conf.level = 0.90
)
Function to combine strata via interaction if strata is passed as a vector
Description
Function to combine strata via interaction if strata is passed as a vector
Usage
combine_strata(x, strata)
Expand Count Data into Binary Vectors
Description
Converts count data (number of successes and total sample size) into a binary vector of TRUE/FALSE values. This is useful for converting summary statistics back into raw data format for analysis functions that require individual-level data.
Usage
expand(x, n)
Arguments
x |
Integer (or vector of integers) representing the number of successes. |
n |
Integer (or vector of integers) representing the total number of participants. |
Details
For each pair of values in x and n, the function creates a vector with x TRUE values followed by n-x FALSE values. If multiple pairs are provided, the resulting vectors are concatenated in order.
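The behaviour described above can be mimicked in a few lines of base R. `expand_sketch()` is a hypothetical stand-in written for illustration; it is not the package's implementation and assumes non-negative counts with x no larger than n.

```r
# Hypothetical stand-in for the described behaviour: for each (x, n) pair,
# x TRUE values followed by n - x FALSE values, concatenated in order.
expand_sketch <- function(x, n) {
  stopifnot(length(x) == length(n), all(x >= 0), all(x <= n))
  unlist(Map(
    function(xi, ni) rep(c(TRUE, FALSE), times = c(xi, ni - xi)),
    x, n
  ))
}

expand_sketch(4, 13)
expand_sketch(c(9, 3), c(10, 10))
```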
Value
A logical vector where TRUE represents a success and FALSE represents a failure. The length of the vector equals the sum of all sample sizes.
Examples
# Convert 4 successes out of 13 participants to binary data
expand(4, 13)
# Convert multiple groups of data
# Group 1: 9 successes out of 10
# Group 2: 3 successes out of 10
expand(c(9, 3), c(10, 10))
To get the n's and response totals with or without strata
Description
To get the n's and response totals with or without strata
Usage
get_counts(x, by, strata = 1)