Title: | Various Methods for the Two Sample Problem |
Version: | 4.1.0 |
Description: | The routine twosample_test() in this package runs the two sample test using various test statistics. The p values are found via permutation or large sample theory. The routine twosample_power() allows the calculation of the power in various cases, and plot_power() draws the corresponding power graphs. The routine run.studies() allows a user to quickly study the power of a new method and how it compares to some of the standard ones. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
LinkingTo: | Rcpp |
Imports: | Rcpp, parallel, shiny, ggplot2, stats, graphics, microbenchmark |
Suggests: | rmarkdown, knitr, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.5) |
LazyData: | true |
NeedsCompilation: | yes |
Packaged: | 2025-06-16 18:11:44 UTC; Wolfgang |
Author: | Wolfgang Rolke |
Maintainer: | Wolfgang Rolke <wolfgang.rolke@upr.edu> |
Repository: | CRAN |
Date/Publication: | 2025-06-16 18:30:06 UTC |
R2sample: Various Methods for the Two Sample Problem
Description
The routine twosample_test() in this package runs the two sample test using various test statistics. The p values are found via permutation or large sample theory. The routine twosample_power() allows the calculation of the power in various cases, and plot_power() draws the corresponding power graphs. The routine run.studies() allows a user to quickly study the power of a new method and how it compares to some of the standard ones.
Author(s)
Maintainer: Wolfgang Rolke wolfgang.rolke@upr.edu (ORCID)
sort vector y by values in vector x
Description
sort vector y by values in vector x
Usage
Cpporder(y, x)
Arguments
y |
numeric vector |
x |
numeric vector |
Value
numeric vector
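As a rough illustration of what "sort y by values in x" means, the following base R sketch reorders y by increasing x. The helper name order_by is hypothetical, and whether Cpporder handles ties in x the same way is an assumption.

```r
# Base R sketch: reorder y according to the increasing order of x.
# order_by is a hypothetical helper, not part of R2sample.
order_by <- function(y, x) {
  stopifnot(length(x) == length(y))
  y[order(x)]  # order(x) is the permutation that sorts x increasingly
}

x <- c(3, 1, 2)
y <- c(30, 10, 20)
sorted_y <- order_by(y, x)
```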
find test statistics for continuous data
Description
find test statistics for continuous data
Usage
TS_cont(x, y)
Arguments
x |
first continuous data set |
y |
second continuous data set |
Value
A vector of test statistics
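The help page does not list which statistics TS_cont returns. As one example of a two-sample statistic for continuous data, the sketch below computes the Kolmogorov-Smirnov distance in base R; the helper name ks_stat is hypothetical and the code is not taken from the package.

```r
# Kolmogorov-Smirnov distance: largest gap between the two empirical
# distribution functions, evaluated at all observed points.
ks_stat <- function(x, y) {
  z <- sort(c(x, y))
  max(abs(ecdf(x)(z) - ecdf(y)(z)))
}

set.seed(1)
x <- rnorm(30)
y <- rnorm(40, mean = 0.5)
d <- ks_stat(x, y)
```

For data without ties this should agree with the statistic reported by stats::ks.test(x, y).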
find test statistics for discrete data
Description
find test statistics for discrete data
Usage
TS_disc(x, y, vals, ADweights = as.numeric(c(2)))
Arguments
x |
integer vector of data set 1 |
y |
integer vector of data set 2 |
vals |
numeric vector of values of discrete data set |
ADweights |
A vector of weights for AD method |
Value
A vector of test statistics
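For discrete data the inputs are counts rather than raw observations. The sketch below shows how a Kolmogorov-Smirnov type distance can be found directly from two count vectors; it is an illustration in base R with a hypothetical helper name, and it makes no claim about how TS_disc is implemented or which statistics it returns.

```r
# KS-type distance from counts: compare the two cumulative
# relative frequencies over the common set of values.
ks_stat_counts <- function(x, y) {
  stopifnot(length(x) == length(y))
  Fx <- cumsum(x) / sum(x)
  Fy <- cumsum(y) / sum(y)
  max(abs(Fx - Fy))
}

x <- c(10, 20, 30, 40)  # counts of data set 1 at values 1, 2, 3, 4
y <- c(40, 30, 20, 10)  # counts of data set 2 at the same values
d_disc <- ks_stat_counts(x, y)
```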
find test statistics for continuous data with weights
Description
find test statistics for continuous data with weights
Usage
TSw_cont(x, y, wx, wy)
Arguments
x |
first continuous data set |
y |
second continuous data set |
wx |
weights of x |
wy |
weights of y |
Value
A vector of test statistics
Find test statistics for weighted discrete data
Description
Find test statistics for weighted discrete data
Usage
TSw_disc(x, y, vals, wx, wy)
Arguments
x |
integer vector of counts |
y |
integer vector of counts |
vals |
A numeric vector with the values of the discrete rv. |
wx |
integer vector of weights |
wy |
integer vector of weights |
Value
A vector with test statistics
This function finds the p values of several tests based on large sample theory
Description
This function finds the p values of several tests based on large sample theory
Usage
asymptotic_pvalues(x, n, m)
Arguments
x |
a vector of test statistics |
n |
size of sample 1 |
m |
size of sample 2 |
Value
A vector of p values.
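The help page does not say which tests are covered. As an example of a large sample p value, the sketch below evaluates the classical limiting distribution of the two-sample Kolmogorov-Smirnov statistic D for sample sizes n and m. The helper name ks_asymptotic_p and the truncation at 100 terms are assumptions made for illustration.

```r
# Asymptotic p value of the two-sample KS statistic D:
# p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2),
# with lambda = D * sqrt(n m / (n + m)).
ks_asymptotic_p <- function(D, n, m, terms = 100) {
  lambda <- D * sqrt(n * m / (n + m))
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(1, max(0, p))  # guard against truncation error for tiny lambda
}

p <- ks_asymptotic_p(D = 0.136, n = 200, m = 200)  # lambda = 1.36
```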
find counts in bins. Useful for power calculations. Replaces hist command from R.
Description
find counts in bins. Useful for power calculations. Replaces hist command from R.
Usage
bincounter(x, bins)
Arguments
x |
numeric vector |
bins |
numeric vector |
Value
Integer vector of counts
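A base R sketch of bin counting is shown here for orientation. It assumes that bins holds the increasing bin edges, that bins are closed on the left, and that the last bin also includes its right edge; the actual boundary convention of bincounter is not documented above.

```r
# Count observations per bin; values outside the edges are dropped.
# count_in_bins is a hypothetical helper, not part of R2sample.
count_in_bins <- function(x, bins) {
  k <- length(bins) - 1                                  # number of bins
  idx <- findInterval(x, bins, rightmost.closed = TRUE)  # bin index of each x
  idx <- idx[idx >= 1 & idx <= k]
  tabulate(idx, nbins = k)
}

x <- c(0.1, 0.2, 0.55, 0.9, 1.0)
bins <- c(0, 0.5, 1)
counts <- count_in_bins(x, bins)
```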
This function calculates the test statistics for continuous data
Description
This function calculates the test statistics for continuous data
Usage
calcTS(dta, TS, typeTS, TSextra)
Arguments
dta |
data set |
TS |
routine |
typeTS |
format of TS |
TSextra |
list passed to TS function |
Value
A vector of numbers
This function creates the functions needed to run the various case studies.
Description
This function creates the functions needed to run the various case studies.
Usage
case.studies(which, nsample = 500)
Arguments
which |
name of the case study. |
nsample |
=500, sample size. |
Value
a list of functions
This function estimates the power of the chi-square test for continuous or discrete data
Description
This function estimates the power of the chi-square test for continuous or discrete data
Usage
chi_power(
rxy,
alpha = 0.05,
B = 1000,
xparam,
yparam,
nbins = c(50, 10),
minexpcount = 5,
typeTS
)
Arguments
rxy |
a function to generate data |
alpha |
=0.05 type I error probability of test |
B |
=1000 number of simulation runs |
xparam |
vector of parameter values |
yparam |
vector of parameter values |
nbins |
=c(50, 10) number of desired bins |
minexpcount |
=5 smallest number of counts required in each bin |
typeTS |
type of problem, continuous/discrete, with/without weights |
Value
A matrix of power values
This function runs the chi-square test for continuous or discrete data
Description
This function runs the chi-square test for continuous or discrete data
Usage
chi_test(dta, nbins = c(50, 10), minexpcount = 5, typeTS, ponly = FALSE)
Arguments
dta |
a list with two elements for continuous data or three elements for discrete data. Can also include weights for continuous data. |
nbins |
=c(50, 10) number of desired bins |
minexpcount |
=5 smallest number of counts required in each bin |
typeTS |
type of problem, continuous/discrete, with/without weights |
ponly |
Should the p value alone be returned? |
Value
A list with the test statistics, the p value and the degree of freedom for each test
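For discrete data, a two-sample chi-square test can be sketched with stats::chisq.test applied to the 2 x k table of counts. This is only a point of comparison: how chi_test forms bins, or combines bins whose expected counts fall below minexpcount, is not shown here and may differ.

```r
# Two-sample chi-square test of homogeneity on counts.
x <- c(18, 22, 20, 21, 19)   # counts of data set 1
y <- c(20, 19, 35, 14, 12)   # counts of data set 2
res <- chisq.test(rbind(x, y))
chi_out <- list(statistic = unname(res$statistic),
                p.value = res$p.value,
                df = unname(res$parameter))
```

With k = 5 values and two samples the test has (2 - 1)(5 - 1) = 4 degrees of freedom.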
simulate continuous data without weights
Description
simulate continuous data without weights
Usage
gen_cont_noweights(x, y, TSextra)
Arguments
x |
first data set |
y |
second data set |
TSextra |
extra stuff |
Value
A list of permuted vectors
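The idea behind a permutation step for unweighted continuous data can be sketched as follows: pool both samples, shuffle, and split again into groups of the original sizes. The helper name is hypothetical, and the sketch ignores TSextra and the rnull option of the package.

```r
# One random permutation of the pooled sample, split back into
# two groups with the original sample sizes.
permute_two_samples <- function(x, y) {
  nx <- length(x)
  pooled <- sample(c(x, y))
  list(x = pooled[seq_len(nx)], y = pooled[-seq_len(nx)])
}

set.seed(1)
x <- rnorm(5)
y <- rnorm(8, mean = 1)
perm <- permute_two_samples(x, y)
```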
simulate continuous data with weights
Description
simulate continuous data with weights
Usage
gen_cont_weights(x, y, wx, wy, TSextra)
Arguments
x |
first data set |
y |
second data set |
wx |
weights of first data set |
wy |
weights of second data set |
TSextra |
extra stuff |
Value
A list of permuted vectors
simulate new discrete data
Description
simulate new discrete data
Usage
gen_disc(dtax, dtay, vals, TSextra)
Arguments
dtax |
first data set, counts |
dtay |
second data set, counts |
vals |
values of discrete random variable |
TSextra |
extra stuff |
Value
A list of permuted vectors
simulate continuous data without weights
Description
simulate continuous data without weights
Usage
gen_sim_data(dta, TSextra)
Arguments
dta |
data set |
TSextra |
extra stuff |
Value
A list of permuted vectors
a local function needed for the vignette
Description
a local function needed for the vignette
Usage
myTS2(x, y, vals)
Arguments
x |
An integer vector. |
y |
An integer vector. |
vals |
A numeric vector with the values of the discrete rv. |
Value
A vector with test statistics
This function draws the power graph, with curves sorted by the mean power and smoothed for easier reading.
Description
This function draws the power graph, with curves sorted by the mean power and smoothed for easier reading.
Usage
plot_power(pwr, xname = " ", title = " ", Smooth = TRUE, span = 0.25)
Arguments
pwr |
a matrix of power values, usually from the twosample_power command |
xname |
Name of variable on x axis |
title |
(Optional) title of graph |
Smooth |
=TRUE lines are smoothed for easier reading |
span |
=0.25 bandwidth of smoothing method |
Value
plt, an object of class ggplot.
Find the power of various continuous tests via simulation or permutation.
Description
Find the power of various continuous tests via simulation or permutation.
Usage
powerC(rxy, xparam, yparam, TS, typeTS, TSextra, B = 1000L)
Arguments
rxy |
a function that generates x and y data. |
xparam |
arguments for r1. |
yparam |
arguments for r2. |
TS |
routine to calculate test statistics for non-chi-square tests |
typeTS |
indicator for type of test statistics |
TSextra |
additional info passed to TS, if necessary |
B |
=1000 number of simulation runs |
Value
A list of values of test statistics
Find the power of two sample tests using Rcpp and parallel computing.
Description
Find the power of two sample tests using Rcpp and parallel computing.
Usage
powerR(
rxy,
xparam,
yparam,
TS,
typeTS,
TSextra,
alpha = 0.05,
B = 1000,
SuppressMessages,
maxProcessor
)
Arguments
rxy |
function to generate a list with data sets x, y and (optional) vals, weights |
xparam |
first argument passed to rxy |
yparam |
second argument passed to rxy |
TS |
test statistic |
typeTS |
which format has TS? |
TSextra |
list of items passed to TS |
alpha |
=0.05, the level of the hypothesis test |
B |
= 1000 number of simulation runs |
SuppressMessages |
= FALSE print informative messages? |
maxProcessor |
maximum number of cores to use. If maxProcessor=1 no parallel computing is used. |
Value
A numeric vector of power values.
Find the power of various continuous tests via large sample theory.
Description
Find the power of various continuous tests via large sample theory.
Usage
power_cont_LS(rxy, alpha = 0.05, B = 1000, xparam = 0, yparam = 0)
Arguments
rxy |
a function that generates x and y data. |
alpha |
A numeric constant |
B |
Number of simulation runs. |
xparam |
arguments for r1. |
yparam |
arguments for r2. |
Value
A numeric matrix of powers
Power for tests with p values
Description
This function estimates the power of test routines that calculate p value(s)
Usage
power_newtest(TS, f, param_alt, TSextra, alpha = 0.05, B = 1000)
Arguments
TS |
routine to calculate test statistics. |
f |
routine that generates data. |
param_alt |
values of parameter under the alternative hypothesis. |
TSextra |
list passed to TS. |
alpha |
=0.05 type I error. |
B |
= 1000 number of simulation runs to estimate the power. |
Value
A matrix of power values
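When a test routine returns a p value, its power can be estimated as the fraction of simulated data sets for which the p value falls below alpha. The sketch below illustrates this in base R; estimate_power, rxy and ks_p are hypothetical names, and the code is not the implementation used by power_newtest.

```r
# Monte Carlo power estimate for a test that returns a p value.
estimate_power <- function(rxy, param, pvalue_fun, alpha = 0.05, B = 200) {
  rejections <- replicate(B, {
    dta <- rxy(param)                   # new data under the alternative
    pvalue_fun(dta$x, dta$y) < alpha    # TRUE if the test rejects
  })
  mean(rejections)
}

rxy <- function(mu) list(x = rnorm(50), y = rnorm(50, mu))
ks_p <- function(x, y) ks.test(x, y)$p.value

set.seed(3)
pwr <- sapply(c(0, 1.5), function(mu) estimate_power(rxy, mu, ks_p))
```

The first entry estimates the type I error (mu = 0), the second the power at mu = 1.5.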
power_studies_results
Description
the results of the included power studies
Usage
power_studies_results
Format
'power_studies_results'
A list of matrices with powers
pvaluecdf
Description
data to draw a graph in vignette
Usage
pvaluecdf
Format
'pvaluecdf'
A matrix
cpp version of R routine rep
Description
cpp version of R routine rep
Usage
repC(x, times)
Arguments
x |
numeric vector |
times |
integer vector |
Value
A numeric vector
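For comparison, the base R routine rep with a vector-valued times argument repeats each element of x the corresponding number of times. This is useful, for example, to expand values and counts of discrete data into raw observations. That repC matches rep in every edge case is an assumption.

```r
# Base R behavior that repC is described as mirroring.
x <- c(1.5, 2.5, 4)
times <- c(2L, 0L, 3L)
expanded <- rep(x, times)
```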
Power Comparisons
Description
This function runs the case studies included in the package and compares the power of a new test to those included.
Usage
run.studies(
TS,
study,
TSextra,
With.p.value = FALSE,
BasicComparison = TRUE,
nsample = 500,
alpha = 0.05,
param_alt,
maxProcessor,
SuppressMessages = FALSE,
B = 1000
)
Arguments
TS |
routine to calculate test statistics. |
study |
either the name of the study, or its number. If missing all the studies are run. |
TSextra |
list passed to TS. |
With.p.value |
=FALSE does user supplied routine return p values? |
BasicComparison |
=TRUE if true compares tests on one default value of parameter of the alternative distribution. |
nsample |
= 500, desired sample size. |
alpha |
=0.05 type I error |
param_alt |
(list of) values of parameter under the alternative hypothesis. If missing included values are used. |
maxProcessor |
number of cores to use for parallel programming |
SuppressMessages |
= FALSE print informative messages? |
B |
= 1000 number of simulation runs |
Details
For details consult vignette("R2sample","R2sample")
Value
A (list of) matrices of power values.
Examples
#The new test is a simple chisquare test:
chitest = function(x, y, TSextra) {
nbins=TSextra$nbins
nx=length(x);ny=length(y);n=nx+ny
xy=c(x,y)
bins=quantile(xy, (0:nbins)/nbins)
Ox=hist(x, bins, plot=FALSE)$counts
Oy=hist(y, bins, plot=FALSE)$counts
tmp=sqrt(sum(Ox)/sum(Oy))
chi = sum((Ox/tmp-Oy*tmp)^2/(Ox+Oy))
pval=1-pchisq(chi, nbins-1)
out=ifelse(TSextra$statistic,chi,pval)
names(out)="ChiSquare"
out
}
TSextra=list(nbins=5,statistic=FALSE) # Use 5 bins and calculate p values
run.studies(chitest,TSextra=TSextra, With.p.value=TRUE, B=100)
Runs the shiny app associated with R2sample package
Description
Runs the shiny app associated with R2sample package
Usage
run_shiny()
Value
No return value, called for side effect of opening a shiny app
This function does some rounding to nice numbers
Description
This function does some rounding to nice numbers
Usage
## S3 method for class 'digits'
signif(x, d = 4)
Arguments
x |
a list of two vectors |
d |
=4 number of digits to round to |
Value
A list with rounded vectors
run test using either simulation or permutation.
Description
run test using either simulation or permutation.
Usage
testC(dta, TS, typeTS, TSextra, B = 5000L)
Arguments
dta |
a list with the data |
TS |
routine to calculate test statistics for non-chi-square tests |
typeTS |
type of a test statistic |
TSextra |
additional info passed to TS, if necessary |
B |
=5000, number of simulation runs. |
Value
A list with test statistics and p values
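A permutation test along these lines can be sketched in base R. The sketch assumes that large values of the test statistic speak against the null hypothesis and uses the plain fraction of permuted statistics at least as large as the observed one as the p value; perm_test and absdiff are hypothetical names, and testC itself may differ in these details.

```r
# Permutation test for a user-supplied statistic TS(x, y).
perm_test <- function(x, y, TS, B = 1000) {
  obs <- TS(x, y)
  nx <- length(x)
  pooled <- c(x, y)
  sims <- replicate(B, {
    z <- sample(pooled)                  # shuffle the pooled data
    TS(z[seq_len(nx)], z[-seq_len(nx)])  # statistic of permuted data
  })
  list(statistics = obs, p.values = mean(sims >= obs))
}

absdiff <- function(x, y) abs(mean(x) - mean(y))
set.seed(2)
res <- perm_test(rnorm(50), rnorm(50, mean = 2), absdiff, B = 500)
```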
This function checks whether the correct methods have been requested
Description
This function checks whether the correct methods have been requested
Usage
test_methods(doMethods, Continuous, UseLargeSample, WithWeights)
Arguments
doMethods |
="all" Which methods should be included? |
Continuous |
is data continuous |
UseLargeSample |
should p values be found via large sample theory? |
WithWeights |
with weights? |
Value
TRUE or FALSE
test function
Description
test function
Usage
timecheck(dta, TS, typeTS, TSextra)
Arguments
dta |
data set |
TS |
test statistics |
typeTS |
format of TS |
TSextra |
additional info TS |
Value
Mean computation time
Power estimation for two-sample methods
Description
Find the power of various two sample tests using Rcpp and parallel computing.
Usage
twosample_power(
f,
...,
TS,
TSextra,
With.p.value = FALSE,
alpha = 0.05,
B = 1000,
nbins = c(50, 10),
minexpcount = 5,
UseLargeSample,
samplingmethod = "Binomial",
rnull,
SuppressMessages = FALSE,
maxProcessor
)
Arguments
f |
function to generate a list with data sets x, y and (optional) vals, weights |
... |
additional arguments passed to f, up to 2 |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
With.p.value |
=FALSE does user supplied routine return p values? |
alpha |
=0.05, the level of the hypothesis test |
B |
=1000, number of simulation runs. |
nbins |
=c(50,10), number of bins for chi large and chi small. |
minexpcount |
=5 minimum required count for chi square tests |
UseLargeSample |
should p values be found via large sample theory if n,m>10000? |
samplingmethod |
="Binomial" or "independence" for discrete data |
rnull |
a function that generates data from a model, possibly with parameter estimation. |
SuppressMessages |
= FALSE print informative messages? |
maxProcessor |
maximum number of cores to use. If maxProcessor=1 no parallel computing is used. |
Details
For details consult vignette("R2sample","R2sample")
This routine runs a number of different two-sample tests for univariate data, either discrete or continuous. The user can also provide their own test method.
Value
A numeric vector of power values.
Examples
# Power of standard normal vs. normal with mean mu.
f1=function(mu) list(x=rnorm(25), y=rnorm(25, mu))
twosample_power(f1, mu=c(0,2), B=100, maxProcessor = 1)
# Power of uniform discrete distribution vs. one with different probabilities.
f2=function(n, p) list(x=table(sample(1:5, size=1000, replace=TRUE)),
y=table(sample(1:5, size=n, replace=TRUE,
prob=c(1, 1, 1, 1, p))), vals=1:5)
twosample_power(f2, n=c(1000, 2000), p=c(1, 1.5), B=100, maxProcessor = 1)
# Compare power of a new test with those in package:
myTS=function(x,y) {z=c(mean(x)-mean(y),sd(x)-sd(y));names(z)=c("M","S");z}
cbind(twosample_power(f1, mu=c(0,2), TS=myTS,B=100, maxProcessor = 1),
twosample_power(f1, mu=c(0,2), B=100, maxProcessor = 1))
# Power estimation if routine returns a p value
myTS2=function(x, y) {out=ks.test(x,y)$p.value; names(out)="KSp"; out}
twosample_power(f1, c(0,1), TS=myTS2, With.p.value = TRUE, B=100)
Tests for the univariate two-sample problem
Description
This function runs a number of two sample tests using Rcpp and parallel computing.
Usage
twosample_test(
x,
y,
vals = NA,
TS,
TSextra,
wx = rep(1, length(x)),
wy = rep(1, length(y)),
B = 5000,
nbins = c(50, 10),
minexpcount = 5,
maxProcessor,
UseLargeSample,
samplingmethod = "Binomial",
rnull,
SuppressMessages = FALSE,
doMethods = "all"
)
Arguments
x |
a vector of numbers if data is continuous or of counts if data is discrete, or a list with the data. |
y |
a vector of numbers if data is continuous or of counts if data is discrete. |
vals |
=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous. |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
wx |
A numeric vector of weights of x. |
wy |
A numeric vector of weights of y. |
B |
=5000, number of simulation runs for permutation test |
nbins |
=c(50,10), number of bins for chi square tests. |
minexpcount |
=5, minimum required expected counts for chi-square tests. |
maxProcessor |
maximum number of cores to use. If missing (the default) no parallel processing is used. |
UseLargeSample |
should p values be found via large sample theory if n,m>10000? |
samplingmethod |
="Binomial" or "independence" for discrete data |
rnull |
a function that generates data from a model, possibly with parameter estimation. |
SuppressMessages |
= FALSE print informative messages? |
doMethods |
="all" a vector of codes for the methods to include. If "all", all methods are used. |
Details
For details consult vignette("R2sample","R2sample")
Value
A list of two numeric vectors, the test statistics and the p values.
Examples
R2sample::twosample_test(rnorm(1000), rt(1000, 4), B=1000)
myTS=function(x,y) {z=c(mean(x)-mean(y),sd(x)-sd(y));names(z)=c("M","S");z}
R2sample::twosample_test(rnorm(1000), rt(1000, 4), TS=myTS, B=1000)
vals=1:5
x=table(sample(vals, size=100, replace=TRUE))
y=table(sample(vals, size=100, replace=TRUE, prob=c(1,1,3,1,1)))
R2sample::twosample_test(x, y, vals)
Adjusted p values for simultaneous testing in the two-sample problem.
Description
This function runs a number of two sample tests using Rcpp and parallel computing and then finds the correct p value for the combined tests.
Usage
twosample_test_adjusted_pvalue(
x,
y,
vals = NA,
TS,
TSextra,
wx = rep(1, length(x)),
wy = rep(1, length(y)),
B = c(5000, 1000),
nbins = c(50, 10),
minexpcount = 5,
samplingmethod = "independence",
rnull,
SuppressMessages = FALSE,
doMethods
)
Arguments
x |
a vector of numbers if data is continuous or of counts if data is discrete, or a list with the data. |
y |
a vector of numbers if data is continuous or of counts if data is discrete. |
vals |
=NA, a vector of numbers, the values of a discrete random variable. NA if data is continuous. |
TS |
routine to calculate test statistics for non-chi-square tests |
TSextra |
additional info passed to TS, if necessary |
wx |
A numeric vector of weights of x. |
wy |
A numeric vector of weights of y. |
B |
=c(5000, 1000), number of simulation runs for permutation test |
nbins |
=c(50,10), number of bins for chi square tests. |
minexpcount |
= 5, minimum required expected counts for chi-square tests |
samplingmethod |
="independence" or "Binomial" for discrete data |
rnull |
routine for parametric bootstrap |
SuppressMessages |
= FALSE print informative messages? |
doMethods |
="all" a vector of codes for the methods to include. If "all", all methods are used. |
Details
For details consult vignette("R2sample","R2sample")
Value
A list of two numeric vectors, the test statistics and the p values.
Examples
x=rnorm(100)
y=rt(200, 4)
R2sample::twosample_test_adjusted_pvalue(x, y, B=c(500, 500))
vals=1:5
x=table(c(1:5, sample(1:5, size=100, replace=TRUE)))-1
y=table(c(1:5, sample(1:5, size=100, replace=TRUE, prob=c(1,1,3,1,1))))-1
R2sample::twosample_test_adjusted_pvalue(x, y, vals, B=c(500, 500))
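When several tests are run on the same data and the smallest p value is reported, that p value needs an adjustment. One standard way to do this is a single-step min-p permutation procedure, sketched below in base R. This is a swapped-in illustration of the general idea, not the algorithm of twosample_test_adjusted_pvalue (which takes two values of B); minp_adjust and two_stats are hypothetical names, and large statistics are assumed to indicate a difference.

```r
# Single-step min-p adjustment based on one set of permutations.
minp_adjust <- function(x, y, TS, B = 300) {
  nx <- length(x)
  pooled <- c(x, y)
  obs <- TS(x, y)                       # k observed statistics
  k <- length(obs)
  sims <- matrix(replicate(B, {         # B x k matrix of permuted statistics
    z <- sample(pooled)
    TS(z[seq_len(nx)], z[-seq_len(nx)])
  }), ncol = k, byrow = TRUE)
  # permutation p value of the value t for statistic j
  perm_p <- function(t, j) mean(sims[, j] >= t)
  p_obs <- vapply(seq_len(k), function(j) perm_p(obs[j], j), numeric(1))
  # smallest p value within each permuted data set
  p_sim <- vapply(seq_len(B), function(b) {
    min(vapply(seq_len(k), function(j) perm_p(sims[b, j], j), numeric(1)))
  }, numeric(1))
  c(min.p = min(p_obs), adjusted.p = mean(p_sim <= min(p_obs)))
}

two_stats <- function(x, y) {
  c(M = abs(mean(x) - mean(y)), S = abs(sd(x) - sd(y)))
}
set.seed(4)
adj <- minp_adjust(rnorm(40), rnorm(40, mean = 0.5), two_stats)
```

By construction the adjusted p value is never smaller than the smallest individual p value.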
Find counts and/or sum of weights in bins. Useful for power calculations. Replaces hist command from R.
Description
Find counts and/or sum of weights in bins. Useful for power calculations. Replaces hist command from R.
Usage
wbincounter(x, bins, w)
Arguments
x |
numeric vector |
bins |
numeric vector |
w |
numeric vector of weights |
Value
sum of weights in bins
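The weighted analogue of bin counting adds up the weights of the observations that fall into each bin. The base R sketch below assumes increasing bin edges, left-closed bins, and a last bin that includes its right edge; the helper name is hypothetical and the boundary convention of wbincounter may differ.

```r
# Sum of weights per bin; observations outside the edges are ignored.
weighted_bin_sums <- function(x, bins, w) {
  stopifnot(length(x) == length(w))
  k <- length(bins) - 1
  idx <- findInterval(x, bins, rightmost.closed = TRUE)
  vapply(seq_len(k), function(j) sum(w[idx == j]), numeric(1))
}

x <- c(0.1, 0.2, 0.55, 0.9)
w <- c(1, 2, 0.5, 1.5)
bins <- c(0, 0.5, 1)
wsums <- weighted_bin_sums(x, bins, w)
```

With all weights equal to 1 the result reduces to ordinary bin counts.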
find weights for several statistics for discrete data
Description
find weights for several statistics for discrete data
Usage
weights(dta)
Arguments
dta |
A list with vectors x, y and vals. |
Value
A vector of weights