% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/run-hierarchy.R
\name{run_hierarchy}
\alias{run_hierarchy}
\title{Hierarchical Testing}
\usage{
run_hierarchy(
  x,
  y,
  dendr,
  test.func,
  clvar = NULL,
  arg.all = list(NULL),
  arg.all.fix = NULL,
  compMOD.same = function(...) NULL,
  compMOD.changing = function(...) NULL,
  alpha = 0.05,
  global.test = TRUE,
  hier.adj = FALSE,
  mt.adj = c("dpBF", "none"),
  agg.method = c("Tippett", "Stouffer"),
  verbose = FALSE,
  sort.parallel = TRUE,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1L,
  cl = NULL
)
}
\arguments{
\item{x}{a matrix or list of matrices for multiple data sets. The matrix or
matrices have to be of type numeric and are required to have column names
/ variable names. The rows and the columns represent the observations and
the variables, respectively.}

\item{y}{a vector, a matrix with one column, or list of the aforementioned
objects for multiple data sets. The vector, vectors, matrix, or matrices
have to be of type numeric.}

\item{dendr}{the output of one of the functions
\code{\link{cluster_vars}} or \code{\link{cluster_positions}}.}

\item{test.func}{a test function for groups of variables. See the 'Details' 
and 'Examples' sections.}

\item{clvar}{a matrix or list of matrices of control variables.}

\item{arg.all}{a list with arguments which is passed on to 
each call of the functions inputted to the arguments \code{test.func},
\code{compMOD.same}, and \code{compMOD.changing}. See the 'Details' section.}

\item{arg.all.fix}{a vector or list with arguments which is passed on to 
each call of the functions inputted to the arguments \code{test.func},
\code{compMOD.same}, and \code{compMOD.changing}. See the 'Details' section.}

\item{compMOD.same}{a function which is called once at the beginning of 
the hierarchical testing procedure and its output is passed on to each
call of \code{test.func}. See the 'Details' section.}

\item{compMOD.changing}{a function which is called once at the beginning of 
the hierarchical testing procedure and its purpose is to serve as a 
cache or memory if some output does not have to be recalculated for 
multiple group tests but the cache can be updated while testing. 
See the 'Details' section.}

\item{alpha}{the significant level at which the FWER is controlled.}

\item{global.test}{a logical value indicating whether the global test should
be performed.}

\item{hier.adj}{a logical value indicating whether the p-values can only 
increase when going down some given branch. Strong FWER control holds as 
well if the argument is set to FALSE which is the default option.}

\item{mt.adj}{type of multiple testing correction to be used; 
either \code{"dpBF"} (depth-wise Bonferroni multiple adjustment) or \code{"none"}
(no adjustment). See the 'Details' section.}

\item{agg.method}{a character string naming an aggregation method which
aggregates the p-values over the different data sets for a given cluster;
either \code{"Tippett"} (Tippett's rule) or \code{"Stouffer"}
(Stouffer's rule). This argument is only relevant if multiple data sets
are specified in the function call.}

\item{verbose}{a logical value indicating whether the progress of the computation
should be printed in the console.}

\item{sort.parallel}{a logical indicating whether the blocks should be sorted with respect to
the size of the block. This can reduce the run time for parallel computation.}

\item{parallel}{type of parallel computation to be used. See the 'Details' section.}

\item{ncpus}{number of processes to be run in parallel.}

\item{cl}{an optional \strong{parallel} or \strong{snow} cluster used if
\code{parallel = "snow"}. If not supplied, a cluster on the local machine is created.}
}
\value{
The returned value is an object of class \code{"hierBase"}, consisting 
a data.frame with the result of the hierarchical testing.

The data.frame has the following columns:
\item{block}{\code{NA} or the name of the block if the significant cluster
is a subcluster of the block or is the block itself.}
\item{p.value}{The p-value of the significant cluster.}
\item{significant.cluster}{The column names of the members of the significant
cluster.}

There is a \code{print} method for this class; see
\code{\link{print.hierBase}}.
}
\description{
Hierarchical testing for a given user-specified test function.
}
\details{
Hierarchical testing is performed by going top down through the 
hierarchical tree. Testing in some branch only continues if at least one 
child of a given cluster is significant. The function \code{\link{run_hierarchy}} 
requires the output of one of the functions \code{\link{cluster_vars}} or
\code{\link{cluster_positions}} as an input (argument \code{dendr}).

The user can specify which hierarchical multiple testing adjustment for the 
hierarchical procedure is applied. The default is \code{"dpBF"} (depth-wise 
Bonferroni multiple adjustment). Alternatively, the user can choose \code{"none"} 
(no adjustment). The hierarchical multiple testing adjustment \code{"dpBF"} 
guarantees strong family-wise error control if the group test, which is applied 
for testing a given group, controls the type I error. 

The group test function has to be specified by the user; see argument 
\code{test.func}. It is required to have the following 
arguments: \code{x, y, clvar, colnames.cluster, arg.all, 
arg.all.fix, mod.large, mod.small}. Although it is fine if not all of them 
are used inside the function. Most of the arguments are described in the 
list of arguments above for the function \code{run_hierarchy} since they 
are as well specified by the user. The argument \code{colnames.cluster} contains 
the column names of the group of variables which should be tested. The arguments 
\code{mod.large} and \code{mod.small} are used as a cache or memory since 
one can often reuse, say, one initial lasso fit for all the subsequent 
group tests. The argument \code{mod.large} corresponds to some result of 
some initial calculation which is reused later. The argument \code{mod.small} 
can be updated continuously through the hierarchical sequential testing 
procedure. The test function is required to output or return a list with 
two elements, i.e. the p-value ("pval") calculated for the given group and 
some object ("mod.small"), which serves as a cache or memory and is passed on 
to the next function calls in the same branch; see example below. The latter 
can just be \code{NULL} if this feature is not used. Compare as well with the 
above arguments \code{compMOD.same} and \code{compMOD.changing} for the function 
\code{run_hierarchy}, and see the example below. 

If the argument \code{block} was supplied for the building
of the hierarchical tree (i.e. in the function call of either
\code{\link{cluster_vars}} or
\code{\link{cluster_positions}}), i.e. the second level of the
hierarchical tree was given, the hierarchical testing step can be run in
parallel across the different blocks by specifying the arguments
\code{parallel} and \code{ncpus}. There is an optional argument \code{cl} if
\code{parallel = "snow"}. There are three possibilities to set the
argument \code{parallel}: \code{parallel = "no"} for serial evaluation
(default), \code{parallel = "multicore"} for parallel evaluation
using forking, and \code{parallel = "snow"} for parallel evaluation
using a parallel socket cluster. It is recommended to select
\code{\link{RNGkind}("L'Ecuyer-CMRG")} and set a seed to ensure that
the parallel computing of the package \code{hierbase} is reproducible.
This way each processor gets a different substream of the pseudo random
number generator stream which makes the results reproducible if the arguments
(as \code{sort.parallel} and \code{ncpus}) remain unchanged. See the vignette
or the reference for more details.

Note that if Tippett's aggregation method is applied for multiple data
sets, then very small p-values are set to machine precision. This is
due to rounding in floating point arithmetic.
}
\examples{
# low-dimensional example with user specified test function
n <- 200
p <- 100
library(MASS)
set.seed(3)
x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
colnames(x) <- paste0("Var", 1:p)
beta <- rep(0, p)
beta[c(5, 20, 46)] <- 1
y <- x \%*\% beta + rnorm(n)

dendr1 <- cluster_vars(x = x)

# Define own test function: partial F-test
# Many of the arguments of the function below might not be 
# used but the function is required to have those arguments.
my.test <- function(x, y, clvar, colnames.cluster, arg.all, 
                    arg.all.fix, mod.large, mod.small) {
                      
                       # generate design matrices
                       data.large <- cbind(clvar, x)
                       setdiff.cluster <- setdiff(colnames(x), colnames.cluster)
                       data.small <- cbind(clvar, x[, setdiff.cluster]) 
                       # Replace data.small if set of indices setdiff.cluster is empty.
                       if (ncol(data.small) == 0) {data.small <- rep(1, length(y))}
                       
                       # compare the models using a partial F test
                       pval <- anova(lm(y ~ data.small),
                                     lm(y ~ data.large),
                                     test = "F")$P[2]
                       
                       return(list("pval" = pval, "mod.small" = NULL))
                    }

# run hierarchical testing
set.seed(76)
sign.clusters1 <- run_hierarchy(x = x, y = y, dendr = dendr1,
                                test.func = my.test)

## With block
# I.e. second level of the hierarchical tree is specified by 
# the user. This would allow to run the code in parallel; see the 'Details'
# section.
# The column names of the data frame block are optional.
block <- data.frame("var.name" = paste0("Var", 1:p),
                    "block" = rep(c(1, 2), each = p/2))
dendr2 <- cluster_vars(x = x, block = block)
set.seed(76)
sign.clusters2 <- run_hierarchy(x = x, y = y, dendr = dendr2,
                                test.func = my.test)

# Access part of the return object or result
sign.clusters2[, "block"]
sign.clusters2[, "p.value"]
# Column names or variable names of the significant cluster in the first row.
sign.clusters2[[1, "significant.cluster"]]

}
\references{
Meinshausen, N. (2008). Hierarchical testing of variable importance. 
Biometrika, 95(2), 265-278.
Renaux, C., Buzdugan, L., Kalisch, M., and Bühlmann, P. (2020). Hierarchical inference for 
genome-wide association studies: a view on methodology with software. 
Computational Statistics, 35(1), 1-40.
}
\seealso{
\code{\link{cluster_vars}}, \code{\link{cluster_positions}}, 
and \code{\link{advance_hierarchy}}.
}
