Type: | Package |
Title: | Multi-Layer Group-Lasso |
Version: | 1.0.0 |
Date: | 2023-03-15 |
Copyright: | Inria |
Description: | It implements a new procedure of variable selection in the context of redundancy between explanatory variables, which holds true with high dimensional data (Grimonprez et al. (2023) <doi:10.18637/jss.v106.i03>). |
BugReports: | https://github.com/modal-inria/MLGL/issues |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Imports: | gglasso, MASS, Matrix, fastcluster, FactoMineR, parallelDist |
RoxygenNote: | 7.2.3 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2023-03-15 10:09:58 UTC; quentin |
Author: | Quentin Grimonprez [aut, cre], Samuel Blanck [ctb], Alain Celisse [ths], Guillemette Marot [ths], Yi Yang [ctb], Hui Zou [ctb] |
Maintainer: | Quentin Grimonprez <quentingrim@yahoo.fr> |
Repository: | CRAN |
Date/Publication: | 2023-03-15 12:50:05 UTC |
MLGL
Description
This package presents a method combining Hierarchical Clustering and Group-lasso. Usually, a single partition of the covariates is used in the group-lasso. Here, we provide several partitions from the hierarchical tree.
A post-treatment method based on statistical tests (with FWER and FDR control) is provided for selecting the regularization parameter and the optimal groups for this value. This method can be applied to both the classical group-lasso and our method.
Details
The MLGL function performs the hierarchical clustering and the group-lasso. The post-treatment method can be performed with the hierarchicalFWER and selFWER functions. The whole process can be run with the fullProcess function.
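As a minimal base-R sketch of where the several partitions come from (this is not the package's internal code; the toy data and the variable names are assumptions made for illustration), the covariates can be clustered with hclust and the resulting tree cut at every level:

```r
# Toy design matrix: 20 observations, 6 covariates (illustrative data only)
set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)

# Cluster the covariates (columns of X), hence the transpose
hc <- hclust(dist(t(X)), method = "ward.D2")

# One partition of the covariates per level of the tree:
# column k of 'partitions' assigns each covariate to one of k groups
partitions <- cutree(hc, k = 1:ncol(X))

# Number of groups at each level: 1, 2, ..., p
nGroups <- apply(partitions, 2, function(g) length(unique(g)))
print(nGroups)
```

Each column of partitions is one candidate grouping of the covariates, from a single group containing everything down to one group per covariate.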
Author(s)
Quentin Grimonprez
References
Grimonprez Q, Blanck S, Celisse A, Marot G (2023). "MLGL: An R Package Implementing Correlated Variable Selection by Hierarchical Clustering and Group-Lasso." Journal of Statistical Software, 106(3), 1-33. doi:10.18637/jss.v106.i03.
See Also
MLGL, cv.MLGL, fullProcess, hierarchicalFWER
Examples
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
F-test
Description
Perform an F-test
Usage
Ftest(X, y, varToTest)
Arguments
X |
design matrix of size n*p |
y |
response vector of length n |
varToTest |
vector containing the index of the column of X to test |
Details
Model: y = X * beta + epsilon
Null hypothesis: beta[varToTest] = 0
Alternative hypothesis: there exists an index k in varToTest such that beta[k] != 0
The test statistic is based on a full and a reduced model.
Full model: y = X[, varToTest] * beta[varToTest] + epsilon
Reduced model: the null model
Value
a vector of the same length as varToTest containing the p-values of the test.
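The comparison of a full and a reduced model can be sketched in base R with lm and anova. This is an illustration rather than the code of Ftest: the simulated data are made up, and it assumes that the null model is the intercept-only model.

```r
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), nrow = n)
y <- 2 * X[, 1] + rnorm(n, sd = 0.5)

varToTest <- c(1, 2)

# Full model: only the tested columns
full <- lm(y ~ X[, varToTest])
# Reduced model: intercept only (assumption about what "null model" means)
reduced <- lm(y ~ 1)

# F-test comparing the two nested models
tab <- anova(reduced, full)
pvalue <- tab[["Pr(>F)"]][2]
print(pvalue)
```

A small p-value leads to rejecting the null hypothesis that all tested coefficients are zero.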
See Also
Hierarchical Multiple Testing procedure
Description
Apply the Hierarchical Multiple Testing procedure on a MLGL object
Usage
HMT(
res,
X,
y,
control = c("FWER", "FDR"),
alpha = 0.05,
test = partialFtest,
addRoot = FALSE,
Shaffer = FALSE,
...
)
Arguments
res |
|
X |
matrix of size n*p |
y |
vector of size n. |
control |
either "FDR" or "FWER" |
alpha |
control level for testing procedure |
test |
test used in the testing procedure. Default is partialFtest |
addRoot |
If TRUE, add a common root containing all the groups |
Shaffer |
If TRUE, a Shaffer correction is performed (only if control = "FWER") |
... |
extra parameters for selFDR |
Value
a list containing:
- lambdaOpt
lambda values maximizing the number of rejects
- var
A vector containing the index of selected variables for the first lambdaOpt value
- group
A vector containing the index of selected groups for the first lambdaOpt value
- selectedGroups
Selected groups for the first lambdaOpt value
- indLambdaOpt
indices associated with optimal lambdas
- reject
Selected groups for all lambda values
- alpha
Control level
- test
Test used in the testing procedure
- control
"FDR" or "FWER"
- time
Elapsed time
- hierTest
list containing the output of the testing function for each lambda. Each element can be used with the selFWER or selFDR functions.
- lambda
lambda path
- nGroup
Number of groups before testing
- nSelectedGroup
Number of groups after testing
See Also
hierarchicalFWER hierarchicalFDR selFWER selFDR
Examples
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- MLGL(X, y)
# perform hierarchical testing with FWER control
out <- HMT(res, X, y, alpha = 0.05)
# test a new value of alpha for a specific lambda
selFWER(out$hierTest[[60]], alpha = 0.1)
Multi-Layer Group-Lasso
Description
Run hierarchical clustering followed by a group-lasso on all the different partitions.
Usage
MLGL(X, ...)
## Default S3 method:
MLGL(
X,
y,
hc = NULL,
lambda = NULL,
weightLevel = NULL,
weightSizeGroup = NULL,
intercept = TRUE,
loss = c("ls", "logit"),
sizeMaxGroup = NULL,
verbose = FALSE,
...
)
## S3 method for class 'formula'
MLGL(
formula,
data,
hc = NULL,
lambda = NULL,
weightLevel = NULL,
weightSizeGroup = NULL,
intercept = TRUE,
loss = c("ls", "logit"),
verbose = FALSE,
...
)
Arguments
X |
matrix of size n*p |
... |
Others parameters for |
y |
vector of size n. If loss = "logit", elements of y must be in {-1, 1} |
hc |
output of |
lambda |
lambda values for group lasso. If not provided, the function generates its own values of lambda |
weightLevel |
a vector of size p for each level of the hierarchy. A zero indicates that the level will be ignored.
If not provided, use 1/(height between 2 successive levels). Only if |
weightSizeGroup |
a vector of size 2*p-1 containing the weight for each group.
Default is the square root of the size of each group. Only if |
intercept |
Should an intercept be included in the model? |
loss |
a character string specifying the loss function to use, valid options are: "ls" least squares loss (regression) and "logit" logistic loss (classification) |
sizeMaxGroup |
maximum size of selected groups. If NULL, no restriction |
verbose |
print some information |
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. |
data |
an optional data.frame, list or environment (or object coercible by as.data.frame to a data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula) |
Value
a MLGL object containing:
- lambda
lambda values
- b0
intercept values for lambda
- beta
A list containing the values of estimated coefficients for each value of lambda
- var
A list containing the index of selected variables for each value of lambda
- group
A list containing the index of selected groups for each value of lambda
- nVar
A vector containing the number of non-zero coefficients for each value of lambda
- nGroup
A vector containing the number of non-zero groups for each value of lambda
- structure
A list containing the following vectors. var: all variables used. group: associated groups. weight: weight associated with the different groups. level: for each group, the corresponding level of the hierarchy where it appears and disappears. 3 indicates the level with a partition of 3 groups.
- time
computation time
- dim
dimension of X
- hc
Output of hierarchical clustering
- call
Code executed by user
Author(s)
Quentin Grimonprez
See Also
cv.MLGL, stability.MLGL, listToMatrix, predict.MLGL, coef.MLGL, plot.cv.MLGL
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
Hierarchical Clustering with distance matrix computed using bootstrap replicates
Description
Hierarchical Clustering with distance matrix computed using bootstrap replicates
Usage
bootstrapHclust(X, frac = 1, B = 50, method = "ward.D2", nCore = NULL)
Arguments
X |
data |
frac |
fraction of sample used at each replicate |
B |
number of replicates |
method |
desired method: "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", "median". |
nCore |
number of cores |
Value
An object of class hclust
Examples
hc <- bootstrapHclust(USArrests, nCore = 1)
Get coefficients from a MLGL object
Description
Get coefficients from a MLGL object
Usage
## S3 method for class 'MLGL'
coef(object, s = NULL, ...)
Arguments
object |
|
s |
values of lambda. If NULL, use values from object |
... |
Not used. Other arguments to predict. |
Value
A matrix with estimated coefficients for given values of s.
Author(s)
Quentin Grimonprez
See Also
Get coefficients from a cv.MLGL object
Description
Get coefficients from a cv.MLGL object
Usage
## S3 method for class 'cv.MLGL'
coef(object, s = c("lambda.1se", "lambda.min"), ...)
Arguments
object |
|
s |
Either "lambda.1se" or "lambda.min" |
... |
Not used. Other arguments to predict. |
Value
A matrix with estimated coefficients for given values of s.
Author(s)
Quentin Grimonprez
See Also
Compute the group size weight vector with an authorized maximal size
Description
Compute the group size weight vector with an authorized maximal size
Usage
computeGroupSizeWeight(hc, sizeMax = NULL)
Arguments
hc |
output of hclust |
sizeMax |
maximum size of cluster to consider |
Value
the weight vector
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# use 20 as the maximal group size
hc <- hclust(dist(t(X)))
w <- computeGroupSizeWeight(hc, sizeMax = 20)
# Apply MLGL method
res <- MLGL(X, y, hc = hc, weightSizeGroup = w)
Multi-Layer Group-Lasso with V-fold cross-validation
Description
V-fold cross-validation for the MLGL function
Usage
cv.MLGL(
X,
y,
nfolds = 5,
lambda = NULL,
hc = NULL,
weightLevel = NULL,
weightSizeGroup = NULL,
loss = c("ls", "logit"),
intercept = TRUE,
sizeMaxGroup = NULL,
verbose = FALSE,
...
)
Arguments
X |
matrix of size n*p |
y |
vector of size n. If loss = "logit", elements of y must be in {-1, 1} |
nfolds |
number of folds |
lambda |
lambda values for group lasso. If not provided, the function generates its own values of lambda |
hc |
output of |
weightLevel |
a vector of size p for each level of the hierarchy. A zero indicates that the level will be ignored. If not provided, use 1/(height between 2 successive levels) |
weightSizeGroup |
a vector |
loss |
a character string specifying the loss function to use, valid options are: "ls" least squares loss (regression) and "logit" logistic loss (classification) |
intercept |
Should an intercept be included in the model? |
sizeMaxGroup |
maximum size of selected groups. If NULL, no restriction |
verbose |
print some information |
... |
Others parameters for |
Details
Hierarchical clustering is performed with all the variables. Then, the partitions from the different levels of the hierarchy are used in the different runs of MLGL for cross-validation.
Value
a cv.MLGL object containing:
- lambda
values of lambda.
- cvm
the mean cross-validated error.
- cvsd
estimate of standard error of cvm
- cvupper
upper curve = cvm + cvsd
- cvlower
lower curve = cvm - cvsd
- lambda.min
The optimal value of lambda that gives the minimum cross-validation error cvm.
- lambda.1se
The largest value of lambda such that the error is within 1 standard error of the minimum.
- time
computation time
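The two selected values of lambda can be derived from cvm and cvsd as sketched below. The numbers are purely illustrative (they are not the output of a real fit), and the rule shown is the usual one-standard-error rule, assumed here to match the definitions listed above.

```r
# Illustrative values only (not the output of a real fit)
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(3.0, 2.0, 1.2, 1.0, 1.1)
cvsd   <- c(0.3, 0.3, 0.25, 0.25, 0.3)

# Upper and lower curves
cvupper <- cvm + cvsd
cvlower <- cvm - cvsd

# lambda giving the minimum cross-validation error
iMin <- which.min(cvm)
lambda.min <- lambda[iMin]

# Largest lambda whose error is within one standard error of the minimum
lambda.1se <- max(lambda[cvm <= cvupper[iMin]])

print(c(lambda.min = lambda.min, lambda.1se = lambda.1se))
```

Choosing lambda.1se gives a more strongly regularized model than lambda.min while keeping a comparable cross-validation error.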
Author(s)
Quentin Grimonprez
See Also
MLGL, stability.MLGL, predict.cv.gglasso, coef.cv.MLGL, plot.cv.MLGL
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply cv.MLGL method
res <- cv.MLGL(X, y)
Full process of MLGL
Description
Run hierarchical clustering followed by a group-lasso on all the different partitions and a hierarchical testing procedure. Only for linear regression problems.
Usage
fullProcess(X, ...)
## Default S3 method:
fullProcess(
X,
y,
control = c("FWER", "FDR"),
alpha = 0.05,
test = partialFtest,
hc = NULL,
fractionSampleMLGL = 1/2,
BHclust = 50,
nCore = NULL,
addRoot = FALSE,
Shaffer = FALSE,
...
)
## S3 method for class 'formula'
fullProcess(
formula,
data,
control = c("FWER", "FDR"),
alpha = 0.05,
test = partialFtest,
hc = NULL,
fractionSampleMLGL = 1/2,
BHclust = 50,
nCore = NULL,
addRoot = FALSE,
Shaffer = FALSE,
...
)
Arguments
X |
matrix of size n*p |
... |
Others parameters for MLGL |
y |
vector of size n. |
control |
either "FDR" or "FWER" |
alpha |
control level for testing procedure |
test |
test used in the testing procedure. Default is partialFtest |
hc |
output of |
fractionSampleMLGL |
a real between 0 and 1: the fraction of individuals to use in the sample for MLGL (see Details). |
BHclust |
number of replicates for computing the distance matrix for the hierarchical clustering tree |
nCore |
number of cores used for distance computation. Use all cores by default. |
addRoot |
If TRUE, add a common root containing all the groups |
Shaffer |
If TRUE, a Shaffer correction is performed (only if control = "FWER") |
formula |
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula) |
Details
Divide the n individuals into two samples. Then the following three steps are performed:
1) Bootstrap hierarchical clustering of the variables of X
2) MLGL on the second sample of individuals
3) Hierarchical testing procedure on the first sample of individuals
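The split of the individuals can be sketched in base R as follows. This is only an illustration: the random draw, the rounding with floor and the names indMLGL and indTest are assumptions, not necessarily what fullProcess does internally.

```r
set.seed(42)
n <- 50                       # number of individuals
fractionSampleMLGL <- 1 / 2   # fraction of individuals used for MLGL

# Sample of individuals used for the MLGL step
indMLGL <- sort(sample(n, floor(n * fractionSampleMLGL)))
# Remaining individuals, used for the hierarchical testing step
indTest <- setdiff(seq_len(n), indMLGL)

# The clustering of the variables is not shown here.
# MLGL would be fitted on X[indMLGL, ] and y[indMLGL],
# and the tests would be run on X[indTest, ] and y[indTest].
print(length(indMLGL))
print(length(indTest))
```

Using disjoint samples for selection and for testing keeps the p-values of the testing step from being computed on the data that chose the groups.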
Value
a list containing:
- res
output of MLGL function
- lambdaOpt
lambda values maximizing the number of rejects
- var
A vector containing the index of selected variables for the first lambdaOpt value
- group
A vector containing the index of selected groups for the first lambdaOpt value
- selectedGroups
Selected groups for the first lambdaOpt value
- reject
Selected groups for all lambda values
- alpha
Control level
- test
Test used in the testing procedure
- control
"FDR" or "FWER"
- time
Elapsed time
Author(s)
Quentin Grimonprez
See Also
MLGL, hierarchicalFDR, hierarchicalFWER, selFDR, selFWER
Examples
# least squares loss
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- fullProcess(X, y)
Hierarchical testing with FDR control
Description
Apply hierarchical test for each hierarchy, and test external variables for FDR control at level alpha
Usage
hierarchicalFDR(X, y, group, var, test = partialFtest, addRoot = FALSE)
Arguments
X |
original data |
y |
associated response |
group |
vector with index of groups. group[i] contains the index of the group of the variable var[i]. |
var |
vector with the variables contained in each group. group[i] contains the index of the group of the variable var[i]. |
test |
function for testing the nullity of a group of coefficients in linear regression.
The function has 3 arguments: |
addRoot |
If TRUE, add a common root containing all the groups |
Details
Version of the hierarchical testing procedure of Yekutieli for MLGL output. You can use the selFDR function to select groups at a desired level alpha.
Value
a list containing:
- pvalues
p-values of the different tests (without correction)
- adjPvalues
adjusted p-values
- groupId
Index of the group
- hierMatrix
Matrix describing the hierarchical tree.
References
Yekutieli, Daniel. "Hierarchical False Discovery Rate-Controlling Methodology." Journal of the American Statistical Association 103.481 (2008): 309-16.
See Also
Examples
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- MLGL(X, y)
test <- hierarchicalFDR(X, y, res$group[[20]], res$var[[20]])
Hierarchical testing with FWER control
Description
Apply hierarchical test for each hierarchy, and test external variables for FWER control at level alpha
Usage
hierarchicalFWER(
X,
y,
group,
var,
test = partialFtest,
Shaffer = FALSE,
addRoot = FALSE
)
Arguments
X |
original data |
y |
associated response |
group |
vector with index of groups. group[i] contains the index of the group of the variable var[i]. |
var |
vector with the variables contained in each group. group[i] contains the index of the group of the variable var[i]. |
test |
function for testing the nullity of a group of coefficients in linear regression.
The function has 3 arguments: |
Shaffer |
boolean, if TRUE, a Shaffer correction is performed |
addRoot |
If TRUE, add a common root containing all the groups |
Details
Version of the hierarchical testing procedure of Meinshausen for MLGL output. You can use the selFWER function to select groups at a desired level alpha.
Value
a list containing:
- pvalues
p-values of the different tests (without correction)
- adjPvalues
adjusted p-values
- groupId
Index of the group
- hierMatrix
Matrix describing the hierarchical tree.
References
Meinshausen, Nicolai. "Hierarchical Testing of Variable Importance." Biometrika 95.2 (2008): 265-78.
See Also
Examples
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- MLGL(X, y)
test <- hierarchicalFWER(X, y, res$group[[20]], res$var[[20]])
Obtain a sparse matrix of the coefficients of the path
Description
Obtain a sparse matrix of the coefficients of the path
Usage
listToMatrix(x, row = c("covariates", "lambda"))
Arguments
x |
|
row |
"lambda" or "covariates". If row="covariates", each row of the output matrix represents a covariate else if row="lambda", it represents a value of lambda. |
Details
This function can be used with a MLGL object to obtain a matrix with all estimated coefficients for the p original variables.
In case of overlapping groups, coefficients from repeated variables are summed.
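The summation over repeated variables can be illustrated with a small base-R sketch. The coefficient values are made up, and the code is not the implementation of listToMatrix; it only shows the aggregation for one value of lambda.

```r
# Illustrative coefficients on duplicated columns (not a real fit)
var  <- c(1, 2, 3, 2, 3)            # original variable behind each coefficient
beta <- c(0.5, 1.0, 0, -0.25, 2.0)  # one coefficient per duplicated column
p <- 4                              # number of original variables

# Sum the coefficients that belong to the same original variable
betaOriginal <- numeric(p)
s <- tapply(beta, var, sum)
betaOriginal[as.integer(names(s))] <- s

print(betaOriginal)
```

Variables that appear in no selected group (here variable 4) keep a coefficient of zero.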
Value
a sparse matrix containing the estimated coefficients for different lambdas
See Also
Examples
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
# Convert output in sparse matrix format
beta <- listToMatrix(res)
Group-lasso with overlapping groups
Description
Group-lasso with overlapping groups
Usage
overlapgglasso(
X,
y,
var,
group,
lambda = NULL,
weight = NULL,
loss = c("ls", "logit"),
intercept = TRUE,
...
)
Arguments
X |
matrix of size n*p |
y |
vector of size n. If loss = "logit", elements of y must be in {-1, 1} |
var |
vector containing the variable to use |
group |
vector containing the associated groups |
lambda |
lambda values for group lasso. If not provided, the function generates its own values of lambda |
weight |
a vector containing the weight for each group. Default is the square root of the size of each group |
loss |
a character string specifying the loss function to use, valid options are: "ls" least squares loss (regression) and "logit" logistic loss (classification) |
intercept |
Should an intercept be included in the model? |
... |
Others parameters for |
Details
Use a group-lasso algorithm (see gglasso) to solve a group-lasso with overlapping groups.
Each variable j of the original matrix X is duplicated k(j) times in a new dataset, where k(j) is the number of different groups containing the variable j.
The new dataset is then used to solve the group-lasso with overlapping groups by running a standard group-lasso algorithm.
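The duplication step described above can be sketched in base R. The small random matrix and the name Xexpanded are assumptions made for illustration; the var and group vectors follow the same convention as the arguments of overlapgglasso.

```r
set.seed(1)
X <- matrix(rnorm(10 * 15), nrow = 10, ncol = 15)

# Three disjoint groups of 5 variables, plus two groups that overlap them
var   <- c(1:15, 1:8, 7:15)
group <- c(rep(1:3, each = 5), rep(4, 8), rep(5, 9))

# k(j): number of groups containing variable j
k <- table(var)

# Expanded dataset: column j of X appears k(j) times
Xexpanded <- X[, var]

print(dim(Xexpanded))
print(k)
```

In the expanded matrix the groups no longer share columns, so a standard (non-overlapping) group-lasso solver can be applied with group as the group index.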
Value
a MLGL object containing:
- lambda
lambda values
- b0
intercept values for lambda
- beta
A list containing the values of estimated coefficients for each value of lambda
- var
A list containing the index of selected variables for each value of lambda
- group
A list containing the index of selected groups for each value of lambda
- nVar
A vector containing the number of non-zero coefficients for each value of lambda
- nGroup
A vector containing the number of non-zero groups for each value of lambda
- structure
A list containing 3 vectors. var: all variables used. group: associated groups. weight: weight associated with the different groups.
- time
computation time
- dim
dimension of X
Source
Laurent Jacob, Guillaume Obozinski, and Jean-Philippe Vert. 2009. Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
See Also
Examples
# Least squares loss
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
var <- c(1:60, 1:8, 7:15)
group <- c(rep(1:12, each = 5), rep(13, 8), rep(14, 9))
res <- overlapgglasso(X, y, var, group)
# Logistic loss
y <- 2 * (rowSums(X[, 1:4]) > 0) - 1
var <- c(1:60, 1:8, 7:15)
group <- c(rep(1:12, each = 5), rep(13, 8), rep(14, 9))
res <- overlapgglasso(X, y, var, group, loss = "logit")
Partial F-test
Description
Perform a partial F-test
Usage
partialFtest(X, y, varToTest)
Arguments
X |
design matrix of size n*p |
y |
response vector of length n |
varToTest |
vector containing the index of the column of X to test |
Details
Model: y = X * beta + epsilon
Null hypothesis: beta[varToTest] = 0
Alternative hypothesis: there exists an index k in varToTest such that beta[k] != 0
The test statistic is based on a full and a reduced model.
Full model: y = X * beta + epsilon
Reduced model: y = X[, -varToTest] * beta[-varToTest] + epsilon
Value
a vector of the same length as varToTest containing the p-values of the test.
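The partial F statistic can be computed by hand from the residual sums of squares of the two models. The sketch below uses simulated data and assumes an intercept in both models; it illustrates the test, not the code of partialFtest.

```r
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), nrow = n)
y <- drop(X %*% c(2, 0, 0, -1)) + rnorm(n, sd = 0.5)
varToTest <- c(2, 3)

# Residual sums of squares of the full and reduced least-squares fits
rssFull <- sum(resid(lm(y ~ X))^2)
rssReduced <- sum(resid(lm(y ~ X[, -varToTest]))^2)

q <- length(varToTest)      # number of tested coefficients
dfFull <- n - ncol(X) - 1   # residual degrees of freedom (intercept included)

# Partial F statistic and its p-value
Fstat <- ((rssReduced - rssFull) / q) / (rssFull / dfFull)
pvalue <- pf(Fstat, q, dfFull, lower.tail = FALSE)

# Cross-check against anova() on the same nested models
pAnova <- anova(lm(y ~ X[, -varToTest]), lm(y ~ X))[["Pr(>F)"]][2]
print(c(manual = pvalue, anova = pAnova))
```

The statistic measures how much the fit degrades when the tested columns are removed, relative to the residual variance of the full model.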
See Also
Plot the path obtained from HMT function
Description
Plot the path obtained from HMT function
Usage
## S3 method for class 'HMT'
plot(
x,
log.lambda = FALSE,
lambda.lines = FALSE,
lambda.opt = c("min", "max", "both"),
...
)
Arguments
x |
|
log.lambda |
If TRUE, use log(lambda) instead of lambda in abscissa |
lambda.lines |
If TRUE, add vertical lines at lambda values |
lambda.opt |
If there are several optimal lambdas, which one to print: "min", "max" or "both" |
... |
Other parameters for plot function |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
out <- HMT(res, X, y)
plot(out)
Plot the path obtained from MLGL function
Description
Plot the path obtained from MLGL function
Usage
## S3 method for class 'MLGL'
plot(x, log.lambda = FALSE, lambda.lines = FALSE, ...)
Arguments
x |
|
log.lambda |
If TRUE, use log(lambda) instead of lambda in abscissa |
lambda.lines |
if TRUE, add vertical lines at lambda values |
... |
Other parameters for plot function |
See Also
Examples
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
# Plot the solution path
plot(res)
Plot the cross-validation obtained from cv.MLGL function
Description
Plot the cross-validation obtained from cv.MLGL function
Usage
## S3 method for class 'cv.MLGL'
plot(x, log.lambda = FALSE, ...)
Arguments
x |
|
log.lambda |
If TRUE, use log(lambda) instead of lambda in abscissa |
... |
Other parameters for plot function |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply cv.MLGL method
res <- cv.MLGL(X, y)
# Plot the cv error curve
plot(res)
Plot the path obtained from fullProcess function
Description
Plot the path obtained from fullProcess function
Usage
## S3 method for class 'fullProcess'
plot(
x,
log.lambda = FALSE,
lambda.lines = FALSE,
lambda.opt = c("min", "max", "both"),
...
)
Arguments
x |
|
log.lambda |
If TRUE, use log(lambda) instead of lambda in abscissa |
lambda.lines |
If TRUE, add vertical lines at lambda values |
lambda.opt |
If there are several optimal lambdas, which one to print: "min", "max" or "both" |
... |
Other parameters for plot function |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- fullProcess(X, y)
# Plot the solution path
plot(res)
Plot the stability path obtained from stability.MLGL function
Description
Plot the stability path obtained from stability.MLGL function
Usage
## S3 method for class 'stability.MLGL'
plot(x, log.lambda = FALSE, threshold = 0.75, ...)
Arguments
x |
|
log.lambda |
If TRUE, use log(lambda) instead of lambda in abscissa |
threshold |
Threshold for selection frequency |
... |
Other parameters for plot function |
Value
A list containing:
- var
Index of selected variables for the given threshold.
- group
Index of the associated group.
- threshold
Value of threshold
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply stability.MLGL method
res <- stability.MLGL(X, y)
selected <- plot(res)
print(selected)
Predict fitted values from a MLGL object
Description
Predict fitted values from a MLGL object
Usage
## S3 method for class 'MLGL'
predict(object, newx = NULL, s = NULL, type = c("fit", "coefficients"), ...)
Arguments
object |
|
newx |
matrix with new individuals for prediction. If type="coefficients", the parameter has to be NULL |
s |
values of lambda. If NULL, use values from object |
type |
if "fit", return the fitted values for each value of s, if "coefficients", return the estimated coefficients for each s |
... |
Not used. Other arguments to predict. |
Value
A matrix with fitted values or estimated coefficients for given values of s.
Author(s)
original code from gglasso package Author: Yi Yang <yiyang@umn.edu>, Hui Zou <hzou@stat.umn.edu>
function inspired from predict function from gglasso package by Yi Yang and Hui Zou.
See Also
Examples
X <- simuBlockGaussian(n = 50, nBlock = 12, sizeBlock = 5, rho = 0.7)
y <- drop(X[, c(2, 7, 12)] %*% c(2, 2, -1)) + rnorm(50, 0, 0.5)
m1 <- MLGL(X, y, loss = "ls")
predict(m1, newx = X)
predict(m1, s=3, newx = X)
predict(m1, s=1:3, newx = X)
Predict fitted values from a cv.MLGL object
Description
Predict fitted values from a cv.MLGL object
Usage
## S3 method for class 'cv.MLGL'
predict(
object,
newx = NULL,
s = c("lambda.1se", "lambda.min"),
type = c("fit", "coefficients"),
...
)
Arguments
object |
|
newx |
matrix with new individuals for prediction. If type="coefficients", the parameter has to be NULL |
s |
Either "lambda.1se" or "lambda.min" |
type |
if "fit", return the fitted values for each value of s, if "coefficients", return the estimated coefficients for each s |
... |
Not used. Other arguments to predict. |
Value
A matrix with fitted values or estimated coefficients for given values of s.
Author(s)
Quentin Grimonprez
See Also
Print Values
Description
Print a HMT object
Usage
## S3 method for class 'HMT'
print(x, ...)
Arguments
x |
|
... |
Not used. |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
out <- HMT(res, X, y)
print(out)
Print Values
Description
Print a MLGL object
Usage
## S3 method for class 'MLGL'
print(x, ...)
Arguments
x |
|
... |
Not used. |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
print(res)
Print Values
Description
Print a fullProcess object
Usage
## S3 method for class 'fullProcess'
print(x, ...)
Arguments
x |
|
... |
Not used. |
See Also
fullProcess summary.fullProcess
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- fullProcess(X, y)
print(res)
Selection from hierarchical testing with FDR control
Description
Select groups from hierarchical testing procedure with FDR control (hierarchicalFDR)
Usage
selFDR(out, alpha = 0.05, global = TRUE, outer = TRUE)
Arguments
out |
output of hierarchicalFDR function |
alpha |
control level for test |
global |
if FALSE the provided alpha is the desired level control for each family. |
outer |
if TRUE, the FDR is controlled only on outer node (rejected groups without rejected children). If FALSE, it is controlled on the full tree. |
Details
See the reference for more details about the method.
If each family is controlled at a level alpha, we have the following control:
FDR control of full tree: alpha * delta * 2 (delta = 1.44)
FDR control of outer node: alpha * L * delta * 2 (delta = 1.44)
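The two bounds above are simple products and can be evaluated directly. In the sketch below, L is not defined on this page; it is treated as a hypothetical number of levels in the tree, and its value is an assumption chosen for illustration.

```r
alpha <- 0.05   # level used for each family of hypotheses
delta <- 1.44
L <- 3          # hypothetical number of levels in the tree (assumption)

# Resulting bounds on the FDR
boundFullTree  <- alpha * delta * 2
boundOuterNode <- alpha * L * delta * 2

print(c(fullTree = boundFullTree, outerNode = boundOuterNode))
```

Conversely, dividing a target global level by the same factor gives the per-family level needed to reach it, which is the distinction made by the global argument.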
Value
a list containing:
- toSel
vector of boolean. TRUE if the group is selected
- groupId
Names of groups
- local.alpha
control level for each family of hypothesis
- global.alpha
control level for the tree (full tree or outer node)
References
Yekutieli, Daniel. "Hierarchical False Discovery Rate-Controlling Methodology." Journal of the American Statistical Association 103.481 (2008): 309-16.
See Also
Examples
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- MLGL(X, y)
test <- hierarchicalFDR(X, y, res$group[[20]], res$var[[20]])
sel <- selFDR(test, alpha = 0.05)
Selection from hierarchical testing with FWER control
Description
Select groups from hierarchical testing procedure with FWER control (hierarchicalFWER)
Usage
selFWER(out, alpha = 0.05)
Arguments
out |
output of hierarchicalFWER function |
alpha |
control level for test |
Details
Only outer nodes (rejected groups without rejected children) are returned as TRUE.
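The notion of outer node can be made concrete with a toy tree in base R. The parent vector and the rejection decisions are invented for illustration and do not use the hierMatrix format returned by the package.

```r
# Toy tree: parent[i] is the parent of node i (0 for the root); illustrative only
parent   <- c(0, 1, 1, 2, 2)
rejected <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# A node has a rejected child if it is the parent of at least one rejected node
hasRejectedChild <- seq_along(parent) %in% parent[rejected]

# Outer nodes: rejected groups without rejected children
outer <- rejected & !hasRejectedChild

print(which(outer))
```

Nodes 1 and 2 are rejected but each has a rejected child, so only node 4 is reported as selected.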
Value
a list containing:
- toSel
vector of boolean. TRUE if the group is selected
- groupId
Names of groups
References
Meinshausen, Nicolai. "Hierarchical Testing of Variable Importance." Biometrika 95.2 (2008): 265-78.
See Also
Examples
set.seed(42)
X <- simuBlockGaussian(50, 12, 5, 0.7)
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
res <- MLGL(X, y)
test <- hierarchicalFWER(X, y, res$group[[20]], res$var[[20]])
sel <- selFWER(test, alpha = 0.05)
Simulate multivariate Gaussian samples with block diagonal variance matrix
Description
Simulate n samples from a multivariate Gaussian distribution with zero mean vector and block-diagonal covariance matrix, with 1 on the diagonal and rho elsewhere within each block.
Usage
simuBlockGaussian(n, nBlock, sizeBlock, rho)
Arguments
n |
number of samples to simulate |
nBlock |
number of blocks |
sizeBlock |
size of blocks |
rho |
correlation within each block |
Value
a matrix of size n * (nBlock * sizeBlock) containing the samples
Author(s)
Quentin Grimonprez
Examples
X <- simuBlockGaussian(50, 12, 5, 0.7)
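For readers who want to see the covariance structure explicitly, here is a base-R sketch that builds a block-diagonal matrix and samples from it through a Cholesky factor. It illustrates the distribution described above and is not the implementation of simuBlockGaussian; the names Sigma and Xsim are assumptions.

```r
set.seed(1)
n <- 50
nBlock <- 3
sizeBlock <- 4
rho <- 0.7
p <- nBlock * sizeBlock

# One block: 1 on the diagonal, rho elsewhere
block <- matrix(rho, sizeBlock, sizeBlock)
diag(block) <- 1

# Block-diagonal covariance matrix built with a Kronecker product
Sigma <- kronecker(diag(nBlock), block)

# Sample from N(0, Sigma) using the Cholesky factor R, where Sigma = t(R) %*% R
R <- chol(Sigma)
Xsim <- matrix(rnorm(n * p), nrow = n) %*% R

print(dim(Xsim))
```

Variables in the same block are correlated with coefficient rho, while variables in different blocks are independent.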
Stability Selection for Multi-Layer Group-lasso
Description
Stability selection for MLGL
Usage
stability.MLGL(
X,
y,
B = 50,
fraction = 0.5,
hc = NULL,
lambda = NULL,
weightLevel = NULL,
weightSizeGroup = NULL,
loss = c("ls", "logit"),
intercept = TRUE,
verbose = FALSE,
...
)
Arguments
X |
matrix of size n*p |
y |
vector of size n. If loss = "logit", elements of y must be in {-1, 1} |
B |
number of bootstrap samples |
fraction |
Fraction of data used at each of the |
hc |
output of |
lambda |
lambda values for group lasso. If not provided, the function generates its own values of lambda |
weightLevel |
a vector of size p for each level of the hierarchy. A zero indicates that the level will be ignored. If not provided, use 1/(height between 2 successive levels) |
weightSizeGroup |
a vector |
loss |
a character string specifying the loss function to use, valid options are: "ls" least squares loss (regression) and "logit" logistic loss (classification) |
intercept |
Should an intercept be included in the model? |
verbose |
print some information |
... |
Others parameters for |
Details
Hierarchical clustering is performed with all the variables. Then, the partitions from the different levels of the hierarchy are used in the different runs of MLGL for estimating the probability of selection of each group.
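The estimation of selection probabilities by repeated subsampling can be sketched as follows. The selector used here is a deliberately simple stand-in (absolute correlation with the response above a threshold), not the group-lasso, and it works on single variables rather than groups; the data and names are assumptions.

```r
set.seed(1)
n <- 60
p <- 5
X <- matrix(rnorm(n * p), nrow = n)
y <- 3 * X[, 1] + rnorm(n)

B <- 50          # number of subsamples
fraction <- 0.5  # fraction of individuals in each subsample

# Stand-in selector (assumption): keep variables with |correlation| > 0.5
selectVars <- function(X, y) abs(drop(cor(X, y))) > 0.5

# Run the selector on B random subsamples; result is a p x B logical matrix
selected <- replicate(B, {
  ind <- sample(n, floor(n * fraction))
  selectVars(X[ind, ], y[ind])
})

# Selection frequency of each variable over the B subsamples
stability <- rowMeans(selected)
print(stability)
```

A variable is then retained when its selection frequency exceeds a threshold, such as the default threshold = 0.75 of the plot method.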
Value
a stability.MLGL object containing:
- lambda
sequence of lambda.
- B
Number of bootstrap samples.
- stability
A matrix of size length(lambda)*number of groups containing the probability of selection of each group
- var
vector containing the index of covariates
- group
vector containing the index of associated groups of covariates
- time
computation time
Author(s)
Quentin Grimonprez
References
Meinshausen and Buhlmann (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72.4, p. 417-473.
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply stability.MLGL method
res <- stability.MLGL(X, y)
Object Summaries
Description
Summary of a HMT object
Usage
## S3 method for class 'HMT'
summary(object, ...)
Arguments
object |
|
... |
Not used. |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
out <- HMT(res, X, y)
summary(out)
Object Summaries
Description
Summary of a MLGL object
Usage
## S3 method for class 'MLGL'
summary(object, ...)
Arguments
object |
|
... |
Not used. |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- MLGL(X, y)
summary(res)
Object Summaries
Description
Summary of a fullProcess object
Usage
## S3 method for class 'fullProcess'
summary(object, ...)
Arguments
object |
|
... |
Not used. |
See Also
Examples
set.seed(42)
# Simulate gaussian data with block-diagonal variance matrix containing 12 blocks of size 5
X <- simuBlockGaussian(50, 12, 5, 0.7)
# Generate a response variable
y <- X[, c(2, 7, 12)] %*% c(2, 2, -2) + rnorm(50, 0, 0.5)
# Apply MLGL method
res <- fullProcess(X, y)
summary(res)
Find all unique groups in hclust results
Description
Find all unique groups in hclust results
Usage
uniqueGroupHclust(hc)
Arguments
hc |
output of |
Value
A list containing:
- indexGroup
Vector containing the index of variables.
- varGroup
Vector containing the index of the group of each variable.
Author(s)
Quentin Grimonprez
Examples
hc <- hclust(dist(USArrests), "average")
res <- uniqueGroupHclust(hc)