Type: | Package |
Title: | Sliced Inverse Regression with Thresholding |
Version: | 1.0.2 |
Author: | Clement Weinreich [aut, cre], Jerome Saracco [aut], Hadrien Lorenzo [aut] |
Maintainer: | Clement Weinreich <clement@weinreich.fr> |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2.0)] |
Description: | Implements a thresholded version of the Sliced Inverse Regression method (Li, K. C. (1991) <doi:10.2307/2290563>), which enables variable selection. |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.0 |
Imports: | strucchange |
Suggests: | knitr, rmarkdown, mvtnorm |
VignetteBuilder: | knitr |
URL: | https://clement-w.github.io/SIRthresholded/ |
NeedsCompilation: | no |
Packaged: | 2023-06-09 07:08:26 UTC; clement |
Repository: | CRAN |
Date/Publication: | 2023-06-09 07:32:54 UTC |
Classic SIR
Description
Apply a single-index SIR on (X,Y) with H slices. This function returns an estimate of a basis of the EDR (Effective Dimension Reduction) space via the eigenvector \hat{b} associated with the largest nonzero eigenvalue of the matrix of interest \widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n. Thus, \hat{b} is an estimated EDR direction.
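The computation described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name `sir_sketch`, the equal-count slicing rule, and the 1/n normalisation of the covariance are assumptions made for the example.

```r
# Minimal single-index SIR sketch (illustrative; names and details are assumptions).
sir_sketch <- function(Y, X, H = 10) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Assign observations to H slices of roughly equal size, ordered by Y.
  slice <- cut(rank(as.vector(Y), ties.method = "first"), breaks = H, labels = FALSE)
  x_bar <- colMeans(X)
  Sigma <- crossprod(sweep(X, 2, x_bar)) / n      # empirical covariance of X
  # Gamma: weighted covariance of the slice means of X.
  Gamma <- matrix(0, ncol(X), ncol(X))
  for (h in seq_len(H)) {
    idx <- which(slice == h)
    d <- colMeans(X[idx, , drop = FALSE]) - x_bar
    Gamma <- Gamma + (length(idx) / n) * tcrossprod(d)
  }
  M1 <- solve(Sigma) %*% Gamma                    # interest matrix
  eig <- eigen(M1)
  b <- Re(eig$vectors[, 1])                       # eigenvector of the largest eigenvalue
  list(b = b / sqrt(sum(b^2)), M1 = M1, eig_val = Re(eig$values))
}

# Toy data in the spirit of the package examples (independent standard normal X).
set.seed(10)
n <- 500
beta <- c(1, 1, rep(0, 8))
X <- matrix(rnorm(n * 10), n, 10)
Y <- as.vector((X %*% beta)^3 + rnorm(n))
fit <- sir_sketch(Y, X, H = 10)
# Squared cosine between the estimated direction and the true one.
cos2 <- sum(fit$b * beta)^2 / sum(beta^2)
```

The direction is only identified up to sign and scale, which is why the comparison with `beta` uses a squared cosine.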
Usage
SIR(Y, X, H = 10, graph = TRUE, choice = "")
Arguments
Y |
A numeric vector representing the dependent variable (a response vector). |
X |
A matrix representing the quantitative explanatory variables (bound by column). |
H |
The chosen number of slices (default is 10). |
graph |
A boolean that must be set to true to display graphics (default is TRUE). |
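choice |
The graph to plot, for example "eigvals" (eigenvalues) or "estim_ind" (estimated index versus Y), the values used in the examples of the corresponding plot method. |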
Value
An object of class SIR, with attributes:
b |
This is an estimated EDR direction, which is the principal eigenvector of the interest matrix. |
M1 |
The interest matrix. |
eig_val |
The eigenvalues of the interest matrix. |
n |
Sample size. |
p |
The number of variables in X. |
H |
The chosen number of slices. |
call |
Unevaluated call to the function. |
index_pred |
The index Xb' estimated by SIR. |
Y |
The response vector. |
Examples
# Generate Data
set.seed(10)
n <- 500
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply SIR
SIR(Y, X, H = 10)
Bootstrap SIR
Description
Apply a single-index SIR on B bootstrapped samples of (X,Y) with H slices.
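The resampling step can be sketched as follows. This is an illustration under stated assumptions: `bootstrap_directions` and `estimate_direction` are hypothetical names, and the stand-in estimator used here is a least-squares slope, not SIR; how the package combines the B estimates into a single direction is not shown.

```r
# Sketch: draw B bootstrap samples of the rows of (X, Y) and store one
# direction estimate per sample in the columns of a p x B matrix.
# `estimate_direction` is a hypothetical placeholder for a SIR fit.
bootstrap_directions <- function(Y, X, B = 10, estimate_direction) {
  n <- nrow(X)
  mat_b <- matrix(NA_real_, nrow = ncol(X), ncol = B)
  for (r in seq_len(B)) {
    idx <- sample.int(n, size = n, replace = TRUE)   # resample pairs (x_i, y_i)
    b <- estimate_direction(Y[idx], X[idx, , drop = FALSE])
    mat_b[, r] <- b / sqrt(sum(b^2))                 # normalise to unit length
  }
  mat_b
}

# Stand-in estimator (NOT SIR): least-squares slopes, intercept dropped.
ols_direction <- function(Y, X) coef(lm(Y ~ X))[-1]

set.seed(10)
n <- 500
beta <- c(1, 1, rep(0, 8))
X <- matrix(rnorm(n * 10), n, 10)
Y <- as.vector((X %*% beta)^3 + rnorm(n))
mat_b <- bootstrap_directions(Y, X, B = 10, estimate_direction = ols_direction)
dim(mat_b)  # one column per bootstrap sample
```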
Usage
SIR_bootstrap(Y, X, H = 10, B = 10, graph = TRUE, choice = "")
Arguments
Y |
A numeric vector representing the dependent variable (a response vector). |
X |
A matrix representing the quantitative explanatory variables (bound by column). |
H |
The chosen number of slices (default is 10). |
B |
The number of bootstrapped samples to draw (default is 10). |
graph |
A boolean that must be set to true to display graphics (default is TRUE). |
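choice |
The graph to plot, for example "eigvals" (eigenvalues) or "estim_ind" (estimated index versus Y), the values used in the examples of the corresponding plot method. |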
Value
An object of class SIR_bootstrap, with attributes:
b |
This is an estimated EDR direction, which is the principal eigenvector of the interest matrix. |
mat_b |
A matrix of size p*B that contains an estimation of beta in the columns for each bootstrapped sample. |
n |
Sample size. |
p |
The number of variables in X. |
H |
The chosen number of slices. |
call |
Unevaluated call to the function. |
index_pred |
The index b'X estimated by SIR. |
Y |
The response vector. |
Examples
# Generate Data
set.seed(10)
n <- 500
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply bootstrap SIR
SIR_bootstrap(Y, X, H = 10, B = 10)
SIR threshold
Description
Apply a single-index SIR on (X,Y) with H slices, with a parameter \lambda that applies a soft/hard thresholding to the interest matrix \widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n.
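For reference, the usual elementwise definitions of hard and soft thresholding can be written as below. This is a sketch of the standard operators; the function names are hypothetical, and whether the package treats entries exactly equal to \lambda as zero is an assumption.

```r
# Hard thresholding: entries with absolute value <= lambda are set to 0
# (the non-strict inequality is an assumption), others are kept unchanged.
hard_threshold <- function(M, lambda) {
  M[abs(M) <= lambda] <- 0
  M
}

# Soft thresholding: every entry is shrunk towards 0 by lambda,
# and entries with absolute value <= lambda become 0.
soft_threshold <- function(M, lambda) {
  sign(M) * pmax(abs(M) - lambda, 0)
}

M <- matrix(c(0.5, -0.1, 0.3, -0.7), 2, 2)
M_hard <- hard_threshold(M, 0.2)  # entries: 0.5, 0, 0.3, -0.7
M_soft <- soft_threshold(M, 0.2)  # entries: 0.3, 0, 0.1, -0.5
```

With lambda = 0 both operators leave the matrix unchanged, which corresponds to the default value of `lambda` in the usage below.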
Usage
SIR_threshold(
Y,
X,
H = 10,
lambda = 0,
thresholding = "hard",
graph = TRUE,
choice = ""
)
Arguments
Y |
A numeric vector representing the dependent variable (a response vector). |
X |
A matrix representing the quantitative explanatory variables (bound by column). |
H |
The chosen number of slices (default is 10). |
lambda |
The thresholding parameter (default is 0). |
thresholding |
The thresholding method to choose between hard and soft (default is hard). |
graph |
A boolean that must be set to true to display graphics (default is TRUE). |
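choice |
The graph to plot, for example "eigvals" (eigenvalues) or "estim_ind" (estimated index versus Y), the values used in the examples of the corresponding plot method. |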
Value
An object of class SIR_threshold, with attributes:
b |
This is an estimated EDR direction, which is the principal eigenvector of the interest matrix. |
M1 |
The interest matrix thresholded. |
eig_val |
The eigenvalues of the interest matrix thresholded. |
eig_vect |
A matrix corresponding to the eigenvectors of the interest matrix. |
Y |
The response vector. |
n |
Sample size. |
p |
The number of variables in X. |
H |
The chosen number of slices. |
nb.zeros |
The number of zeros in the estimate of the vector beta. |
index_pred |
The index Xb' estimated by SIR. |
list.relevant.variables |
A list that contains the variables selected by the model. |
cos_squared |
The squared cosine between the direction estimated by vanilla SIR and the direction estimated by thresholded SIR. |
lambda |
The thresholding parameter used. |
thresholding |
The thresholding method used. |
call |
Unevaluated call to the function. |
X_reduced |
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b. |
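The squared cosine reported in cos_squared measures how close two directions are, regardless of sign and scale. A minimal sketch of this standard quantity follows; the function name `cos_squared` is chosen here for illustration.

```r
# Squared cosine of the angle between two vectors:
# 1 means the same direction (up to sign), 0 means orthogonal.
cos_squared <- function(a, b) {
  sum(a * b)^2 / (sum(a^2) * sum(b^2))
}

cos_squared(c(1, 0), c(1, 1))    # 0.5
cos_squared(c(1, 2), c(-2, -4))  # 1: opposite sign, same direction
cos_squared(c(1, 0), c(0, 3))    # 0: orthogonal
```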
Examples
# Generate Data
set.seed(10)
n <- 500
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply SIR with hard thresholding
SIR_threshold(Y, X, H = 10, lambda = 0.2, thresholding = "hard")
SIR optimally thresholded on bootstrapped replications
Description
Apply a single-index optimally soft/hard thresholded SIR with H slices on 'n_replications' bootstrapped replications of (X,Y). The optimal number of selected variables is the number of selected variables that occurred most often among the replications. From this, we obtain the corresponding \hat{b} and \lambda_{opt}, i.e. those that produce the same number of selected variables in the result of 'SIR_threshold_opt'.
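Picking the number of selected variables that occurs most often amounts to taking the mode of the per-replication counts. A small sketch with made-up counts follows; the tie-breaking rule (smallest value wins) is a property of this sketch, not a documented behaviour of the package.

```r
# Made-up example: number of selected variables in 6 replications.
vec_nb_var_selec <- c(2, 3, 2, 5, 2, 3)

counts <- table(vec_nb_var_selec)
# Most frequent value; which.max returns the first maximum, so ties
# are broken in favour of the smallest number of variables.
nb_var_selec_opt <- as.integer(names(counts)[which.max(counts)])
nb_var_selec_opt  # 2
```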
Usage
SIR_threshold_bootstrap(
Y,
X,
H = 10,
thresholding = "hard",
n_replications = 50,
graph = TRUE,
output = TRUE,
n_lambda = 100,
k = 2,
choice = ""
)
Arguments
Y |
A numeric vector representing the dependent variable (a response vector). |
X |
A matrix representing the quantitative explanatory variables (bound by column). |
H |
The chosen number of slices (default is 10). |
thresholding |
The thresholding method to choose between hard and soft (default is hard). |
n_replications |
The number of bootstrapped replications of (X,Y) done to estimate the model (default is 50). |
graph |
A boolean, set to TRUE to plot graphs (default is TRUE). |
output |
A boolean, set to TRUE to print information (default is TRUE). |
n_lambda |
The number of lambdas to test. The n_lambda tested lambdas are uniformly distributed between 0 and the maximum value of the interest matrix (default is 100). |
k |
Multiplication factor of the bootstrapped sample size (default is 2; k = 1 keeps the same size as the original data). |
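choice |
The graph to plot, for example "estim_ind", "size", "selec_var", "coefs_b" or "lambdas_replic", the values used in the examples of the corresponding plot method. |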
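One plausible reading of the k argument is that each replication draws k * n row indices with replacement. The sketch below illustrates that reading only; it is an assumption about the resampling scheme, not a description of the package internals.

```r
# Assumed interpretation of k: each bootstrap replication has k * n rows.
n <- 170   # original sample size
k <- 2     # multiplication factor

set.seed(1)
idx <- sample.int(n, size = k * n, replace = TRUE)
length(idx)  # 340 row indices, each between 1 and n
# The replication would then be (X[idx, ], Y[idx]).
```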
Value
An object of class SIR_threshold_bootstrap, with attributes:
b |
This is the optimal estimated EDR direction, which is the principal eigenvector of the interest matrix. |
lambda_opt |
The optimal lambda. |
vec_nb_var_selec |
Vector that contains the number of selected variables for each replication. |
occurrences_var |
Vector that contains at index i the number of times the i_th variable has been selected in a replication. |
call |
Unevaluated call to the function. |
nb_var_selec_opt |
Optimal number of selected variables which is the number of selected variables that came back most often among the replications performed. |
list_relevant_variables |
A list that contains the variables selected by the model. |
n |
Sample size. |
p |
The number of variables in X. |
H |
The chosen number of slices. |
n_replications |
The number of bootstrapped replications of (X,Y) done to estimate the model. |
thresholding |
The thresholding method used. |
X_reduced |
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b. |
mat_b |
Contains the estimate of b for each bootstrapped replication. |
lambdas_opt_boot |
Contains the optimal lambda found by SIR_threshold_opt at each replication. |
index_pred |
The index Xb' estimated by SIR. |
Y |
The response vector. |
M1 |
The interest matrix thresholded with the optimal lambda. |
Examples
# Generate Data
set.seed(8)
n <- 170
beta <- c(1,1,1,1,1,rep(0,15))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,20))
eps <- rnorm(n,sd=8)
Y <- (X%*%beta)**3+eps
# Apply optimally thresholded SIR (hard thresholding) on bootstrapped replications
SIR_threshold_bootstrap(Y,X,H=10,n_lambda=300,thresholding="hard", n_replications=30,k=2)
SIR optimally thresholded
Description
Apply a single-index SIR on (X,Y) with H slices, with a soft/hard thresholding of the interest matrix \widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n by an optimal parameter \lambda_{opt}. The \lambda_{opt} is found automatically among a vector of n_lambda values of \lambda, ranging from 0 to the maximum value of \widehat{\Sigma}_n^{-1}\widehat{\Gamma}_n. For each feature of X, the number of \lambda associated with a selection of this feature is stored (in a vector of size p). This vector is sorted in increasing order. Then, using strucchange::breakpoints, a breakpoint is found in this sorted vector. The coefficients of the variables to the left of the breakpoint tend to be set to 0 by the thresholding operation based on \lambda_{opt}, so these variables should be removed (useless variables). Finally, \lambda_{opt} corresponds to the first \lambda such that the associated \hat{b} has the same number of zeros as the breakpoint's value.
For example, for X \in R^{10} and n_lambda=100, this sorted vector can look like this:
X10 | X3 | X8 | X5 | X7 | X9 | X4 | X6 | X2 | X1 |
2 | 3 | 3 | 4 | 4 | 4 | 6 | 10 | 95 | 100 |
Here, the breakpoint would be 8.
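To make the example concrete, the sketch below locates a single break in the sorted vector by a brute-force least-squares search over a piecewise-constant mean. This hand-rolled search is a simplified stand-in for strucchange::breakpoints, used here only so that the example is self-contained; `single_breakpoint` is a hypothetical helper.

```r
# Sorted selection counts from the example above.
counts <- c(X10 = 2, X3 = 3, X8 = 3, X5 = 4, X7 = 4,
            X9 = 4, X4 = 6, X6 = 10, X2 = 95, X1 = 100)

# Stand-in for strucchange::breakpoints: try every split position k and
# keep the one minimising the within-segment sum of squared deviations.
single_breakpoint <- function(v) {
  n <- length(v)
  sse <- function(x) sum((x - mean(x))^2)
  cost <- vapply(seq_len(n - 1),
                 function(k) sse(v[1:k]) + sse(v[(k + 1):n]),
                 numeric(1))
  which.min(cost)
}

bp <- single_breakpoint(counts)
bp                                    # 8
useless <- names(counts)[seq_len(bp)] # variables to the left of the break
relevant <- names(counts)[-seq_len(bp)]
relevant                              # "X2" "X1"
```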
Usage
SIR_threshold_opt(
Y,
X,
H = 10,
n_lambda = 100,
thresholding = "hard",
graph = TRUE,
output = TRUE,
choice = ""
)
Arguments
Y |
A numeric vector representing the dependent variable (a response vector). |
X |
A matrix representing the quantitative explanatory variables (bound by column). |
H |
The chosen number of slices (default is 10). |
n_lambda |
The number of lambdas to test. The n_lambda tested lambdas are uniformly distributed between 0 and the maximum value of the interest matrix (default is 100). |
thresholding |
The thresholding method to choose between hard and soft (default is hard). |
graph |
A boolean, set to TRUE to plot graphs (default is TRUE). |
output |
A boolean, set to TRUE to print information (default is TRUE). |
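choice |
The graph to plot, for example "estim_ind", "opt_lambda", "cos2_selec" or "regul_path", the values used in the examples of the corresponding plot method. |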
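The grid of tested lambdas can be pictured as an evenly spaced sequence. The sketch below uses a made-up matrix and assumes that "maximum value of the interest matrix" means the largest absolute entry and that the grid is an evenly spaced `seq()`; both are assumptions for illustration.

```r
# Toy interest matrix (made up for illustration).
set.seed(1)
M1 <- matrix(rnorm(16, sd = 0.2), 4, 4)

n_lambda <- 100
# Assumed grid: evenly spaced from 0 to the largest absolute entry of M1.
lambdas <- seq(0, max(abs(M1)), length.out = n_lambda)

# Number of entries set to 0 by hard thresholding at each lambda:
# it can only grow as lambda increases.
nb_zeros <- vapply(lambdas, function(l) sum(abs(M1) <= l), integer(1))
```

Larger values of n_lambda give a finer grid, at the cost of one thresholded fit per tested value.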
Value
An object of class SIR_threshold_opt, with attributes:
b |
This is the optimal estimated EDR direction, which is the principal eigenvector of the interest matrix. |
lambdas |
A vector that contains the tested lambdas. |
lambda_opt |
The optimal lambda. |
mat_b |
A matrix of size p*n_lambda that contains an estimation of beta in the columns for each lambda. |
n_lambda |
The number of lambda tested. |
vect_nb_zeros |
The number of zeros in b for each lambda. |
list_relevant_variables |
A list that contains the variables selected by the model. |
fit_bp |
An object of class breakpoints from the strucchange package, which contains information about the breakpoint used to deduce the optimal lambda. |
indices_useless_var |
A vector that contains p items: each variable is associated with the number of lambdas that select this variable. |
vect_cos_squared |
A vector that contains, for each lambda, the squared cosine between the direction estimated by vanilla SIR and the direction estimated by thresholded SIR. |
Y |
The response vector. |
n |
Sample size. |
p |
The number of variables in X. |
H |
The chosen number of slices. |
M1 |
The interest matrix thresholded with the optimal lambda. |
thresholding |
The thresholding method used. |
call |
Unevaluated call to the function. |
X_reduced |
The X data restricted to the variables selected by the model. It can be used to estimate a new SIR model on the relevant variables to improve the estimation of b. |
index_pred |
The index Xb' estimated by SIR. |
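Several of the returned quantities can be related to each other with a toy example. The values below are made up, and `target_zeros` is a hypothetical stand-in for the breakpoint value; the sketch only illustrates how the number of zeros per lambda, the per-variable selection counts, the optimal lambda and the reduced data fit together.

```r
# Toy inputs (made up): 4 variables, 5 tested lambdas, one column of b per lambda.
lambdas <- c(0, 0.1, 0.2, 0.3, 0.4)
mat_b <- cbind(c(0.7, 0.6, 0.3, 0.2),
               c(0.7, 0.7, 0.1, 0.0),
               c(0.7, 0.7, 0.0, 0.0),
               c(1.0, 0.0, 0.0, 0.0),
               c(0.0, 0.0, 0.0, 0.0))
rownames(mat_b) <- paste0("X", 1:4)

vect_nb_zeros <- colSums(mat_b == 0)   # zeros in b for each lambda: 0 1 2 3 4
n_selected <- rowSums(mat_b != 0)      # lambdas selecting each variable: 4 3 2 1

target_zeros <- 2                      # hypothetical breakpoint value
j <- which(vect_nb_zeros == target_zeros)[1]  # first lambda reaching that count
lambda_opt <- lambdas[j]               # 0.2
b_opt <- mat_b[, j]
relevant <- names(b_opt)[b_opt != 0]   # "X1" "X2"

# Restrict the data to the selected variables.
set.seed(1)
X <- matrix(rnorm(40), 10, 4, dimnames = list(NULL, rownames(mat_b)))
X_reduced <- X[, relevant, drop = FALSE]
```

A new SIR model fitted on `X_reduced` then involves only the selected variables.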
Examples
# Generate Data
set.seed(2)
n <- 200
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply SIR with soft thresholding
SIR_threshold_opt(Y,X,H=10,n_lambda=300,thresholding="soft")
Graphical output of SIR
Description
Display the first 10 eigenvalues and the estimated index versus Y of the SIR model.
Usage
## S3 method for class 'SIR'
plot(x, choice = "", ...)
Arguments
x |
A SIR object |
choice |
The graph to plot: "eigvals" for the eigenvalues or "estim_ind" for the estimated index versus Y (see the examples). |
... |
arguments to be passed to methods, such as graphical parameters (not used here). |
Value
No return value
Examples
# Generate Data
set.seed(10)
n <- 500
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply SIR
res = SIR(Y, X, H = 10, graph = FALSE)
# Eigen values
plot(res,choice="eigvals")
# Estimated index versus Y
plot(res,choice="estim_ind")
Graphical output of SIR_bootstrap
Description
Display the first 10 eigenvalues and the estimated index versus Y of the SIR_bootstrap model.
Usage
## S3 method for class 'SIR_bootstrap'
plot(x, choice = "", ...)
Arguments
x |
A SIR_bootstrap object |
choice |
The graph to plot: "eigvals" for the eigenvalues or "estim_ind" for the estimated index versus Y (see the examples). |
... |
arguments to be passed to methods, such as graphical parameters (not used here). |
Value
No return value
Examples
# Generate Data
set.seed(10)
n <- 500
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply bootstrap SIR
res = SIR_bootstrap(Y, X, H = 10, B = 10)
# Eigen values
plot(res,choice="eigvals")
# Estimated index versus Y
plot(res,choice="estim_ind")
Graphical output of SIR_threshold
Description
Display the first 10 eigenvalues and the estimated index versus Y of the thresholded SIR model.
Usage
## S3 method for class 'SIR_threshold'
plot(x, choice = "", ...)
Arguments
x |
A SIR_threshold object |
choice |
The graph to plot: "eigvals" for the eigenvalues or "estim_ind" for the estimated index versus Y (see the examples). |
... |
arguments to be passed to methods, such as graphical parameters (not used here). |
Value
No return value
Examples
# Generate Data
set.seed(10)
n <- 500
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply SIR with hard thresholding
res = SIR_threshold(Y, X, H = 10, lambda = 0.2, thresholding = "hard")
# Eigen values
plot(res,choice="eigvals")
# Estimated index versus Y
plot(res,choice="estim_ind")
Graphical output of SIR_threshold_bootstrap
Description
Display the estimated index versus Y of the SIR model, the size of the models, the occurrence of variable selection, the distribution of the coefficients of \hat{b}, and the distribution of \lambda_{opt} found across the replications.
Usage
## S3 method for class 'SIR_threshold_bootstrap'
plot(x, choice = "", ...)
Arguments
x |
A SIR_threshold_bootstrap object |
choice |
The graph to plot: "estim_ind", "size", "selec_var", "coefs_b" or "lambdas_replic" (see the examples). |
... |
arguments to be passed to methods, such as graphical parameters (not used here). |
Value
No return value
Examples
# Generate Data
set.seed(10)
n <- 200
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
res = SIR_threshold_bootstrap(Y,X,H=10,n_lambda=300,thresholding="hard", n_replications=30,k=2)
# Estimated index versus Y
plot(res,choice="estim_ind")
# Model size
plot(res,choice="size")
# Selected variables
plot(res,choice="selec_var")
# Coefficients of b
plot(res,choice="coefs_b")
# Optimal lambdas
plot(res,choice="lambdas_replic")
Graphical output of SIR_threshold_opt
Description
Display the first 10 eigenvalues, the estimated index versus Y of the SIR model, the evolution of cos^2 and variable selection according to \lambda, and the regularization path of \hat{b}.
Usage
## S3 method for class 'SIR_threshold_opt'
plot(x, choice = "", ...)
Arguments
x |
A SIR_threshold_opt object |
choice |
The graph to plot: "estim_ind", "opt_lambda", "cos2_selec" or "regul_path" (see the examples). |
... |
arguments to be passed to methods, such as graphical parameters (not used here). |
Value
No return value
Examples
# Generate Data
set.seed(10)
n <- 200
beta <- c(1,1,rep(0,8))
X <- mvtnorm::rmvnorm(n,sigma=diag(1,10))
eps <- rnorm(n)
Y <- (X%*%beta)**3+eps
# Apply SIR with soft thresholding
res = SIR_threshold_opt(Y,X,H=10,n_lambda=100,thresholding="soft")
# Estimated index versus Y
plot(res,choice="estim_ind")
# Choice of optimal lambda
plot(res,choice="opt_lambda")
# Evolution of cos^2 and var selection according to lambda
plot(res,choice="cos2_selec")
# Regularization path
plot(res,choice="regul_path")