Type: | Package |
Title: | GCC Estimation of the Multilevel Factor Model |
Version: | 1.1.0 |
Maintainer: | Rui Lin <ruilin1081@gmail.com> |
Description: | Provides methods for model selection, estimation, inference, and simulation for the multilevel factor model, based on the principal component estimation and generalised canonical correlation approach. Details can be found in "Generalised Canonical Correlation Estimation of the Multilevel Factor Model." Lin and Shin (2025) <doi:10.2139/ssrn.4295429>. |
Imports: | stats, stringr, sandwich |
Suggests: | plm |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2025-06-27 08:28:46 UTC; Administrator |
Author: | Rui Lin [aut, cre], Yongcheol Shin [aut] |
Repository: | CRAN |
Date/Publication: | 2025-06-27 09:00:02 UTC |
Generalised canonical correlation estimation for the global factors
Description
This function is one of the main functions of the package. It applies
generalised canonical correlation estimation to obtain the global factors
\boldsymbol{G} and, when it is not explicitly provided, the number of
global factors r_{0}. The function is mainly intended for internal use;
users who only need to estimate the number of global factors can call
[GCC()] instead of [multilevel()].
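To illustrate the idea behind generalised canonical correlation, here is a minimal base-R sketch of one textbook formulation: estimate each block's factor space by PC, average the orthogonal projection matrices onto those spaces, and take the leading eigenvectors of the average as global factor estimates. Directions shared by every block give eigenvalues close to 1, while block-specific directions give smaller ones. This is an assumption-laden illustration, not the exact estimator implemented in GCC() (which also selects r_{0}, handles standardisation and returns further elements). The helpers pc_factors(), proj() and gcc_sketch(), and the simulated data, are hypothetical.

```r
set.seed(1)
T <- 100; R <- 3; Ni <- 30
G <- matrix(rnorm(T), T, 1)                       # one true global factor

# Hypothetical multilevel data: each block loads on G plus one local factor
Y_list <- lapply(seq_len(R), function(i) {
  F_i <- matrix(rnorm(T), T, 1)
  G %*% t(matrix(rnorm(Ni), Ni, 1)) +
    F_i %*% t(matrix(rnorm(Ni), Ni, 1)) +
    matrix(rnorm(T * Ni), T, Ni)
})

# PC factors of one block: sqrt(T) times the leading eigenvectors of Y Y'
pc_factors <- function(Y, k) {
  ev <- eigen(tcrossprod(Y), symmetric = TRUE)
  sqrt(nrow(Y)) * ev$vectors[, seq_len(k), drop = FALSE]
}

# Orthogonal projection onto the column space of X
proj <- function(X) X %*% solve(crossprod(X), t(X))

gcc_sketch <- function(Y_list, k, r0) {
  P_list <- lapply(Y_list, function(Y) proj(pc_factors(Y, k)))
  P_bar <- Reduce(`+`, P_list) / length(P_list)
  ev <- eigen(P_bar, symmetric = TRUE)
  # eigenvalues close to 1 flag directions present in every block's factor space
  list(values = ev$values,
       G = sqrt(nrow(P_bar)) * ev$vectors[, seq_len(r0), drop = FALSE])
}

est <- gcc_sketch(Y_list, k = 2, r0 = 1)
```

Here k = 2 is the assumed total number of factors per block (one global plus one local); the global factor is identified only up to sign and scale.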
Usage
GCC(
data,
standarise = TRUE,
r_max = 10,
r0 = NULL,
ri = NULL,
depvar_header = NULL,
i_header = NULL,
j_header = NULL,
t_header = NULL
)
Arguments
data |
Either a data.frame or a list of data matrices of length R. See Details. |
standarise |
A logical indicating whether the data is standardised before estimation or not. See Details. |
r_max |
An integer indicating the maximum number of factors allowed. See Details. |
r0 |
An integer of the number of global factors. See Details. |
ri |
An array of length R containing the number of local factors for each block. See Details. |
depvar_header |
A character string specifying the header of the dependent variable. See Details. |
i_header |
A character string specifying the header of the block identifier. See Details. |
j_header |
A character string specifying the header of the individual identifier. See Details. |
t_header |
A character string specifying the header of the time identifier. See Details. |
Details
The user-supplied data.frame should contain at least four columns, namely the
dependent variable (y_{ijt}), block identifier (i), individual identifier (j),
and time (t). The names of these columns must be passed to the function via the
parameters "depvar_header", "i_header", "j_header", and "t_header", respectively.
If the data is supplied as a list, these arguments are not used.
If either r0 = NULL or ri = NULL, both of them will be estimated. In such a case, "r_max" must be supplied. If both "r0" and "ri" are supplied, "r_max" is not needed and will be ignored.
If standarise = TRUE, each time series will be standardised so that it has zero mean and unit variance.
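The standardisation step can be pictured with base R's scale(), which centres each column (one individual's time series) and divides it by its sample standard deviation. This is a minimal sketch under the assumption that the package standardises column by column within each T \times N_{i} block; whether it uses the n or n - 1 divisor is not stated here, and the matrix Y_i is hypothetical.

```r
set.seed(42)
# Hypothetical T x N_i block; each column is one individual's time series
Y_i <- matrix(rnorm(100 * 5, mean = 3, sd = 2), nrow = 100)
# scale() centres each column and divides it by its sample standard deviation
Y_std <- scale(Y_i, center = TRUE, scale = TRUE)
# the same operation applied block by block to a list of matrices
Y_list_std <- lapply(list(Y_i, 10 * Y_i), scale)
```

Because standardisation removes location and scale, both blocks in Y_list_std end up with identical values even though the second is ten times the first.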
Value
A list containing the estimated number of global factors \hat{r}_{0}, the
global factors \widehat{\boldsymbol{G}}, and the other elements that are
used in multilevel().
References
Lin, R. and Shin, Y., 2025. Generalised Canonical Correlation Estimation of the Multilevel Factor Model. Available at SSRN 4783804.
Examples
panel <- UKhouse # load the data
Y_list <- panel2list(panel, depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
est_GCC <- GCC(Y_list, r_max = 10)
r0_hat <- est_GCC$r0 # number of global factors
G_hat <- est_GCC$G # global factors
Principal component (PC) estimation of the approximate factor model
Description
Perform PC estimation of the (2D) approximate factor model
y_{it}=\boldsymbol{\lambda}_{i}^{\prime}\boldsymbol{F}_{t}+e_{it},
or, in matrix notation,
\boldsymbol{Y}=\boldsymbol{F}\boldsymbol{\Lambda}^{\prime}+\boldsymbol{e}.
The factor matrix \boldsymbol{F} is estimated as \sqrt{T} times the r eigenvectors of
the matrix \boldsymbol{Y}\boldsymbol{Y}^{\prime} corresponding to its r largest eigenvalues,
in descending order, and the loading matrix is estimated by
\boldsymbol{\Lambda}=T^{-1}\boldsymbol{Y}^{\prime}\boldsymbol{F}.
See, e.g., Bai and Ng (2002).
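The two formulas above translate directly into base R. The sketch below is an illustration, not the package's PC() source: pc_sketch() is a hypothetical name, and it returns a list with the same element names (factor, loading) as documented under Value. The normalisation \boldsymbol{F}^{\prime}\boldsymbol{F}/T = \boldsymbol{I}_{r} follows from scaling orthonormal eigenvectors by \sqrt{T}.

```r
pc_sketch <- function(Y, r) {
  T <- nrow(Y)
  ev <- eigen(tcrossprod(Y), symmetric = TRUE)   # Y Y' is T x T; values in decreasing order
  F_hat <- sqrt(T) * ev$vectors[, seq_len(r), drop = FALSE]
  Lambda_hat <- crossprod(Y, F_hat) / T          # T^{-1} Y' F_hat
  list(factor = F_hat, loading = Lambda_hat)
}

# simulate data as in the Examples below
set.seed(2024)
T <- 100; N <- 50; r <- 2
F <- matrix(rnorm(T * r), nrow = T)
Lambda <- matrix(rnorm(N * r), nrow = N)
Y <- F %*% t(Lambda) + matrix(rnorm(T * N), nrow = T)
est <- pc_sketch(Y, r)
```

The estimated factors span (approximately) the same space as the true ones, but are identified only up to an r \times r rotation.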
Usage
PC(Y, r)
Arguments
Y |
A T \times N data matrix. |
r |
An integer indicating the number of factors. |
Value
A list containing the factors and factor loadings:
factor = A T \times r matrix of the estimated factors.
loading = An N \times r matrix of the estimated factor loadings.
References
Bai, J. and Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica, 70(1), pp.191-221.
Examples
# simulate data
T <- 100
N <- 50
r <- 2
F <- matrix(stats::rnorm(T * r, 0, 1), nrow = T)
Lambda <- matrix(stats::rnorm(N * r, 0, 1), nrow = N)
err <- matrix(stats::rnorm(T * N, 0, 1), nrow = T)
Y <- F %*% t(Lambda) + err
# estimation
est_PC <- PC(Y, r)
England and Wales House Price Growth Data Categorised by Regions
Description
A data.frame containing the quarterly (mean) house prices of four types of property (detached, semi-detached, terraced, and flats/maisonettes) for 331 local planning authorities (LPAs) over the period 1996Q1 to 2021Q2. See also Lin and Shin (2023).
Usage
UKhouse
Format
A data.frame named 'UKhouse'.
Details
Each LPA belongs to one of ten regions: North East (NE), North West (NW),
Yorkshire and the Humber (YH), East Midlands (EM), West Midlands (WM),
East of England (EE), London (LD), South East (SE), South West (SW) and Wales (WA).
The real house price growth of the j-th LPA-type pair in region i is computed
by deflating the nominal house price by the CPI and log-differencing it:
\pi_{ijt}=100\times \log\left(\frac{PRICE_{ijt}}{CPI_{t}}\right)-100 \times \log\left(\frac{PRICE_{ij,t-1}}{CPI_{t-1}}\right).
Removing the series with missing observations yields a balanced panel
with R = 10, N =\sum_{i=1}^{R} N_{i} = 1300 and T = 102.
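The growth formula above can be checked with a few lines of R. This is a minimal sketch for a single price series; real_growth() and the price and CPI numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Real log growth (in per cent) of one price series, following the formula above
real_growth <- function(price, cpi) {
  real_log <- 100 * log(price / cpi)   # 100 x log of the CPI-deflated price
  c(NA, diff(real_log))                # first period has no predecessor
}

# Hypothetical numbers, for illustration only
g <- real_growth(price = c(100, 110, 121), cpi = c(100, 100, 110))
```

In the second period the price rises by 10% with flat CPI, giving 100 \times \log(1.1) \approx 9.53; in the third period price and CPI both rise by 10%, so real growth is zero.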
Columns in the dataset:
"Date" Time variable.
"Region" Name of the region to which the LPA belongs.
"LPA" Name of the LPA.
"Type" Name of the house type.
"LPA_Type" Name of the LPA-type pair.
Source
Office for National Statistics (ONS), ONS website, statistical bulletin, House price statistics for small areas in England and Wales: year ending June 2021
References
Lin, R. and Shin, Y., 2022. Generalised Canonical Correlation Estimation of the Multilevel Factor Model. Available at SSRN 4295429.
Check validity of the data and headers
Description
This is an internal function that checks the validity of the data and headers
and returns a list of R data matrices for estimation.
Usage
check_data(
data,
depvar_header = NULL,
i_header = NULL,
j_header = NULL,
t_header = NULL
)
Arguments
data |
Either a data.frame or a list of data matrices of length R. See Details. |
depvar_header |
A character string specifying the header of the dependent variable. See Details. |
i_header |
A character string specifying the header of the block identifier. See Details. |
j_header |
A character string specifying the header of the individual identifier. See Details. |
t_header |
A character string specifying the header of the time identifier. See Details. |
Details
See Details of [GCC()].
Value
A list of data matrices of length R.
Examples
panel <- UKhouse # load the data
Y_list <- check_data(panel,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date"
)
Selection criteria for the approximate factor model
Description
This function performs model selection for the (2D) approximate factor model and returns the estimated number of factors.
Usage
infocrit(Y, method, r_max = 10)
Arguments
Y |
A T \times N data matrix. |
method |
A character string indicating which criterion to use. See Details. |
r_max |
An integer indicating the maximum number of factors allowed. 10 by default. |
Details
"method" can be one of the following: "ICp2" and "BIC3" by Bai and Ng (2002), "ER" by Ahn and Horenstein (2013), "ED" by Onatski (2010).
Value
The estimated number of factors.
References
Bai, J. and Ng, S., 2002. Determining the number of factors in approximate factor models. Econometrica, 70(1), pp.191-221.
Ahn, S.C. and Horenstein, A.R., 2013. Eigenvalue ratio test for the number of factors. Econometrica, 81(3), pp.1203-1227.
Onatski, A., 2010. Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4), pp.1004-1016.
Examples
# simulate data
T <- 100
N <- 50
r <- 2
F <- matrix(stats::rnorm(T * r, 0, 1), nrow = T)
Lambda <- matrix(stats::rnorm(N * r, 0, 1), nrow = N)
err <- matrix(stats::rnorm(T * N, 0, 1), nrow = T)
Y <- F %*% t(Lambda) + err
# estimation
r_hat <- infocrit(Y, "BIC3", r_max = 10)
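Two of the criteria listed under Details can be sketched in base R from the eigenvalues of \boldsymbol{Y}\boldsymbol{Y}^{\prime}: the eigenvalue ratio "ER" of Ahn and Horenstein (2013), which maximises \mu_{k}/\mu_{k+1}, and "ICp2" of Bai and Ng (2002), which minimises \log V(k) + k\frac{N+T}{NT}\log\min(N,T). These are textbook versions written for illustration (er_sketch() and icp2_sketch() are hypothetical names); infocrit() may differ in details such as the treatment of k = 0 or mock eigenvalues.

```r
# Ahn and Horenstein (2013) eigenvalue ratio: argmax_k mu_k / mu_{k+1}
er_sketch <- function(Y, r_max = 10) {
  N <- ncol(Y); T <- nrow(Y)
  mu <- eigen(tcrossprod(Y) / (N * T), symmetric = TRUE, only.values = TRUE)$values
  which.max(mu[1:r_max] / mu[2:(r_max + 1)])
}

# Bai and Ng (2002) IC_p2: log V(k) + k (N + T)/(NT) log(min(N, T))
icp2_sketch <- function(Y, r_max = 10) {
  N <- ncol(Y); T <- nrow(Y)
  mu <- eigen(tcrossprod(Y), symmetric = TRUE, only.values = TRUE)$values
  # V(k): mean squared residual after removing the first k PC factors
  V <- (sum(Y^2) - cumsum(c(0, mu[1:r_max]))) / (N * T)
  k <- 0:r_max
  ic <- log(V) + k * (N + T) / (N * T) * log(min(N, T))
  k[which.min(ic)]
}

set.seed(7)
T <- 100; N <- 50; r <- 2
Y <- matrix(rnorm(T * r), T) %*% t(matrix(rnorm(N * r), N)) + matrix(rnorm(T * N), T)
c(ER = er_sketch(Y), ICp2 = icp2_sketch(Y))
```

Both sketches reuse the fact that removing the first k PC factors leaves a residual sum of squares equal to the total sum of squares minus the k largest eigenvalues of \boldsymbol{Y}\boldsymbol{Y}^{\prime}.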
Full estimation of the multilevel factor model
Description
This is the main function of this package which performs full estimation of the multilevel factor model.
Usage
multilevel(
data,
ic = "BIC3",
standarise = TRUE,
r_max = 10,
r0 = NULL,
ri = NULL,
depvar_header = NULL,
i_header = NULL,
j_header = NULL,
t_header = NULL
)
Arguments
data |
Either a data.frame or a list of data matrices of length R. See Details. |
ic |
A character string specifying the selection criterion used to estimate the numbers of local factors. See Details. |
standarise |
A logical indicating whether the data is standardised before estimation or not. See Details. |
r_max |
An integer indicating the maximum number of factors allowed. See Details. |
r0 |
An integer of the number of global factors. See Details. |
ri |
An array of length R containing the number of local factors for each block. See Details. |
depvar_header |
A character string specifying the header of the dependent variable. See Details. |
i_header |
A character string specifying the header of the block identifier. See Details. |
j_header |
A character string specifying the header of the individual identifier. See Details. |
t_header |
A character string specifying the header of the time identifier. See Details. |
Details
The user-supplied data.frame should contain at least four columns, namely the
dependent variable (y_{ijt}), block identifier (i), individual identifier (j),
and time (t). The names of these columns must be passed to the function via the
parameters "depvar_header", "i_header", "j_header", and "t_header", respectively.
If the data is supplied as a list, these arguments are not used.
If either r0 = NULL or ri = NULL, both of them will be estimated. In such a case, "r_max" must be supplied. If both "r0" and "ri" are supplied, "r_max" is not needed and will be ignored.
If standarise = TRUE, each time series will be standardised so that it has zero mean and unit variance. Standardising the data before estimation is recommended.
See Lin and Shin (2025) for more details.
Value
The return value is an S3 object of class "multi_result", a list containing the following items:
G = A matrix of the estimated global factors.
Gamma = A list of length R containing the estimated global factor loading matrices for each block.
F = A list of length R containing the estimated local factor matrices for each block.
Lambda = A list of length R containing the estimated local factor loading matrices for each block.
N = The total number of cross-sections in the panel.
Ni = An array of length R containing the number of cross-sections in each block.
r0 = The number of global factors. Unchanged if pre-specified.
ri = An array of length R containing the number of local factors for each block. Unchanged if pre-specified.
d = An array of length R containing the maximum total number of factors allowed for each block. All its elements equal r_max if either r0 or ri is supplied as NULL.
Resid = A list of length R containing the residual matrices for each block.
delta2 = An array of the mock and the r_{\max} + 1 largest squared singular values.
ic = The selection criterion used for estimating the numbers of local factors.
block_names = An array of block names.
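To see how G, Gamma, F, Lambda and Resid fit together, the sketch below simulates a small multilevel panel under the assumed block structure \boldsymbol{Y}_{i}=\boldsymbol{G}\boldsymbol{\Gamma}_{i}^{\prime}+\boldsymbol{F}_{i}\boldsymbol{\Lambda}_{i}^{\prime}+\boldsymbol{E}_{i}, with \boldsymbol{G} of dimension T \times r_{0}, \boldsymbol{\Gamma}_{i} of dimension N_{i} \times r_{0}, \boldsymbol{F}_{i} of dimension T \times r_{i} and \boldsymbol{\Lambda}_{i} of dimension N_{i} \times r_{i}. This structure is an assumption for illustration (see Lin and Shin (2025) for the exact model), and all block sizes and factor numbers are hypothetical. It uses only base R, not multilevel().

```r
set.seed(123)
T <- 60; r0 <- 1
Ni <- c(20, 25, 30)   # hypothetical block sizes N_i
ri <- c(1, 2, 1)      # hypothetical numbers of local factors r_i
R <- length(Ni)

G <- matrix(rnorm(T * r0), T, r0)                                # T x r0
Gamma <- lapply(Ni, function(n) matrix(rnorm(n * r0), n, r0))      # N_i x r0
F <- lapply(ri, function(k) matrix(rnorm(T * k), T, k))            # T x r_i
Lambda <- Map(function(n, k) matrix(rnorm(n * k), n, k), Ni, ri)   # N_i x r_i

# Global plus local components for each block (T x N_i)
common <- lapply(seq_len(R), function(i)
  G %*% t(Gamma[[i]]) + F[[i]] %*% t(Lambda[[i]]))
Y_list <- lapply(seq_len(R), function(i)
  common[[i]] + matrix(rnorm(T * Ni[i]), T, Ni[i]))

# Residuals: data minus the global and local components
Resid <- Map(`-`, Y_list, common)
```

A list shaped like Y_list is what multilevel() accepts as its data argument; with estimated rather than true components, Resid plays the role of the returned residual matrices.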
References
Lin, R. and Shin, Y., 2025. Generalised Canonical Correlation Estimation of the Multilevel Factor Model. Available at SSRN 4783804.
Examples
panel <- UKhouse # load the data
# use data.frame
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
# or one can use a list of data matrices
Y_list <- panel2list(panel, depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
est_multi <- multilevel(Y_list, ic = "BIC3", standarise = TRUE, r_max = 5)
data.frame to list of data matrices
Description
This function converts the data.frame to a list of data matrices and finds the dimensions of the multilevel panel.
Usage
panel2list(
panel,
depvar_header = NULL,
i_header = NULL,
j_header = NULL,
t_header = NULL
)
Arguments
panel |
The user-supplied data frame for the multilevel panel data. See Details. |
depvar_header |
A character string specifying the header of the dependent variable. See Details. |
i_header |
A character string specifying the header of the block identifier. See Details. |
j_header |
A character string specifying the header of the individual identifier. See Details. |
t_header |
A character string specifying the header of the time identifier. See Details. |
Details
See Details of [GCC()].
Value
A list containing the data matrices of the R blocks. Each of them
has dimension T\times N_{i}.
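The conversion from a long data.frame to a list of T\times N_{i} matrices can be sketched in base R by splitting on the block identifier and reshaping each block so that rows are time periods and columns are individuals. This is a hypothetical helper (panel_to_list_sketch()), not the package's panel2list() source; it assumes a balanced panel with one row per (i, j, t), and the toy data are made up. Missing (t, j) cells would show up as NA.

```r
panel_to_list_sketch <- function(panel, depvar_header, i_header, j_header, t_header) {
  blocks <- split(panel, panel[[i_header]])
  lapply(blocks, function(b) {
    t_id <- factor(b[[t_header]], levels = sort(unique(b[[t_header]])))
    j_id <- factor(b[[j_header]], levels = unique(b[[j_header]]))
    # rows = time periods, columns = individuals in this block
    tapply(b[[depvar_header]], list(t_id, j_id), identity)
  })
}

# Hypothetical toy panel: 2 blocks, 2 individuals per block, 3 periods
toy <- data.frame(
  blk  = rep(c("A", "B"), each = 6),
  ind  = rep(c("x1", "x2", "y1", "y2"), each = 3),
  time = rep(1:3, times = 4),
  y    = 1:12
)
Y_toy <- panel_to_list_sketch(toy, "y", "blk", "ind", "time")
```

Re-creating the factors inside each block drops individuals that belong to other blocks, so every matrix has exactly N_{i} columns.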
Examples
panel <- UKhouse # load the data
# panel$Region identifies different blocks i=1,...,R.
# panel$LPA_Type identifies different individuals j=1,...,N_i.
Y_list <- panel2list(panel, depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
Print the relative importance ratios
Description
Print the relative importance ratios
Usage
## S3 method for class 'multi_result'
summary(object, ...)
Arguments
object |
An S3 object of class 'multi_result' created by multilevel(). |
... |
Additional arguments. |
Value
A matrix containing the summary of the model.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
summary(est_multi)
Get the variance estimates of the global component
Description
This function generates the variance estimates of the
global component for the j-th individual in block i at time t.
Usage
vcov_global_comp(object, i, j, t)
Arguments
object |
An S3 object of class 'multi_result' created by multilevel(). |
i |
An integer indicating the i-th block. |
j |
An integer indicating the j-th individual in block i. |
t |
An integer indicating the time. |
Value
The variance of the global component.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
vcov_global_comp_ijt <- vcov_global_comp(est_multi, i = 1, j = 1, t = 1)
Get the covariance estimates for the global factors
Description
This function generates the covariance estimates for the global factors
at time t.
Usage
vcov_global_factor(object, t)
Arguments
object |
An S3 object of class 'multi_result' created by [multilevel()]. |
t |
An integer specifying the time point t. |
Value
An r_{0} \times r_{0} covariance matrix.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
vcov_G <- vcov_global_factor(est_multi, t = floor(est_multi$T / 2))
Get the covariance estimates for the global factor loadings
Description
This function generates the covariance estimates
for the global factor loadings for the j-th individual in block i.
Usage
vcov_global_loading(object, i, j)
Arguments
object |
An S3 object of class 'multi_result' created by [multilevel()]. |
i |
An integer indicating the i-th block. |
j |
An integer indicating the j-th individual in block i. |
Value
An r_{0} \times r_{0} covariance matrix.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
vcov_gamma_11 <- vcov_global_loading(est_multi, i = 1, j = 1)
Get the variance estimates of the local component
Description
This function generates the variance estimates of the
local component for the j-th individual in block i at time t.
Usage
vcov_local_comp(object, i, j, t)
Arguments
object |
An S3 object of class 'multi_result' created by multilevel(). |
i |
An integer indicating the i-th block. |
j |
An integer indicating the j-th individual in block i. |
t |
An integer indicating the time. |
Value
The variance of the local component.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
vcov_local_comp_ijt <- vcov_local_comp(est_multi, i = 1, j = 1, t = 1)
Get the covariance estimates for the local factors
Description
This function generates the covariance estimates
for the local factors in block i at time t.
Usage
vcov_local_factor(object, i, t)
Arguments
object |
An S3 object of class 'multi_result' created by multilevel(). |
i |
An integer indicating the i-th block. |
t |
An integer specifying the time point. |
Value
An r_{i} \times r_{i} covariance matrix.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
vcov_local_factor_11 <- vcov_local_factor(est_multi, i = 1, t = 1)
Get the covariance estimates for the local factor loadings
Description
This function generates the covariance estimates
for the local loadings for the j-th individual in block i.
Usage
vcov_local_loading(object, i, j)
Arguments
object |
An S3 object of class 'multi_result' created by multilevel(). |
i |
An integer indicating the i-th block. |
j |
An integer indicating the j-th individual in block i. |
Value
An r_{i} \times r_{i} covariance matrix.
Examples
panel <- UKhouse # load the data
est_multi <- multilevel(panel, ic = "BIC3", standarise = TRUE, r_max = 5,
depvar_header = "dlPrice", i_header = "Region",
j_header = "LPA_Type", t_header = "Date")
vcov_local_loading_11 <- vcov_local_loading(est_multi, i = 1, j = 1)