Type: | Package |
Title: | Covariate-Augmented Generalized Factor Model |
Version: | 1.1 |
Date: | 2024-06-21 |
Author: | Wei Liu [aut, cre], Jiakun Jiang [aut], Dewei Xiang [aut], Xuancheng Zhou [aut] |
Maintainer: | Wei Liu <LiuWeideng@gmail.com> |
Description: | The covariate-augmented generalized factor model is designed to account for cross-modal heterogeneity, capture nonlinear dependencies among the data, incorporate additional information, and provide excellent interpretability while maintaining high computational efficiency. |
BugReports: | https://github.com/feiyoung/CMGFM/issues |
License: | GPL-3 |
Depends: | irlba, R (≥ 3.5.0) |
Imports: | MASS, stats, GFM, Rcpp (≥ 1.0.10) |
Suggests: | knitr, rmarkdown |
LinkingTo: | Rcpp, RcppArmadillo |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | yes |
Packaged: | 2024-06-25 04:40:10 UTC; 10297 |
Repository: | CRAN |
Date/Publication: | 2024-06-25 15:00:05 UTC |
Fit the CMGFM model
Description
Fit the covariate-augmented generalized factor model.
Usage
CMGFM(
XList,
Z,
types,
numvarmat,
q = 15,
Alist = NULL,
init = c("LFM", "GFM", "random"),
maxIter = 30,
epsELBO = 1e-08,
verbose = TRUE,
add_IC_iter = FALSE,
seed = 1
)
Arguments
XList |
a list of matrices in which each matrix contains values of a single type, i.e., continuous, count, or binomial/binary values. |
Z |
a matrix, the fixed-dimensional covariate matrix with control variables. |
types |
a string vector, specifying the variable type of each matrix in XList. |
numvarmat |
a length(types)-by-d matrix, the number of variables in modalities that belong to the same type. |
q |
an optional positive integer, the number of factors; default is 15. |
Alist |
an optional vector, the offset for each unit; default is a vector of zeros. |
init |
an optional character string, the initialization method, one of "LFM", "GFM", or "random". |
maxIter |
an optional positive integer, the maximum number of iterations of the variational EM (VEM) algorithm; default is 30. |
epsELBO |
an optional positive value, the tolerance on the relative change of the evidence lower bound (ELBO); default is 1e-8. |
verbose |
a logical value, whether to print progress information during the iterations; default is TRUE. |
add_IC_iter |
a logical value, whether to impose the identifiability condition within each iteration (TRUE) or only after the algorithm converges (FALSE); default is FALSE. |
seed |
an integer, the random seed used in initialization; default is 1. |
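For intuition about how `types` and `Alist` can enter a generalized factor model, the sketch below maps a linear predictor to the mean of each variable type through R's canonical inverse links (the `gaussian()`, `poisson()` and `binomial()` family objects from the stats package), with an additive per-unit offset. That CMGFM uses canonical links in exactly this form is an assumption, and the helper `type_mean` and the toy `eta` values are hypothetical.

```r
# Hypothetical helper (not part of CMGFM): mean of each variable type given a
# linear predictor, ASSUMING canonical links (identity, log, logit) and an
# offset that is added to the linear predictor.
type_mean <- function(eta, type, offset = rep(0, length(eta))) {
  fam <- switch(type,
                gaussian = stats::gaussian(),
                poisson  = stats::poisson(),
                binomial = stats::binomial(),
                stop("unknown type: ", type))
  fam$linkinv(offset + eta)   # inverse link applied to offset + eta
}

eta <- c(-1, 0, 1)  # toy linear predictor for three units
m <- sapply(c("gaussian", "poisson", "binomial"), function(t) type_mean(eta, t))
print(m)            # one column per variable type
```

Because the offset shifts the linear predictor additively, an offset of log(2) doubles a Poisson mean under the log link.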
Details
None
Value
Returns a list containing the following components:
- betaf: the estimated regression coefficient vector for each modality;
- Bf: the estimated loading matrix for each modality;
- M: the estimated modality-shared factor matrix;
- Xif: the estimated modality-specific factor vector;
- S: the estimated covariance matrix of the modality-shared latent factors;
- Om: the posterior variance of the modality-specific latent factors;
- muf: the estimated intercept vector for each modality;
- Sigmam: the variance of the modality-specific factors;
- invLambdaf: the inverse of the estimated error variances for each modality;
- ELBO: the ELBO value when the algorithm stops;
- ELBO_seq: the sequence of ELBO values;
- time_use: the running time of model fitting.
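The `epsELBO` tolerance and the returned `ELBO_seq` suggest a stopping rule based on the relative change of the ELBO between consecutive iterations. A minimal sketch of such a rule follows; the exact formula CMGFM uses internally is an assumption, and `elbo_converged` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: index (into elbo_seq) of the first iteration at which
# the relative change |ELBO[t] - ELBO[t-1]| / |ELBO[t-1]| drops below eps.
# Returns NA if the criterion is never met.
elbo_converged <- function(elbo_seq, eps = 1e-8) {
  if (length(elbo_seq) < 2) return(NA_integer_)
  rel_change <- abs(diff(elbo_seq)) / abs(head(elbo_seq, -1))
  hit <- which(rel_change < eps)
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

# Toy ELBO trace that increases and then levels off
elbo_seq <- c(-1000, -500, -400, -399.9999, -399.99989999)
elbo_converged(elbo_seq, eps = 1e-6)
```

A smaller tolerance delays the stopping point, which is why the default of 1e-8 is paired with a cap of `maxIter` iterations.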
References
None
See Also
None
Examples
library(CMGFM)
# Two modalities for each of the Gaussian, Poisson and binomial types
pveclist <- list('gaussian'=c(50, 150),'poisson'=c(50, 150),
'binomial'=c(100,60))
q <- 6
sigmavec <- rep(1,3)
pvec <- unlist(pveclist)
datlist <- gendata_cmgfm(pveclist = pveclist, seed = 1, n = 300,d = 3,
q = q, rho = rep(1,length(pveclist)), rho_z=0.2,
sigmavec=sigmavec, sigma_eps=1)
XList <- datlist$XList
Z <- datlist$Z
numvarmat <- datlist$numvarmat
types <- datlist$types
rlist <- CMGFM(XList, Z, types=types, numvarmat, q=q)
str(rlist)
Select the number of factors
Description
Select the number of factors using a method based on the maximum singular value ratio.
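The idea behind a singular value ratio criterion can be illustrated on a single matrix: compute the leading singular values, drop those below a cutoff, and pick the index k that maximizes the ratio of the k-th to the (k+1)-th singular value. The sketch below is a generic base R version of that criterion under these assumptions; it is not MSVR's implementation, which works through a CMGFM fit (its extra arguments are passed to CMGFM), and `svr_select` is a hypothetical name.

```r
# Hypothetical helper: estimate the number of factors of a matrix X as the
# index k maximizing sigma_k / sigma_{k+1}, among singular values > threshold.
svr_select <- function(X, q_max = 20, threshold = 1e-5) {
  d <- svd(X, nu = 0, nv = 0)$d               # singular values, decreasing
  d <- d[seq_len(min(q_max + 1, length(d)))]  # keep the leading q_max + 1
  d <- d[d > threshold]                       # filter tiny singular values
  if (length(d) < 2) return(length(d))
  ratios <- d[-length(d)] / d[-1]             # sigma_k / sigma_{k+1}
  which.max(ratios)
}

# Simulated rank-3 signal plus Gaussian noise
set.seed(1)
n <- 200; p <- 100; q <- 3
Fmat <- matrix(rnorm(n * q), n, q)            # latent factors
Bmat <- matrix(rnorm(p * q), p, q)            # loadings
X <- 3 * Fmat %*% t(Bmat) + matrix(rnorm(n * p), n, p)
svr_select(X, q_max = 20)
```

With a strong signal, the gap between the last signal singular value and the first noise singular value produces the largest ratio, so the estimate lands at the true rank.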
Usage
MSVR(
XList,
Z,
types,
numvarmat,
Alist = NULL,
q_max = 20,
threshold = 1e-05,
...
)
Arguments
XList |
a list of matrices in which each matrix contains values of a single type, i.e., continuous, count, or binomial/binary values. |
Z |
a matrix, the fixed-dimensional covariate matrix with control variables. |
types |
a string vector, specifying the variable type of each matrix in XList. |
numvarmat |
a length(types)-by-d matrix, the number of variables in modalities that belong to the same type. |
Alist |
an optional vector, the offset for each unit; default is a vector of zeros. |
q_max |
an optional positive integer, the maximum number of factors; default is 20. |
threshold |
an optional positive value, a cutoff below which singular values are discarded; default is 1e-5. |
... |
other arguments passed to CMGFM. |
Details
None
Value
Returns the estimated number of factors.
References
None
See Also
None
Examples
library(CMGFM)
# Two modalities for each of the Gaussian, Poisson and binomial types
pveclist <- list('gaussian'=c(50, 150),'poisson'=c(50, 150),
'binomial'=c(100,60))
q <- 6
sigmavec <- rep(1,3)
pvec <- unlist(pveclist)
datlist <- gendata_cmgfm(pveclist = pveclist, seed = 1, n = 300,d = 3,
q = q, rho = rep(1,length(pveclist)), rho_z=0.2,
sigmavec=sigmavec, sigma_eps=1)
XList <- datlist$XList
Z <- datlist$Z
numvarmat <- datlist$numvarmat
types <- datlist$types
hq <- MSVR(XList, Z, types=types, numvarmat, q_max=20)
print(c(q_true=q, q_est=hq))
Generate simulated data
Description
Generate simulated data from the covariate-augmented generalized factor model.
Usage
gendata_cmgfm(
seed = 1,
n = 300,
pveclist = list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60)),
q = 6,
d = 3,
rho = rep(1, length(pveclist)),
rho_z = 1,
sigmavec = rep(0.5, length(pveclist)),
n_bin = 1,
sigma_eps = 1,
seed.para = 1
)
Arguments
seed |
a positive integer, the random seed for reproducibility of the data generation process. |
n |
a positive integer, the sample size. |
pveclist |
a named list, specifying the number of modalities for each variable type and the number of variables in each modality. |
q |
a positive integer, the number of modality-shared factors. |
d |
a positive integer, the dimension of the covariate matrix. |
rho |
a numeric vector with length length(pveclist) |
rho_z |
a positive real, the signal strength of the covariates. |
sigmavec |
a positive vector with length length(pveclist) |
n_bin |
a positive integer, the number of trials in the binomial distribution. |
sigma_eps |
a positive real, the variance of the overdispersion error. |
seed.para |
a positive integer, the random seed that fixes the regression coefficient vectors and loading matrices, for reproducibility of the data generation process. |
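To make the structure of `pveclist` concrete: each name is a variable type and each element of the corresponding vector is the number of variables in one modality of that type. The base R lines below, using the default value from Usage, count the modalities per type and the total number of variables; they only inspect the list and do not call the package.

```r
# Default pveclist from the Usage of gendata_cmgfm
pveclist <- list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60))

n_modalities <- lengths(pveclist)  # number of modalities per variable type
pvec <- unlist(pveclist)           # dimension of each modality, named by type
print(n_modalities)
print(pvec)
print(sum(pvec))                   # total number of variables across modalities
```

The same `unlist(pveclist)` call appears in the examples of CMGFM and MSVR to obtain the per-modality dimensions.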
Details
None
Value
Returns a list containing the following components:
- XList: a list of matrices in which each matrix contains values of a single type, i.e., continuous, count, or binomial/binary values;
- Z: a matrix, the fixed-dimensional covariate matrix with control variables;
- Alist: the offset vector for each modality;
- B0list: the true loading matrix for each modality;
- mu0: the true intercept vector for each modality;
- U0: the modality-specific factor vector;
- F0: the modality-shared factor matrix;
- Uplist: the true intercept-loading matrix for each modality;
- beta: the true regression coefficient vector for each modality;
- sigma_eps: the standard deviation of the error term;
- numvarmat: a length(types)-by-d matrix, the number of variables in modalities that belong to the same type.
References
None
See Also
Examples
library(CMGFM)
n <- 300
pveclist <- list('gaussian'=c(50, 150),'poisson'=c(50),'binomial'=c(100,60))
d <- 20
q <- 6
datlist <- gendata_cmgfm(n=n, pveclist=pveclist, q=q, d=d)
str(datlist)