Type: | Package |
Title: | Latent (Variable) Analysis with Bayesian Learning |
Version: | 1.5.0 |
Date: | 2022-05-13 |
Author: | Jinsong Chen [aut, cre, cph] |
Maintainer: | Jinsong Chen <jinsong.chen@live.com> |
Description: | A variety of models to analyze latent variables based on Bayesian learning: partially confirmatory factor analysis (PCFA; Chen, Guo, Zhang, & Pan, 2021) <doi:10.1037/met0000293>; generalized PCFA; the partially confirmatory item response model (PCIRM; Chen, 2020) <doi:10.1007/s11336-020-09724-3>; Bayesian regularized EFA <doi:10.1080/10705511.2020.1854763>; and fully and partially exploratory factor analysis (FEFA and PEFA). |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.6.0) |
Imports: | stats, MASS, coda |
RoxygenNote: | 7.2.0 |
URL: | https://github.com/Jinsong-Chen/LAWBL, https://jinsong-chen.github.io/LAWBL/ |
BugReports: | https://github.com/Jinsong-Chen/LAWBL/issues |
Suggests: | knitr, rmarkdown, testthat |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2022-05-15 12:04:50 UTC; HKU |
Repository: | CRAN |
Date/Publication: | 2022-05-16 07:00:05 UTC |
LAWBL: Latent (Variable) Analysis with Bayesian Learning
Description
This package provides a variety of models for analyzing latent variables based on Bayesian learning.
Details
LAWBL represents a partially confirmatory / exploratory approach to modeling latent variables based on Bayesian learning. Built on the power of statistical learning, it can address psychometric challenges such as parameter specification, local dependence, and factor extraction. Built on the scalability and flexibility of Bayesian inference and resampling techniques, it can accommodate modeling frameworks such as factor analysis, item response theory, cognitive diagnosis modeling, and causal or explanatory modeling. The package can also handle different response formats or a mix of them, with or without missingness. Together, the models provide a partial approach covering a wide range of the exploratory-confirmatory continuum in latent variable modeling.
Towards the confirmatory end, this package includes the Partially Confirmatory Factor Analysis (PCFA) model for continuous data (Chen, Guo, Zhang, & Pan, 2021), the generalized PCFA (GPCFA) model covering continuous, categorical, and mixed-type data, and the partially confirmatory item response model (PCIRM) for continuous and dichotomous data with intercept terms (Chen, 2020). For PCFA, GPCFA, and PCIRM, there are two major model variants with different constraints for identification. One assumes local independence (LI) with a more exploratory tendency, and can also be called the E-step. The other allows local dependence (LD) with a more confirmatory tendency, and can also be called the C-step.
Towards the exploratory end, the package offers the Bayesian regularized EFA (BREFA), which performs factor extraction and parameter estimation in one step (Chen, 2021). It is further extended as the fully and partially EFA (FEFA and PEFA), which improves performance and can incorporate partial knowledge.
Parameters are obtained by sampling from the posterior distributions with the Markov chain Monte Carlo (MCMC) techniques. Different Bayesian learning methods are used to regularize the loading pattern, local dependence, and/or factor identification.
Note
This package is under development. You are very welcome to send me any comments or suggestions for improvements, and to share with me any problems you may encounter with the use of this package.
Author(s)
Jinsong Chen, jinsong.chen@live.com
References
Chen, J. (2020). A partially confirmatory approach to the multidimensional item response theory with the Bayesian Lasso. Psychometrika. 85(3), 738-774. DOI:10.1007/s11336-020-09724-3.
Chen, J., Guo, Z., Zhang, L., & Pan, J. (2021). A partially confirmatory approach to scale development with the Bayesian Lasso. Psychological Methods. 26(2), 210–235. DOI: 10.1037/met0000293.
Chen, J. (2021). A generalized partially confirmatory factor analysis framework with mixed Bayesian Lasso methods. Multivariate Behavioral Research. DOI: 10.1080/00273171.2021.1925520.
Chen, J. (2021). A Bayesian regularized approach to exploratory factor analysis in one step. Structural Equation Modeling: A Multidisciplinary Journal. DOI: 10.1080/10705511.2020.1854763.
Chen, J. (2022). Partially confirmatory approach to factor analysis with Bayesian learning: A LAWBL tutorial. Structural Equation Modeling: A Multidisciplinary Journal. DOI: 10.1080/00273171.2021.1925520.
Chen, J. (In Press). Fully and partially exploratory factor analysis with bi-level Bayesian regularization. Behavior Research Methods.
National Longitudinal Survey of Youth 1997
Description
A data set consisting of 3,458 individual responses to 27 mixed-type items, with 1.12 percent missing data.
Usage
nlsy27
Format
A list with components:
dat
The response data
Q
Initial design matrix with three factors and two to three specified loadings per factor
cati
Indices of categorical (polytomous) items
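The missing-data figure above is the share of empty cells in the response matrix. A minimal sketch of that calculation, using a toy list with the same three components; the toy values and the helper name `missing_pct` are illustrative assumptions and not part of the package:

```r
# Toy stand-in for a list shaped like nlsy27 (values are made up for illustration)
toy <- list(
  dat  = matrix(c(1, NA, 3, 4, 2, 2, NA, 1, 5, 3, 4, 2), nrow = 4, ncol = 3),
  Q    = matrix(c(1, -1, -1), nrow = 3, ncol = 1),
  cati = c(2, 3)
)

# Percentage of missing cells in a response matrix
missing_pct <- function(dat) 100 * mean(is.na(dat))

pct <- missing_pct(toy$dat)                          # 2 of 12 cells are NA
conti <- setdiff(seq_len(ncol(toy$dat)), toy$cati)   # items not listed in cati
```

With the package loaded, the same helper could presumably be applied to `nlsy27$dat`.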
(Generalized) Partially Confirmatory Factor Analysis
Description
PCFA is a partially confirmatory approach covering a wide range of the exploratory-confirmatory continuum in factor analytic models (Chen, Guo, Zhang, & Pan, 2021). PCFA is only for continuous data, while the generalized PCFA (GPCFA; Chen, 2021) covers both continuous and categorical data. There are two major model variants with different constraints for identification. One assumes local independence (LI) with a more exploratory tendency, and can also be called the E-step. The other allows local dependence (LD) with a more confirmatory tendency, and can also be called the C-step. Parameters are obtained by sampling from the posterior distributions with Markov chain Monte Carlo (MCMC) techniques. Different Bayesian Lasso methods are used to regularize the loading pattern and LD. The estimation results can be summarized with summary.lawbl, and the factorial eigenvalues can be plotted with plot_lawbl.
Usage
pcfa(
dat,
Q,
LD = TRUE,
cati = NULL,
cand_thd = 0.2,
PPMC = FALSE,
burn = 5000,
iter = 5000,
update = 1000,
missing = NA,
rfit = TRUE,
sign_check = FALSE,
sign_eps = -0.5,
rs = FALSE,
auto_stop = FALSE,
max_conv = 10,
rseed = 12345,
digits = 4,
alas = FALSE,
verbose = FALSE
)
Arguments
dat |
A |
Q |
A |
LD |
logical; |
cati |
The set of categorical (polytomous) items in sequence number (i.e., 1 to |
cand_thd |
Candidate parameter for sampling the thresholds with the MH algorithm. |
PPMC |
logical; |
burn |
Number of burn-in iterations before posterior sampling. |
iter |
Number of formal iterations for posterior sampling (> 0). |
update |
Number of iterations to update the sampling information. |
missing |
Value for missing data (default is |
rfit |
logical; |
sign_check |
logical; |
sign_eps |
minimum value for switch sign of loading vector (if |
rs |
logical; |
auto_stop |
logical; |
max_conv |
maximum consecutive number of convergence for auto stop. |
rseed |
An integer for the random seed. |
digits |
Number of significant digits to print when printing numeric values. |
alas |
logical; for adaptive Lasso or not. The default is |
verbose |
logical; to display the sampling information every
|
Value
pcfa returns an object of class lawbl without item intercepts. It contains detailed information about the posteriors, which can be summarized using summary.lawbl.
References
Chen, J., Guo, Z., Zhang, L., & Pan, J. (2021). A partially confirmatory approach to scale development with the Bayesian Lasso. Psychological Methods. 26(2), 210–235. DOI: 10.1037/met0000293.
Chen, J. (2021). A generalized partially confirmatory factor analysis framework with mixed Bayesian Lasso methods. Multivariate Behavioral Research. DOI: 10.1080/00273171.2021.1925520.
Examples
#####################################################
# Example 1: Estimation with continuous data & LD #
#####################################################
dat <- sim18cfa1$dat
J <- ncol(dat)
K <- 3
Q<-matrix(-1,J,K);
Q[1:6,1]<-Q[7:12,2]<-Q[13:18,3]<-1
m0 <- pcfa(dat = dat, Q = Q, LD = TRUE,burn = 2000, iter = 2000)
summary(m0) # summarize basic information
summary(m0, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
summary(m0, what = 'offpsx') #summarize significant LD terms
######################################################
# Example 2: Estimation with categorical data & LI #
######################################################
dat <- sim18ccfa40$dat
J <- ncol(dat)
K <- 3
Q<-matrix(-1,J,K);
Q[1:2,1]<-Q[7:8,2]<-Q[13:14,3]<-1
m1 <- pcfa(dat = dat, Q = Q,LD = FALSE,cati=-1,burn = 2000, iter = 2000)
summary(m1) # summarize basic information
summary(m1, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
summary(m1, what = 'offpsx') #summarize significant LD terms
summary(m1,what='thd') #thresholds for categorical items
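Both examples build Q the same way: fill the matrix with -1, then mark a few loadings per factor with 1. A minimal sketch of a helper for that pattern follows. The name `make_q` is hypothetical, and reading -1 as "unspecified" and 1 as "specified" is inferred from the examples above rather than taken from the argument description.

```r
# Build a J x K design matrix: -1 everywhere, 1 for the listed items of each factor
make_q <- function(J, spec) {
  K <- length(spec)
  Q <- matrix(-1, nrow = J, ncol = K)
  for (k in seq_len(K)) {
    stopifnot(all(spec[[k]] >= 1), all(spec[[k]] <= J))
    Q[spec[[k]], k] <- 1
  }
  Q
}

# Same pattern as Example 2: two specified loadings per factor
Q <- make_q(18, list(1:2, 7:8, 13:14))
dim(Q)
colSums(Q == 1)
```

The resulting matrix can be passed as the `Q` argument in the same way as the hand-built matrices above.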
Partially Confirmatory Item Response Model
Description
pcirm is a partially confirmatory approach to item response models (Chen, 2020), which estimates the intercept for continuous and dichotomous data. Similar to PCFA and GPCFA, there are two major model variants with different constraints for identification. One assumes local independence (LI) with a more exploratory tendency, and can also be called the E-step. The other allows local dependence (LD) with a more confirmatory tendency, and can also be called the C-step. Parameters are obtained by sampling from the posterior distributions with Markov chain Monte Carlo (MCMC) techniques. Different Bayesian Lasso methods are used to regularize the loading pattern and LD. The estimation results can be summarized with summary.lawbl, and the factorial eigenvalues can be plotted with plot_lawbl.
Usage
pcirm(
dat,
Q,
LD = TRUE,
cati = NULL,
PPMC = FALSE,
burn = 5000,
iter = 5000,
update = 1000,
missing = NA,
rseed = 12345,
sign_check = FALSE,
sign_eps = -0.5,
auto_stop = FALSE,
max_conv = 10,
digits = 4,
alas = FALSE,
verbose = FALSE
)
Arguments
dat |
A |
Q |
A |
LD |
logical; |
cati |
The set of dichotomous items in sequence number (i.e., 1 to |
PPMC |
logical; |
burn |
Number of burn-in iterations before posterior sampling. |
iter |
Number of formal iterations for posterior sampling (> 0). |
update |
Number of iterations to update the sampling information. |
missing |
Value for missing data (default is |
rseed |
An integer for the random seed. |
sign_check |
logical; |
sign_eps |
minimum value for switch sign of loading vector (if |
auto_stop |
logical; |
max_conv |
maximum consecutive number of convergence for auto stop. |
digits |
Number of significant digits to print when printing numeric values. |
alas |
logical; for adaptive Lasso or not. The default is |
verbose |
logical; to display the sampling information every
|
Value
pcirm returns an object of class lawbl with item intercepts. It contains detailed information about the posteriors, which can be summarized using summary.lawbl.
References
Chen, J. (2020). A partially confirmatory approach to the multidimensional item response theory with the Bayesian Lasso. Psychometrika. 85(3), 738-774. DOI:10.1007/s11336-020-09724-3.
Examples
####################################
# Example 1: Estimation with LD #
####################################
dat <- sim24ccfa21$dat
J <- ncol(dat)
K <- 3
Q<-matrix(-1,J,K);
Q[1:8,1]<-Q[9:16,2]<-Q[17:24,3]<-1
m0 <- pcirm(dat = dat, Q = Q, LD = TRUE, cati = -1, burn = 2000,iter = 2000)
summary(m0) # summarize basic information
summary(m0, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
summary(m0, what = 'offpsx') #summarize significant LD terms
####################################
# Example 2: Estimation with LI #
####################################
Q<-cbind(Q,-1);
Q[15:16,4]<-1
m1 <- pcirm(dat = dat, Q = Q, LD = FALSE, cati = -1, burn = 2000,iter = 2000)
summary(m1) # summarize basic information
summary(m1, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
summary(m1, what = 'offpsx') #summarize significant LD terms
Partially Exploratory Factor Analysis
Description
PEFA is a partially exploratory approach to factor analysis that can incorporate partial knowledge together with an unknown number of factors, using bi-level Bayesian regularization. When partial knowledge is not needed, it reduces to the fully exploratory factor analysis (FEFA; Chen, 2021). A large number of factors can be imposed for selection, and true factors are then identified against spurious factors. The loading vector is reparameterized to tackle model sparsity at the factor and loading levels with multivariate spike and slab priors. Parameters are obtained by sampling from the posterior distributions with Markov chain Monte Carlo (MCMC) techniques. The estimation results can be summarized with summary.lawbl, and the trace or density of the posterior can be plotted with plot_lawbl.
Usage
pefa(
dat,
Q = NULL,
K = 8,
mjf = 3,
PPMC = FALSE,
burn = 5000,
iter = 5000,
missing = NA,
eig_eps = 1,
sign_eps = 0,
rfit = TRUE,
rs = FALSE,
update = 1000,
rseed = 12345,
verbose = FALSE,
auto_stop = FALSE,
max_conv = 10,
digits = 4
)
Arguments
dat |
A |
Q |
A |
K |
Maximum number of factors for selection under |
mjf |
Minimum number of items per factor. |
PPMC |
logical; |
burn |
Number of burn-in iterations before posterior sampling. |
iter |
Number of formal iterations for posterior sampling (> 0). |
missing |
Value for missing data (default is |
eig_eps |
minimum eigenvalue for factor extraction. |
sign_eps |
minimum value for switch sign of loading vector. |
rfit |
logical; |
rs |
logical; |
update |
Number of iterations to update the sampling information. |
rseed |
An integer for the random seed. |
verbose |
logical; to display the sampling information every
|
auto_stop |
logical; |
max_conv |
maximum consecutive number of convergence for auto stop. |
digits |
Number of significant digits to print when printing numeric values. |
Value
pefa returns an object of class lawbl without item intercepts. It contains detailed information about the posteriors, which can be summarized using summary.lawbl.
References
Chen, J. (2021). A Bayesian regularized approach to exploratory factor analysis in one step. Structural Equation Modeling: A Multidisciplinary Journal, 28(4), 518-528. DOI: 10.1080/10705511.2020.1854763.
Chen, J. (In Press). Fully and partially exploratory factor analysis with bi-level Bayesian regularization. Behavior Research Methods.
Examples
#####################################################
# Example 1: Fully EFA #
#####################################################
dat <- sim18cfa0$dat
m0 <- pefa(dat = dat, K=5, burn = 2000, iter = 2000,verbose = TRUE)
summary(m0) # summarize basic information
summary(m0, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
summary(m0, what = 'phi') #summarize factorial correlations
summary(m0, what = 'eigen') #summarize factorial eigenvalue
##########################################################
# Example 2: PEFA with two factors partially specified #
##########################################################
J <- ncol(dat)
K <- 5
Q<-matrix(-1,J,K);
Q[1:2,1]<-Q[7:8,2]<-1
Q
m1 <- pefa(dat = dat, Q = Q,burn = 2000, iter = 2000,verbose = TRUE)
summary(m1)
summary(m1, what = 'qlambda')
summary(m1, what = 'phi')
summary(m1,what='eigen')
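The `eig_eps` argument sets a minimum eigenvalue for factor extraction. A minimal sketch of such a rule is given below, under the assumption that a factor's eigenvalue is the sum of its squared loadings. The source does not confirm how pefa computes it, and the loading values are made up for illustration.

```r
# Hypothetical loading matrix: 6 items, 3 candidate factors (third is spurious)
lambda <- matrix(c(0.8, 0.7, 0.7, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.1, 0.7, 0.8, 0.7,
                   0.1, 0.0, 0.0, 0.0, 0.1, 0.0), nrow = 6, ncol = 3)

# Assumption: eigenvalue of a factor = sum of its squared loadings
eig <- colSums(lambda^2)

eig_eps <- 1                   # same default as in the usage above
keep <- which(eig > eig_eps)   # factors retained as "true" under this rule
```

Under this reading, a candidate factor with only a few small loadings falls below the threshold and is treated as spurious.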
Posterior plots for lawbl object
Description
Provide posterior plots based on the factorial eigenvalues of a lawbl object. For PEFA or FEFA, only true factors will be plotted.
Usage
plot_lawbl(object, what = "trace", istart = 1, iend = -1)
Arguments
object |
A |
what |
A list of options for what to plot.
|
istart |
Starting point of the Markov chain for plotting. |
iend |
Ending point of the Markov chain for plotting; -1 for the actual final point. |
Examples
dat <- sim18cfa0$dat
J <- ncol(dat)
K <- 3
Q<-matrix(-1,J,K);
Q[1:2,1]<-Q[7:8,2]<-Q[13:14,3]<-1
m0 <- pcfa(dat = dat, Q = Q, LD = FALSE,burn = 1000, iter = 1000)
plot_lawbl(m0) # trace
plot_lawbl(m0, what='density')
plot_lawbl(m0, what='EPSR')
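The `istart` and `iend` arguments select a window of the chain, and the EPSR option suggests a convergence diagnostic. The sketch below assumes EPSR refers to the estimated potential scale reduction (the Gelman-Rubin statistic) and computes it in base R from two halves of one chain. The helpers `psr` and `window_chain` are hypothetical, not the package's implementation.

```r
# Potential scale reduction for m chains stored as columns of an n x m matrix
psr <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

# Restrict a chain to iterations istart..iend; iend = -1 means "to the end"
window_chain <- function(x, istart = 1, iend = -1) {
  if (iend == -1) iend <- length(x)
  x[istart:iend]
}

set.seed(1)
chain <- rnorm(2000)                      # stand-in for one sampled parameter
x <- window_chain(chain, istart = 1001)   # drop the first 1000 draws
halves <- matrix(x, ncol = 2)             # split into two pseudo-chains
r_ok <- psr(halves)                       # near 1 for a stationary chain
r_bad <- psr(cbind(x[1:500], x[501:1000] + 3))   # shifted half inflates the value
```

Values close to 1 are usually read as a sign of convergence; a common rule of thumb flags values above 1.1.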
Simulated CCFA data with LI and missingness
Description
Categorical CFA data simulated based on 18 items, 3 factors, and 4 categories with local independence and 10 percent missingness at random; factorial correlation \Phi = .3.
Usage
sim18ccfa40
Format
A list with components:
dat
A dataset with simulated responses of 1000 individuals to 18 items
qlam
Loading pattern and values used to simulate the data
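To make "4 categories" and "10 percent missingness" concrete, the sketch below turns continuous responses into four ordered categories and then blanks 10 percent of the cells. The thresholds are hypothetical, deletion here is completely at random, and this is not the procedure used to create the packaged data set.

```r
set.seed(42)
N <- 1000; J <- 18
y <- matrix(rnorm(N * J), N, J)   # stand-in for continuous latent responses

# Cut each item into 4 ordered categories at hypothetical thresholds
thd <- c(-1, 0, 1)
ycat <- matrix(findInterval(y, thd) + 1, N, J)   # categories coded 1..4

# Delete 10 percent of the cells completely at random
miss <- sample(N * J, size = round(0.1 * N * J))
ycat[miss] <- NA
```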
Simulated CCFA data with LD and missingness
Description
Categorical CFA data simulated based on 18 items, 3 factors, and 4 categories with local dependence and 10 percent missingness at random; factorial correlation \Phi = .3.
Usage
sim18ccfa41
Format
A list with components:
dat
A dataset with simulated responses of 1000 individuals to 18 items
qlam
Loading pattern and values used to simulate the data
LD
Local dependence between items (LD effect = .3)
Simulated CFA data with LI
Description
CFA data simulated based on 18 items, 3 factors, and local independence; factorial correlation \Phi = .3.
Usage
sim18cfa0
Format
A list with components:
dat
A dataset with simulated responses of 1000 individuals to 18 items
qlam
Loading pattern and values used to simulate the data
Simulated CFA data with LD
Description
CFA data simulated based on 18 items, 3 factors, and local dependence; factorial correlation \Phi = .3.
Usage
sim18cfa1
Format
A list with components:
dat
A dataset with simulated responses of 1000 individuals to 18 items
qlam
Loading pattern and values used to simulate the data
LD
Local dependence between items (LD effect = .3)
Simulated MCFA data with LD and Missingness
Description
CFA data mixing continuous and categorical responses, simulated based on 3 factors, six 4-category items, 12 continuous items, local dependence, and 10 percent missingness at random; factorial correlation \Phi = .3.
Usage
sim18mcfa41
Format
A list with components:
dat
A dataset with simulated responses of 1000 individuals to 18 items
qlam
Loading pattern and values used to simulate the data
LD
Local dependence between items (LD effect = .3)
Simulated CCFA data (dichotomous) with LD and a minor factor/trait
Description
Categorical CFA data simulated based on 24 items, 4 factors, 2 categories, and local dependence; factorial correlation \Phi = .3. The last factor/trait is minor (measured by cross-loadings only).
Usage
sim24ccfa21
Format
A list with components:
dat
A dataset with simulated responses of 1000 individuals to 24 items
qlam
Loading pattern and values used to simulate the data
LD
Local dependence between items (LD effect = .3)
Simulating data with Latent Variable Modeling
Description
sim_lvm can simulate data based on factor analysis or item response models with different response formats (continuous or categorical), loading patterns, and residual covariance (local dependence) structures.
Usage
sim_lvm(
N = 1000,
mla = NULL,
K = 3,
J = 18,
cpf = 0,
lam = 0.7,
lac = 0.3,
phi = 0.3,
ph12 = -1,
ecr = 0,
P = 0,
b = 0.3,
K1 = 0,
ph1 = 0.2,
b1 = 0.3,
ilvl = NULL,
cati = NULL,
noc = c(4),
misp = 0,
ome_out = FALSE,
necw = K,
necb = K,
add_ind = c(),
add_la = 0.5,
add_phi = 0,
zero_it = 0,
rseed = 333,
digits = 4
)
Arguments
N |
Sample size. |
mla |
Population loading matrix. |
K |
Number of factors (if |
J |
Number of items (if |
cpf |
Number of cross-loadings per factor (if |
lam |
Value of the main loadings. |
lac |
Value of the cross-loadings. |
phi |
Homogeneous correlations between any two factors. |
ph12 |
Correlation between factor 1 and 2 (if it's different from |
ecr |
Residual correlation (local dependence). |
P |
Number of observable predictors (for MIMIC model). |
b |
Coefficients of observable predictors (for MIMIC model). |
K1 |
Number of latent predictors (for MIMIC model). |
ph1 |
Correlations between latent predictors (for MIMIC model). |
b1 |
Coefficients of latent predictors (for MIMIC model). |
ilvl |
Specified levels of all items (i.e., need to specify Item 1 to |
cati |
The set of polytomous items in sequence number (i.e., can be any number set
in between 1 and |
noc |
Number of levels for polytomous items. |
misp |
Proportion of missingness. |
ome_out |
Output factor score or not. |
necw |
Number of within-factor local dependence. |
necb |
Number of between-factor local dependence. |
add_ind |
(Additional) minor factor with cross-loadings. |
add_la |
Value of cross-loadings on (Additional) minor factor. |
add_phi |
Correlations between (Additional) minor factor and other factors. |
zero_it |
Surplus items with zero loading. |
rseed |
An integer for the random seed. |
digits |
Number of significant digits to print when printing numeric values. |
Value
An object of class list containing the data, loading, and factorial correlation matrix.
Examples
# for continuous data with cross-loadings and local dependence effect .3
out <- sim_lvm(N=1000,K=3,J=18,lam = .7, lac=.3,ecr=.3)
summary(out$dat)
out$MLA
out$ofd_ind
# for categorical data with cross-loadings .4 and 10% missingness
out <- sim_lvm(N=1000,K=3,J=18,lam = .7, lac=.4,cati=-1,noc=4,misp=.1)
summary(out$dat)
out$MLA
out$ofd_ind
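For intuition about what is being simulated, a continuous factor model can be generated by hand in base R. The sketch below assumes a simple structure with six items per factor, main loadings of .7, and homogeneous factor correlations of .3. It is a simplified illustration and not sim_lvm's own implementation.

```r
set.seed(333)
N <- 1000; K <- 3; J <- 18
lam <- 0.7; phi <- 0.3

# Simple-structure loading matrix: 6 items per factor, main loadings = lam
Lambda <- matrix(0, J, K)
for (k in 1:K) Lambda[((k - 1) * 6 + 1):(k * 6), k] <- lam

# Factor correlation matrix with homogeneous correlation phi
Phi <- matrix(phi, K, K); diag(Phi) <- 1

# Residual variances chosen so each item has unit variance
psi <- 1 - diag(Lambda %*% Phi %*% t(Lambda))

eta <- matrix(rnorm(N * K), N, K) %*% chol(Phi)         # correlated factor scores
eps <- matrix(rnorm(N * J), N, J) %*% diag(sqrt(psi))   # independent residuals
dat <- eta %*% t(Lambda) + eps                          # N x J response matrix
```

Two items on the same factor then correlate at about .49 (the product of their loadings), which gives a quick sanity check on the simulated data.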
Summary method for lawbl objects
Description
Provide summaries of posterior information for a lawbl object.
Usage
## S3 method for class 'lawbl'
summary(
object,
what = "basic",
med = FALSE,
SL = 0.05,
detail = FALSE,
digits = 4,
istart = 1,
iend = -1,
...
)
Arguments
object |
A |
what |
A list of options for what to summarize.
|
med |
logical; if the posterior median ( |
SL |
Significance level for interval estimate. The default is .05. |
detail |
logical; if only significant ( |
digits |
Number of significant digits to print when printing numeric values. |
istart |
Starting point of the Markov chain for summary. |
iend |
Ending point of the Markov chain for summary; -1 for the actual final point. |
... |
additional arguments |
Value
A list or matrix containing the summarized information based on the option what.
Examples
dat <- sim18cfa0$dat
J <- ncol(dat)
K <- 3
Q<-matrix(-1,J,K);
Q[1:2,1]<-Q[7:8,2]<-Q[13:14,3]<-1
m0 <- pcfa(dat = dat, Q = Q, LD = FALSE,burn = 1000, iter = 1000)
summary(m0) # summarize basic information
summary(m0, what = 'lambda') #summarize significant loadings
summary(m0, what = 'qlambda') #summarize significant loadings in pattern/Q-matrix format
summary(m0, what = 'offpsx') #summarize significant LD terms
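The `med`, `SL`, `istart`, and `iend` arguments describe a standard way of condensing posterior draws. A minimal base R sketch of that idea follows, using an equal-tailed quantile interval and calling a parameter significant when the interval excludes zero. The helper `summarize_draws` is hypothetical, and the package may compute its intervals differently.

```r
# Summarize posterior draws (rows = iterations, columns = parameters)
summarize_draws <- function(draws, SL = 0.05, med = FALSE, istart = 1, iend = -1) {
  if (iend == -1) iend <- nrow(draws)   # -1 means "use the final iteration"
  d <- draws[istart:iend, , drop = FALSE]
  est <- if (med) apply(d, 2, median) else colMeans(d)
  lower <- apply(d, 2, quantile, probs = SL / 2)
  upper <- apply(d, 2, quantile, probs = 1 - SL / 2)
  data.frame(est = est, sd = apply(d, 2, sd), lower = lower, upper = upper,
             sig = lower > 0 | upper < 0)   # interval excludes zero
}

set.seed(7)
draws <- cbind(load = rnorm(2000, 0.7, 0.05),    # clearly non-zero parameter
               cross = rnorm(2000, 0.0, 0.05))   # parameter centered at zero
res <- summarize_draws(draws, SL = 0.05)
```

Here `res$sig` flags the first parameter but not the second, which mirrors the idea of reporting only significant loadings.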