Type: | Package |
Title: | Simultaneous Non-Gaussian Component Analysis |
Version: | 0.1.3 |
Date: | 2025-01-27 |
Author: | Liangkang Wang |
Maintainer: | Liangkang Wang <liangkang_wang@brown.edu> |
Description: | Implementation of SING algorithm to extract joint and individual non-Gaussian components from two datasets. SING uses an objective function that maximizes the skewness and kurtosis of latent components with a penalty to enhance the similarity between subject scores. Unlike other existing methods, SING does not use PCA for dimension reduction, but rather uses non-Gaussianity, which can improve feature extraction. Benjamin B.Risk, Irina Gaynanova (2021) <doi:10.1214/21-AOAS1466>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
LinkingTo: | Rcpp, RcppArmadillo |
Imports: | MASS (≥ 7.3-57), Rcpp (≥ 1.0.8.3), clue (≥ 0.3-61), gam (≥ 1.20.1), ICtest (≥ 0.3-5) |
Suggests: | knitr, covr, testthat (≥ 3.0.0), rmarkdown |
Config/testthat/edition: | 3 |
Depends: | R (≥ 2.10) |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2025-01-27 15:30:01 UTC; lwang151 |
Repository: | CRAN |
Date/Publication: | 2025-01-27 15:50:05 UTC |
Calculate the power of a square matrix
Description
returns a matrix composed of eigenvector x diag(eigenvalue ^ power) x eigenvector'
Usage
S %^% power
Arguments
S |
a square matrix |
power |
the times of power |
Value
a matrix after power calculation that eigenvector x diag(eigenvalue ^ power) x eigenvector'
Examples
a <- matrix(1:9,3,3)
a %^% 2
find the number of non-Gaussian components in the data.
Description
find the number of non-Gaussian components in the data.
Usage
NG_number(data, type = "S3")
Arguments
data |
original matrix with n x p. |
type |
'S1', 'S2' or 'S3' |
Value
the number of non-Gaussian components in the data.
Examples
library(singR)
data("exampledata")
data=exampledata
NG_number(data$dX)
Match the colums of Mx and My
Description
angleMatchICA
match the colums of Mx and My, using the n x p parameterization of the JIN decomposition assumes
Usage
angleMatchICA(Mx, My, Sx = NULL, Sy = NULL)
Arguments
Mx |
Subject score for X matrix of n x n.comp |
My |
Subject score for Y matrix of n x n.comp |
Sx |
Variable loadings for X matrix of n.comp x px |
Sy |
Variable loadings for Y matrix of n.comp x py |
Value
a list of matrixes: ## Mx: ## My: ## matchedangles: ## allangles: ## perm: ## omangles:
Average Mj for Mx and My Here subjects are by rows, columns correspond to components
Description
Average Mj for Mx and My Here subjects are by rows, columns correspond to components
Usage
aveM(mjX, mjY)
Arguments
mjX |
n x rj |
mjY |
n x rj |
Value
a new Mj
Examples
#get simulation data
data(exampledata)
data=exampledata
# To get n.comp value, we can use NG_number function.
# use JB statistic as the measure of nongaussianity to run lngca with df=0
output_JB=singR(dX=exampledata$dX,dY=exampledata$dY,
df=0,rho_extent="small",distribution="JB",individual=TRUE)
est.Mj = aveM(output_JB$est.Mjx,output_JB$est.Mjy)
Calculates the sum of the JB scores across all components, useful for determining rho.
Description
We measure non-Gaussianity using Jarque-Bera (JB) statistic, which is a weighted combination of squared skewness and kurtosis, JB paper. The data has to be standardized and mean 0 and sd to 1.
Usage
calculateJB(S = NULL, U = NULL, X = NULL, alpha = 0.8)
Arguments
S |
the variable loadings r x px. |
U |
U matrix for matched columns rj x n |
X |
whitened data matrix n x px, data = whitenerXA %*% dXcentered |
alpha |
JB weighting of skewness and kurtosis. default = 0.8 |
Value
the sum of JB score across all components.
Returns square root of the precision matrix for whitening
Description
Returns square root of the precision matrix for whitening
Usage
covwhitener(X, n.comp = ncol(X), center.row = FALSE)
Arguments
X |
Matrix |
n.comp |
the number of components |
center.row |
whether to center |
Value
square root of the precision matrix for whitening
create graph dataset with netmat and mmp_order a data.frame called with vectorization of reordered netmat by mmp_order.
Description
create graph dataset with netmat and mmp_order a data.frame called with vectorization of reordered netmat by mmp_order.
Usage
create.graph.long(gmatrix, sort_indices = NULL)
Arguments
gmatrix |
netmat |
sort_indices |
mmp_order |
Value
a data.frame with vectors: ## X1: vector of numerics. ## X2: vector of numerics. ## value: vectorization of reordered netmat by mmp_order.
Curvilinear algorithm with r0 joint components
Description
The curvilinear algorithm is modified from Wen and Yin paper.
Usage
curvilinear(
Ux,
Uy,
xData,
yData,
invLx,
invLy,
rho,
tau = 0.01,
alpha = 0.8,
maxiter = 1000,
tol = 1e-06,
rj
)
Arguments
Ux |
Matrix with n.comp x n, initial value of Ux, comes from greedyMatch. |
Uy |
Matrix with n.comp x n, initial value of Uy, comes from greedyMatch. |
xData |
matrix with n x px, Xw = Lx %*% Xc. |
yData |
matrix with n x py, Yw = Ly %*% Yc. |
invLx |
Inverse matrix of Lx, matrix n x n. |
invLy |
Inverse matrix of Ly, matrix n x n. |
rho |
the weight parameter of matching relative to non-gaussianity. |
tau |
initial step size, default value is 0.01 |
alpha |
controls weighting of skewness and kurtosis. Default value is 0.8, which corresponds to the Jarque-Bera test statistic with 0.8 weighting on squared skewness and 0.2 on squared kurtosis. |
maxiter |
default value is 1000 |
tol |
the threshold of change in Ux and Uy to stop the curvlinear function |
rj |
the joint rank, comes from greedyMatch. |
Value
a list of matrices:
Ux
Optimized Ux with matrix n.comp x n.
Uy
Optimized Uy with matrix n.comp x n.
tau
step size
iter
number of iterations.
error
PMSE(Ux,Uxnew)+PMSE(Uy,Uynew)
obj
Objective Function value
Curvilinear algorithm based on C code with r0 joint components
Description
#' The curvilinear algorithm is modified from Wen and Yin paper.
Usage
curvilinear_c(
Ux,
Uy,
xData,
yData,
invLx,
invLy,
rho,
tau = 0.01,
alpha = 0.8,
maxiter = 1000,
tol = 1e-06,
rj
)
Arguments
Ux |
Matrix with n.comp x n, initial value of Ux, comes from greedyMatch. |
Uy |
Matrix with n.comp x n, initial value of Uy, comes from greedyMatch. |
xData |
matrix with n x px, Xw = Lx %*% Xc. |
yData |
matrix with n x py, Yw = Ly %*% Yc. |
invLx |
Inverse matrix of Lx, matrix n x n. |
invLy |
Inverse matrix of Ly, matrix n x n. |
rho |
the weight parameter of matching relative to non-gaussianity. |
tau |
initial step size, default value is 0.01 |
alpha |
controls weighting of skewness and kurtosis. Default value is 0.8, which corresponds to the Jarque-Bera test statistic with 0.8 weighting on squared skewness and 0.2 on squared kurtosis. |
maxiter |
default value is 1000 |
tol |
the threshold of change in Ux and Uy to stop the curvilinear function |
rj |
the joint rank, comes from greedyMatch. |
Value
a list of matrices:
Ux
Optimized Ux with matrix n.comp x n.
Uy
Optimized Uy with matrix n.comp x n.
tau
step size
iter
number of iterations.
error
PMSE(Ux,Uxnew)+PMSE(Uy,Uynew)
obj
Objective Function value
Estimate mixing matrix from estimates of components
Description
Estimate mixing matrix from estimates of components
Usage
est.M.ols(sData, xData, intercept = TRUE)
Arguments
sData |
S rx x px |
xData |
dX n x px |
intercept |
default = TRUE |
Value
a matrix Mx, dimension n x rx.
Data for simulation example 1
Description
Data for simulation example 1
Usage
exampledata
Format
A data list with 10 subsets:
- dX
original data matrix for X with n x px, 48x3602
- dY
original data matrix for Y with n x py, 48x4950
- mj
true mj matrix, n x rj, 48x2
- sIx
true S matrix of independent non-Gaussian components in X, ri_x x px, 2x3602
- sIy
true S matrix of independent non-Gaussian components in Y, ri_y x py, 2x4950
- sjx
true S matrix of joint non-Gaussian components in X, rj x px, 2x3602
- sjy
true S matrix of joint non-Gaussian components in Y, rj x py, 2x4950
- snr
signal to noise ratio
- R2x
R2 for x data
- R2y
R2 for y data
Generate initialization from specific space
Description
Generate initialization from specific space
Usage
gen.inits(p, d, runs, orth.method = c("svd", "givens"))
Arguments
p |
p*p orthodox matrix |
d |
p*d orthodox matrix |
runs |
the number of orthodox matrix |
orth.method |
orthodox method |
Value
a list of initialization of mixing matrices.
Examples
gen.inits(2,3,3,'svd')
Greedy Match
Description
Greedy Match
matches a column of Mx and My by minimizing chordal distance between vectors,
removes the matched columns and then finds the next pair.
This equivalent to maximizing absolute correlation for data in which each column has mean equal to zero.
Returns permuted columns of Mx and My. This function does not do any scaling or sign flipping.
For this matching to coincide with angle matching, the columns must have zero mean.
Usage
greedymatch(Mx, My, Ux, Uy)
Arguments
Mx |
Subject Score for X with n x n.comp.X matrix |
My |
Subject Score for Y with n x n.comp.Y matrix |
Ux |
Matrix with n.comp x n, Mx = Lx^-1 %*% t Ux, Lx is the whitener matrix of dX. |
Uy |
Matrix with n.comp x n, My = Ly^-1 %*% t Uy, Ly is the whitener matrix of dY. |
Value
a list of matrices:
Mx
Columns of original Mx reordered from highest to lowest correlation with matched component in My
My
Columns of original My reordered from highest to lowest correlation with matched component in Mx
Ux
Permuted rows of original Ux corresponds to MapX
Uy
Permuted rows of original Uy corresponds to MapY
correlations
a vector of correlations for each pair of columns in permuted Mx and M
mapX
the sequence of the columns in original Mx.
mapY
the sequence of the columns in original MY.
Decompose the original data through LNGCA method.
Description
Implements the methods of linear non-Gaussian component analysis (LNGCA) and likelihood component analysis (when using a density, e.g., tilted Gaussian) from the LNGCA paper
Usage
lngca(
xData,
n.comp = NULL,
Ux.list = NULL,
whiten = c("sqrtprec", "eigenvec", "none"),
maxit = 1000,
eps = 1e-06,
verbose = FALSE,
restarts.pbyd = 0,
restarts.dbyd = 0,
distribution = c("JB", "tiltedgaussian", "logistic"),
density = FALSE,
out.all = FALSE,
orth.method = c("svd", "givens"),
df = 0,
stand = FALSE,
...
)
Arguments
xData |
the original dataset for decomposition, matrix of n x px. |
n.comp |
the number of components to be estimated. |
Ux.list |
list of user specified initial values for Ux. If null, will generate random orthogonal matrices. See restarts.pbyd and restarts.dbyd |
whiten |
whitening method. Defaults to "svd" which uses the n left eigenvectors divided by sqrt(px-1) by 'eigenvec'. Optionally uses the square root of the n x n "precision" matrix by 'sqrtprec'. |
maxit |
max iteration, defalut = 1000 |
eps |
default = 1e-06 |
verbose |
default = FALSE |
restarts.pbyd |
default = 0. Generates p x d random orthogonal matrices. Use a large number for large datasets. Note: it is recommended that you run lngca twice with different seeds and compare the results, which should be similar when a sufficient number of restarts is used. In practice, stability with large datasets and a large number of components can be challenging. |
restarts.dbyd |
default = 0. These are d x d initial matrices padded with zeros, which results in initializations from the principal subspace. Can speed up convergence but may miss low variance non-Gaussian components. |
distribution |
distribution methods with default to tilted Gaussian. "logistic" is similar to infomax ICA, JB is capable of capture super and sub Gaussian distribution while being faster than tilted Gaussian. (tilted Gaussian tends to be most accurate, but computationally much slower.) |
density |
return the estimated tilted Gaussian density? default = FALSE |
out.all |
default = FALSE |
orth.method |
default = 'svd'. Method to generate random initial matrices. See [gen.inits()] |
df |
default = 0, df of the spline used in fitting the non-parametric density. use df=8 or so for tilted gaussian. set df=0 for JB and logistic. |
stand |
whether to standardize the data to have row and column means equal to 0 and the row standard deviation equal to 1 (i.e., all variables on same scale). Often used when combined with singR for data integration. |
... |
other arguments to tiltedgaussian estimation |
Value
Function outputs a list including the following:
U
matrix rx x n, part of the expression that Ax = Ux x Lx and Ax x Xc = Sx, where Lx is the whitener matrix.
loglik
the value of log-likelihood in the lngca method.
S
the variable loading matrix r x px, each row is a component, which can be used to measure nongaussianity
df
egree of freedom.
distribution
the method used for data decomposition.
whitener
A symmetric whitening matrix n x n from dX, the same with whitenerXA = est.sigmaXA %^% -0.5
M
Mx Mtrix with n x rx.
nongaussianity
the nongaussianity score for each component saved in S matrix.
Examples
#get simulation data
data(exampledata)
data=exampledata
# To get n.comp value, we can use NG_number function.
# use JB statistic as the measure of nongaussianity to run lngca with df=0
estX_JB = lngca(xData = data$dX, n.comp = 4,
whiten = 'sqrtprec', restarts.pbyd = 20, distribution='JB',df=0)
# use the tiltedgaussian distribution to run lngca with df=8. This takes a long time:
estX_tilt = lngca(xData = data$dX, n.comp = 4,
whiten = 'sqrtprec', restarts.pbyd = 20, distribution='tiltedgaussian',df=8)
# true non-gaussian component of Sx, include individual and joint components
trueSx = rbind(data$sjX,data$siX)
# use pmse to compare the difference of the two methods
pmse(S1 = t(trueSx),S2=t(estX_JB$S),standardize = TRUE)
pmse(S1 = t(trueSx),S2=t(estX_tilt$S),standardize = TRUE)
# the lngca using tiltedgaussian tends to be more accurate
# with smaller pmse value, but takes longer to run.
match ICA
Description
match ICA
Usage
matchICA(S, template, M = NULL)
Arguments
S |
loading variable matrix |
template |
template for match |
M |
subject score matrix |
Value
the match result
Orthogonalization of matrix
Description
Orthogonalization of matrix
Usage
orthogonalize(W)
Arguments
W |
arbitrary matrix |
Value
orthogonalized matrix
Permutation test with Greedymatch
Description
Permutation test with Greedymatch
Usage
permTestJointRank(
MatchedMx,
MatchedMy,
nperm = 1000,
alpha = 0.01,
multicore = 0
)
Arguments
MatchedMx |
matrix with nsubject x n.comp.X, comes from greedymatch |
MatchedMy |
matrix with nsubject2 x n.comp.Y, comes from greedymatch |
nperm |
default value = 1000 |
alpha |
default value = 0.01 |
multicore |
default value = 0 |
Value
a list of matrixes ## rj: joint component rank ## pvalues: pvalue for the components(columns) not matched ## fwer_alpha: quantile of corr permutation with 1- alpha
Permutation test to get joint components ranks
Description
Permutation test to get joint components ranks
Usage
permmatRank_joint(matchedResults, nperms = 100)
Arguments
matchedResults |
results generated by angleMatchICA |
nperms |
the number of permutation |
Value
a list of matrixes ## pvalues: pvalues for the matched colunmns don't have correlation. ## corrperm: correlation value for original Mx with each random permutation of My. ## corrmatched: the correlation for each pair of matched columns.
Permutation invariant mean squared error
Description
Permutation invariant mean squared error
Usage
pmse(M1 = NULL, M2 = NULL, S1 = NULL, S2 = NULL, standardize = FALSE)
Arguments
M1 |
Subject score 1 matrix r x n. |
M2 |
Subject score 2 matrix r x n. |
S1 |
Loading 1 with matrix p x r. |
S2 |
Loading 2 with matrix p x r. |
standardize |
whether to standardize |
Value
permutation invariant mean squared error
Examples
#get simulation data
data(exampledata)
# use JB stat to compute with singR
output_JB=singR(dX=exampledata$dX,dY=exampledata$dY,
df=0,rho_extent="small",distribution="JB",individual=TRUE)
# use pmse to measure difference from the truth
pmse(M1 = t(output_JB$est.Mj),M2 = t(exampledata$mj),standardize = TRUE)
Sign change for S matrix to image
Description
Sign change for S matrix to image
Usage
signchange(S, M = NULL)
Arguments
S |
S, r x px. |
M |
Mx, n x r. |
Value
a list of positive S and positive Mx.
SImultaneous Non-Gaussian Component analysis for data integration.
Description
This function combines all steps from the SING paper
Usage
singR(
dX,
dY,
n.comp.X = NULL,
n.comp.Y = NULL,
df = 0,
rho_extent = c("small", "medium", "large"),
Cplus = TRUE,
tol = 1e-10,
stand = FALSE,
distribution = "JB",
maxiter = 1500,
individual = FALSE,
whiten = c("sqrtprec", "eigenvec", "none"),
restarts.dbyd = 0,
restarts.pbyd = 20
)
Arguments
dX |
original dataset for decomposition, matrix of n x px. |
dY |
original dataset for decomposition, matrix of n x py. |
n.comp.X |
the number of non-Gaussian components in dataset X. If null, will estimate the number using ICtest::FOBIasymp. |
n.comp.Y |
the number of non-Gaussian components in dataset Y. If null, will estimate the number using ICtest::FOBIasymp. |
df |
default value=0 when use JB, if df>0, estimates a density for the loadings using a tilted Gaussian (non-parametric density estimate). |
rho_extent |
Controls similarity of the scores in the two datasets. Numerical value and three options in character are acceptable. small, medium or large is defined from the JB statistic. Try "small" and see if the loadings are equal, then try others if needed. If numeric input, it will multiply the input by JBall to get the rho. |
Cplus |
whether to use C code (faster) in curvilinear search. |
tol |
difference tolerance in curvilinear search. |
stand |
whether to use standardization, if true, it will make the column and row means to 0 and columns sd to 1. If false, it will only make the row means to 0. |
distribution |
"JB" or "tiltedgaussian"; "JB" is much faster. In SING, this refers to the "density" formed from the vector of loadings. "tiltedgaussian" with large df can potentially model more complicated patterns. |
maxiter |
the max iteration number for the curvilinear search. |
individual |
whether to return the individual non-Gaussian components, default value = F. |
whiten |
whitening method used in lngca. Defaults to "svd" which uses the n left eigenvectors divided by sqrt(px-1) by 'eigenvec'. Optionally uses the square root of the n x n "precision" matrix by 'sqrtprec'. |
restarts.dbyd |
default = 0. These are d x d initial matrices padded with zeros, which results in initializations from the principal subspace. Can speed up convergence but may miss low variance non-Gaussian components. |
restarts.pbyd |
default = 20. Generates p x d random orthogonal matrices. Use a large number for large datasets. Note: it is recommended that you run lngca twice with different seeds and compare the results, which should be similar when a sufficient number of restarts is used. In practice, stability with large datasets and a large number of components can be challenging. |
Value
Function outputs a list including the following:
Sjx
variable loadings for joint NG components in dataset X with matrix rj x px.
Sjy
variable loadings for joint NG components in dataset Y with matrix rj x py.
Six
variable loadings for individual NG components in dataset X with matrix riX x px.
Siy
variable loadings for individual NG components in dataset Y with matrix riX x py.
Mix
scores of individual NG components in X with matrix n x riX.
Miy
scores of individual NG components in Y with matrix n x riY.
est.Mjx
Estimated subject scores for joint components in dataset X with matrix n x rj.
est.Mjy
Estimated subject scores for joint components in dataset Y with matrix n x rj.
est.Mj
Average of est.Mjx and est.Mjy as the subject scores for joint components in both datasets with matrix n x rj.
C_plus
whether to use C version of curvilinear search.
rho_extent
the weight of rho in search
df
degree of freedom, = 0 when use JB, >0 when use tiltedgaussian.
Examples
#get simulation data
data(exampledata)
# use JB stat to compute with singR
output_JB=singR(dX=exampledata$dX,dY=exampledata$dY,
df=0,rho_extent="small",distribution="JB",individual=TRUE)
# use tiltedgaussian distribution to compute with singR.
# tiltedgaussian may be more accurate but is considerably slower,
# and is not recommended for large datasets.
output_tilted=singR(dX=exampledata$dX,dY=exampledata$dY,
df=5,rho_extent="small",distribution="tiltedgaussian",individual=TRUE)
# use pmse to measure difference from the truth
pmse(M1 = t(output_JB$est.Mj),M2 = t(exampledata$mj),standardize = TRUE)
pmse(M1 = t(output_tilted$est.Mj),M2 = t(exampledata$mj),standardize = TRUE)
Standardization with double centered and column scaling
Description
Standardization with double centered and column scaling
Usage
standard(data, dif.tol = 0.001, max.iter = 10)
Arguments
data |
input matrix with n x px. |
dif.tol |
the value for the threshold of scaling |
max.iter |
default value = 10 |
Value
standardized matrix with n x px.
Examples
spmwm = 3*matrix(rnorm(100000),nrow=100)+1
dim(spmwm)
apply(spmwm,1,mean) # we want these to be 0
apply(spmwm,2,mean) # we want these to be 0
apply(spmwm,2,sd) # we want each of these variances to be 1
spmwm_cp=standard(spmwm)
max(abs(apply(spmwm_cp,1,mean)))
max(abs(apply(spmwm_cp,2,mean)))
max(abs(apply(spmwm_cp,2,sd)-1))
Convert angle vector into orthodox matrix
Description
Convert angle vector into orthodox matrix
Usage
theta2W(theta)
Arguments
theta |
vector of angles theta |
Value
an orthodox matrix
tiltedgaussian
Description
tiltedgaussian
Usage
tiltedgaussian(xData, df = 8, B = 100, ...)
Arguments
xData |
input data |
df |
degree freedom |
B |
default value=100 |
... |
ellipsis |
Create network matrices from vectorized lower diagonals
vec2net
transfer the matrix vectorized lower diagonals into net to show the component image.
Description
Create network matrices from vectorized lower diagonals
vec2net
transfer the matrix vectorized lower diagonals into net to show the component image.
Usage
vec2net(invector, make.diag = 1)
Arguments
invector |
vectorized lower diagonals. |
make.diag |
default value = 1. |
Value
a net matrx
Whitening Function
Description
Whitening Function
Usage
whitener(X, n.comp = ncol(X), center.row = FALSE)
Arguments
X |
dataset p x n. |
n.comp |
the number of components |
center.row |
whether center the row of data |
Value
a whitener matrix