Title: | Adaptive Generalized PCA |
Version: | 0.1.3 |
Description: | Implements adaptive gPCA, as described in: Fukuyama, J. (2017) <doi:10.48550/arXiv.1702.00501>. The package also includes functionality for applying the method to 'phyloseq' objects so that the method can be easily applied to microbiome data and a 'shiny' app for interactive visualization. |
Depends: | R (≥ 3.1.0) |
License: | AGPL-3 |
LazyData: | true |
VignetteBuilder: | knitr |
Suggests: | knitr, rmarkdown |
Imports: | ape (≥ 3.1.4), ggplot2 (≥ 1.0.0), shiny (≥ 1.0.0), phyloseq (≥ 1.14.0) |
RoxygenNote: | 6.0.1 |
NeedsCompilation: | no |
Packaged: | 2022-12-07 20:37:35 UTC; jfukuyam |
Author: | Julia Fukuyama [aut, cre] |
Maintainer: | Julia Fukuyama <julia.fukuyama@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-12-08 11:42:31 UTC |
adaptiveGPCA: A package for structured dimensionality reduction
Description
This package implements the methods for structured dimensionality reduction described in Fukuyama, J. (2017). The general idea is to obtain a low-dimensional representation of the data, similar to that given by PCA, which incorporates side information about the relationships between the variables. The output is similar to a PCA biplot, but the variable loadings are regularized so that similar variables are encouraged to have similar loadings on the principal axes.
Details
There are two main ways of using this package. The function
adaptivegpca
will choose how much to regularize the
variables according to the similarities between them, while the
function gpcaFullFamily
produces analogous output for
a range of regularization parameters. With this function, the
results for the different regularization parameters are inspected
with the visualizeFullFamily
function, and the
desired parameter is chosen manually.
The package also contains functionality to integrate with phyloseq:
the function processPhyloseq
takes a
phyloseq
object and creates the inputs
necessary to perform adaptive gPCA on a microbiome dataset
including information about the phylogenetic relationships between
the bacteria.
Antibiotic time course experiment.
Description
A phyloseq object describing a time course experiment in which three people two courses of cipro and had their gut microbiomes sampled. See Dethlefsen and Relman, PNAS (2010), at https://www.ncbi.nlm.nih.gov/pubmed/20847294 for more details.
Format
A phyloseq object.
A subset of the antibiotic data
Description
This is a smaller version of the AntibioticPhyloseq
dataset,
for use in the examples so that the running time isn't so long. It
has the same samples and a randomly selected set of 200 of the
taxa. It is stored as a list with three components: the normalized
OTU abundances (X
), the similarity matrix for the taxa
(Q
), and the diagonal weight matrix (D
, the identity
matrix).
Format
A list with three components.
Adaptive gPCA
Description
Performs adaptive generalized PCA, a dimensionality-reduction method which takes into account similarities between the variables. See Fukuyama, J. (2017) for more details.
Usage
adaptivegpca(X, Q, k = 2, weights = rep(1, nrow(X)))
Arguments
X |
A |
Q |
A |
k |
The number of components to return. |
weights |
A vector of length |
Value
A list containing the row/sample scores (U
), the
variable loadings (QV
), the proportion of variance explained
by each of the principal components (vars
), the value of
r
that was used (r
).
Examples
data(AntibioticSmall)
out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
Check that the axes specified are valid
Description
Check that the axes specified are valid
Usage
check_axes(axes, x)
Arguments
axes |
A set of user-specified axes. |
x |
Object of class |
Check compatibility of agpca and phyloseq objects
Description
Check that the dimensions of the agpca object match the phyloseq object and that the phyloseq object has a taxonomy table and a phylogenetic tree.
Usage
check_phyloseq(agpcafit, physeq)
Arguments
agpcafit |
An adaptivegpca object. |
physeq |
A phyloseq object. |
Estimate parameters in hierarchical model
Description
Estimates the values of r
and \sigma
in a model X \sim N(0, \sigma^2
(r Q + (1 - r)I))
.
Usage
estimateComponents(X, Q, Qeig = NULL)
Arguments
X |
An |
Q |
A |
Qeig |
If the eigendecomposition of |
Value
A list with r
and \sigma
.
Examples
data(AntibioticSmall)
estimateComponents(AntibioticSmall$X, AntibioticSmall$Q)
Estimate variance components
Description
Estimate variance components in a two-parameter model where X \sim
N(0, \sigma^2 (r_1 Q + (1 - r_1) (r_2 Q^(-1) + (1 - r_2) I)))
Usage
estimateComponents2(X, Q)
Arguments
X |
An n x p data matrix. |
Q |
A p x p psd matrix giving the similarity between the variables. |
Value
A vector with r1 and r2
Find reflection
Description
Find a reflection of one data frame so that it most closely matches another.
Usage
findReflection(df1, df2)
Arguments
df1 |
The base data frame. |
df2 |
The data frame that will be reflected across either the x-axis or the y-axis (or both or neither) so that the points in it most closely match df1. |
Value
A vector of length ncol(df1)
: Multiplying the first
column of df2 by the first element and multiplying the second
column of df2 by the second element and so on gives the optimal
reflection.
gPCA
Description
Performs standard gPCA with k
components on a data matrix X
with
row inner product Q
and weights D
.
Usage
gpca(X, Q, D = rep(1, nrow(X)), k)
Arguments
X |
A data matrix of size |
Q |
An inner product matrix for the rows, either as a |
D |
Sample weights, a vector of length |
k |
The number of components to return. |
Value
A list with variable loadings on the principal axes
(QV
), sample/row scores (U
), the fraction of the
variance explained by each of the axes (vars
).
Examples
data(AntibioticSmall)
out.gpca = gpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
gPCA using pre-computed eidendecomposition
Description
Performs gPCA with pre-computed eigenvectors and eigenvalues.
Usage
gpcaEvecs(X, evecs, evals, D = rep(1, nrow(X)), k)
Arguments
X |
Data matrix. |
evecs |
Eigenvectors of |
evals |
Eigenvalues of |
D |
Sample weights |
k |
The number of components to return. |
Make a sequence of ordinations
Description
Creates a sequence of gPCA data representations. One end of the
sequence (r = 0
) doesn't do any regularization according to
the variable structure (and so is just standard PCA), and the other
(r = 1
) does a maximal amount of regularization according to
the variable structure.
Usage
gpcaFullFamily(X, Q, weights = rep(1, nrow(X)), k = 2, rvec = (0:100)/100,
findReflections = TRUE, returnLong = FALSE, sampledata = NULL,
variabledata = NULL)
Arguments
X |
A data matrix of size |
Q |
A |
weights |
A vector of weights for the rows of |
k |
The number of components to compute for each ordination. |
rvec |
The values of |
findReflections |
Whether or not flip the axes so as to make
neighboring ordinations as close as possible. If |
returnLong |
Return a long data frame with the samples/variables instead of a list of data frames. |
sampledata |
Extra sample data to be included along with the sample scores. |
variabledata |
Extra variable data to be included along with the variable loadings. |
Value
A list containing elements for the sample points
(locationList
), the species points (speciesList
), and
the variance fractions (varsList
). Each element is itself a
list of data frames (location/species points) or of vectors (for
the variances).
Examples
data(AntibioticSmall)
out.ff = gpcaFullFamily(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
Derivative of the likelihood
Description
Derivative of the likelihood in the hierarchical model as a
function of r
.
Usage
gradLik(Xtilde, r, D)
Arguments
Xtilde |
The transformed data |
r |
r |
D |
The eigenvalues of Q |
Derivative of \sigma^2
Description
Derivative of \sigma^2(r)
in the hierarchical model
Usage
gradSigma2OfR(Xtilde, r, D)
Arguments
Xtilde |
The transformed data |
r |
r |
D |
The eigenvalues of Q |
Shiny gadget for tree/taxonomy inspection
Description
Shiny gadget that allows users to visualize the scores of the taxa on the agpca axes, their positions on the phylogenetic tree, and their taxonomic assignments.
Usage
inspectTaxonomy(agpcafit, physeq, axes = c(1, 2), br.length = FALSE,
height = 600)
Arguments
agpcafit |
An agpca object, created either by the function
|
physeq |
A phyloseq object with a tree and a taxonomy table. |
axes |
The axes to plot, must be a vector of two whole numbers. |
br.length |
Plot the tree with the branch lengths? |
height |
The height, in pixels, of the plotting region. |
Value
The function will open a browser window showing the tree and the locations of the taxa on the selected agpca axes. "Brushing" over the plot will highlight the positions of the selected taxa on the tree and list their taxonomic assignments. Clicking the "done" button will exit the app and return a data frame containing the positions of the selected taxa on the agpca axes, the taxonomic assignments of the selected taxa, and their names.
Examples
## Not run:
data(AntibioticPhyloseq)
pp = processPhyloseq(AntibioticPhyloseq)
out.agpca = adaptivegpca(pp$X, pp$Q, k = 2)
treeInspect(out.agpca, AntibioticPhyloseq)
## End(Not run)
The likelihood at a given value of r
and \sigma
Description
The likelihood at a given value of r
and \sigma
Usage
likelihood(Xtilde, sigma, r, D)
Arguments
Xtilde |
The transformed data |
sigma |
Overall scaling factor. |
r |
r |
D |
The eigenvalues of Q |
The likelihood at a given value of r
and the maximizing \sigma
for
that value of r
.
Description
The likelihood at a given value of r
and the maximizing \sigma
for
that value of r
.
Usage
likelihoodR(Xtilde, r, D)
Arguments
Xtilde |
The transformed data |
r |
r |
D |
The eigenvalues of Q |
Likelihood of data in two-parameter model
Description
Likelihood of data in two-parameter model
Usage
likelihood_two_params(r1, r2, Xtilde, D)
Arguments
r1 |
Coefficient of Q |
r2 |
Coefficient of Q^(-1) in the noise part. |
Xtilde |
The data projected onto the eigenvectors of Q. |
D |
The eigenvalues of Q |
Value
The marginal likelihood of the data given r1 and r2.
Normalizes a matrix.
Description
Normalizes a count matrix X for correspondence analysis and returns the corresponding metric. The normalization for X is as follows: First the row sums of X are computed, giving the weights for each sample. These weights are stored in a matrix D, which defines an inner product on the columns of X. Then the vectors of counts stored in the rows of X are replaced with proportions, and the resulting matrix is centered according to the inner product defined by D. Both the centered data matrix and D are returned to the user.
Usage
normalizeMatrix(X)
Arguments
X |
The matrix to be normalized. |
Value
A list with the normalized matrix (Xtilde
) and the
row weights (D
).
Plot an adaptivegpca object
Description
Plots the output from adaptivegpca
, either a scree
plot, the samples, or the variables.
Usage
## S3 method for class 'adaptivegpca'
plot(x, type = c("scree", "samples", "variables"),
axes = c(1, 2), ...)
Arguments
x |
An object of class |
type |
What type of plot to make. |
axes |
Which axes to plot. |
... |
Not used. |
Examples
data(AntibioticSmall)
out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
plot(out.agpca)
plot(out.agpca, type = "samples")
plot(out.agpca, type = "variables")
Print an adaptivegpca object
Description
Print an adaptivegpca object
Usage
## S3 method for class 'adaptivegpca'
print(x, ...)
Arguments
x |
|
... |
Not used. |
Make the input matrices for adaptive gPCA
Description
Takes a phyloseq object and creates the matrices necessary to do adaptive gPCA.
Usage
processPhyloseq(physeq, ca = FALSE)
Arguments
physeq |
A |
ca |
If TRUE, do the normalization as for correspondence analysis (transform counts to relative abundances, compute sample weights, center the relative abundances according to the sample weights). Otherwise, simply center the data. |
Value
A list of the matrix to perform adaptive gPCA on
(X
), the species similarity matrix (Q
), and the
sample weights (weights
).
Examples
data(AntibioticPhyloseq)
pp = processPhyloseq(AntibioticPhyloseq)
Value of \sigma^2
that maximizes the likelihood for a given value of
r
Description
Value of \sigma^2
that maximizes the likelihood for a given value of
r
Usage
sigma2OfR(Xtilde, r, D)
Arguments
Xtilde |
The transformed data |
r |
r |
D |
The eigenvalues of Q |
Variance along eigenvectors of Q
Description
Project the sample points stored in the rows of X
along the
eigenvectors of Q
and find the variance along each of the
projections.
Usage
varianceOnEvecs(X, Q)
Arguments
X |
An |
Q |
A |
Value
A vector containing the variance of the samples along each
of the eigenvectors of Q
.
Examples
data(AntibioticSmall)
voe = varianceOnEvecs(AntibioticSmall$X, AntibioticSmall$Q)
Shiny gadget for adaptive gPCA
Description
Shiny gadget that shows the ordinations from an entire family of gPCAs and returns a gPCA object with the one selected by the user.
Usage
visualizeFullFamily(fullFamily, sample_data = NULL,
sample_mapping = aes_string(x = "Axis1", y = "Axis2"),
sample_facet = NULL, var_data = NULL, var_mapping = aes_string(x =
"Axis1", y = "Axis2"), layout = c(2, 6))
Arguments
fullFamily |
The output from |
sample_data |
Optional data used for plotting the samples |
sample_mapping |
An aesthetic mapping to be passed to
|
sample_facet |
A |
var_data |
Optional data used for plotting the variables |
var_mapping |
An aesthetic mapping to be passed to
|
layout |
A vector of length 2. The first number gives the number of columns (out of 12) for the sidebar, the second number gives the number of columns (out of 12) for the sample plot in the main panel. |
Value
This function will open a 'shiny' app in a browser
window. You can investigate the results for different values of
r
with this app. Once you press the 'done' button, the app
will close and the function will return an R object containing the
results for the value of r
(the regularization parameter)
that was chosen in the app. The returned object is a list
containing the variable loadings on the principal axes (QV
),
the sample/row scores (U
), and the fraction of the variance
explained by each of the axes (vars
).
Examples
## Not run:
data(AntibioticPhyloseq)
pp = processPhyloseq(AntibioticPhyloseq)
out.ff = gpcaFullFamily(pp$X, Q = pp$Q, D = pp$D, k = 2)
out.agpca = visualizeFullFamily(out.ff,
sample_data = sample_data(AntibioticPhyloseq),
sample_mapping = aes(x = Axis1, y = Axis2, color = condition),
var_data = tax_table(AntibioticPhyloseq),
var_mapping = aes(x = Axis1, y = Axis2, color = Phylum))
## End(Not run)