Title: Adaptive Generalized PCA
Version: 0.1.3
Description: Implements adaptive gPCA, as described in: Fukuyama, J. (2017) <doi:10.48550/arXiv.1702.00501>. The package also includes functionality for applying the method to 'phyloseq' objects so that the method can be easily applied to microbiome data and a 'shiny' app for interactive visualization.
Depends: R (≥ 3.1.0)
License: AGPL-3
LazyData: true
VignetteBuilder: knitr
Suggests: knitr, rmarkdown
Imports: ape (≥ 3.1.4), ggplot2 (≥ 1.0.0), shiny (≥ 1.0.0), phyloseq (≥ 1.14.0)
RoxygenNote: 6.0.1
NeedsCompilation: no
Packaged: 2022-12-07 20:37:35 UTC; jfukuyam
Author: Julia Fukuyama [aut, cre]
Maintainer: Julia Fukuyama <julia.fukuyama@gmail.com>
Repository: CRAN
Date/Publication: 2022-12-08 11:42:31 UTC

adaptiveGPCA: A package for structured dimensionality reduction

Description

This package implements the methods for structured dimensionality reduction described in Fukuyama, J. (2017). The general idea is to obtain a low-dimensional representation of the data, similar to that given by PCA, which incorporates side information about the relationships between the variables. The output is similar to a PCA biplot, but the variable loadings are regularized so that similar variables are encouraged to have similar loadings on the principal axes.

Details

There are two main ways of using this package. The function adaptivegpca will choose how much to regularize the variables according to the similarities between them, while the function gpcaFullFamily produces analogous output for a range of regularization parameters. With this function, the results for the different regularization parameters are inspected with the visualizeFullFamily function, and the desired parameter is chosen manually.

The package also contains functionality to integrate with phyloseq: the function processPhyloseq takes a phyloseq object and creates the inputs necessary to perform adaptive gPCA on a microbiome dataset including information about the phylogenetic relationships between the bacteria.


Antibiotic time course experiment.

Description

A phyloseq object describing a time course experiment in which three people two courses of cipro and had their gut microbiomes sampled. See Dethlefsen and Relman, PNAS (2010), at https://www.ncbi.nlm.nih.gov/pubmed/20847294 for more details.

Format

A phyloseq object.


A subset of the antibiotic data

Description

This is a smaller version of the AntibioticPhyloseq dataset, for use in the examples so that the running time isn't so long. It has the same samples and a randomly selected set of 200 of the taxa. It is stored as a list with three components: the normalized OTU abundances (X), the similarity matrix for the taxa (Q), and the diagonal weight matrix (D, the identity matrix).

Format

A list with three components.


Adaptive gPCA

Description

Performs adaptive generalized PCA, a dimensionality-reduction method which takes into account similarities between the variables. See Fukuyama, J. (2017) for more details.

Usage

adaptivegpca(X, Q, k = 2, weights = rep(1, nrow(X)))

Arguments

X

A n \times p data matrix.

Q

A p \times p similarity matrix on the variables defining an inner product on the rows of X, can also be given as an eigendecomposition (formatted as the output from eigen).

k

The number of components to return.

weights

A vector of length n containing weights for the rows of X.

Value

A list containing the row/sample scores (U), the variable loadings (QV), the proportion of variance explained by each of the principal components (vars), the value of r that was used (r).

Examples

data(AntibioticSmall)
out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)

Check that the axes specified are valid

Description

Check that the axes specified are valid

Usage

check_axes(axes, x)

Arguments

axes

A set of user-specified axes.

x

Object of class adaptivegpca.


Check compatibility of agpca and phyloseq objects

Description

Check that the dimensions of the agpca object match the phyloseq object and that the phyloseq object has a taxonomy table and a phylogenetic tree.

Usage

check_phyloseq(agpcafit, physeq)

Arguments

agpcafit

An adaptivegpca object.

physeq

A phyloseq object.


Estimate parameters in hierarchical model

Description

Estimates the values of r and \sigma in a model X \sim N(0, \sigma^2 (r Q + (1 - r)I)).

Usage

estimateComponents(X, Q, Qeig = NULL)

Arguments

X

An n \times p data matrix.

Q

A p \times p matrix giving the prior variance on the rows of X.

Qeig

If the eigendecomposition of Q is already computed, it can be included here.

Value

A list with r and \sigma.

Examples

data(AntibioticSmall)
estimateComponents(AntibioticSmall$X, AntibioticSmall$Q)

Estimate variance components

Description

Estimate variance components in a two-parameter model where X \sim N(0, \sigma^2 (r_1 Q + (1 - r_1) (r_2 Q^(-1) + (1 - r_2) I)))

Usage

estimateComponents2(X, Q)

Arguments

X

An n x p data matrix.

Q

A p x p psd matrix giving the similarity between the variables.

Value

A vector with r1 and r2


Find reflection

Description

Find a reflection of one data frame so that it most closely matches another.

Usage

findReflection(df1, df2)

Arguments

df1

The base data frame.

df2

The data frame that will be reflected across either the x-axis or the y-axis (or both or neither) so that the points in it most closely match df1.

Value

A vector of length ncol(df1): Multiplying the first column of df2 by the first element and multiplying the second column of df2 by the second element and so on gives the optimal reflection.


gPCA

Description

Performs standard gPCA with k components on a data matrix X with row inner product Q and weights D.

Usage

gpca(X, Q, D = rep(1, nrow(X)), k)

Arguments

X

A data matrix of size n \times p.

Q

An inner product matrix for the rows, either as a p \times p matrix or an eigendecomposition of such a matrix.

D

Sample weights, a vector of length n.

k

The number of components to return.

Value

A list with variable loadings on the principal axes (QV), sample/row scores (U), the fraction of the variance explained by each of the axes (vars).

Examples

data(AntibioticSmall)
out.gpca = gpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)

gPCA using pre-computed eidendecomposition

Description

Performs gPCA with pre-computed eigenvectors and eigenvalues.

Usage

gpcaEvecs(X, evecs, evals, D = rep(1, nrow(X)), k)

Arguments

X

Data matrix.

evecs

Eigenvectors of Q, the inner product/similarity matrix.

evals

Eigenvalues of Q.

D

Sample weights

k

The number of components to return.


Make a sequence of ordinations

Description

Creates a sequence of gPCA data representations. One end of the sequence (r = 0) doesn't do any regularization according to the variable structure (and so is just standard PCA), and the other (r = 1) does a maximal amount of regularization according to the variable structure.

Usage

gpcaFullFamily(X, Q, weights = rep(1, nrow(X)), k = 2, rvec = (0:100)/100,
  findReflections = TRUE, returnLong = FALSE, sampledata = NULL,
  variabledata = NULL)

Arguments

X

A data matrix of size n \times p.

Q

A p \times p similarity matrix defining an inner product on the rows of X.

weights

A vector of weights for the rows of X.

k

The number of components to compute for each ordination.

rvec

The values of r for which to make the ordinations.

findReflections

Whether or not flip the axes so as to make neighboring ordinations as close as possible. If k is very large this should be false since all possible axis combinations are searched over.

returnLong

Return a long data frame with the samples/variables instead of a list of data frames.

sampledata

Extra sample data to be included along with the sample scores.

variabledata

Extra variable data to be included along with the variable loadings.

Value

A list containing elements for the sample points (locationList), the species points (speciesList), and the variance fractions (varsList). Each element is itself a list of data frames (location/species points) or of vectors (for the variances).

Examples

data(AntibioticSmall)
out.ff = gpcaFullFamily(AntibioticSmall$X, AntibioticSmall$Q, k = 2)

Derivative of the likelihood

Description

Derivative of the likelihood in the hierarchical model as a function of r.

Usage

gradLik(Xtilde, r, D)

Arguments

Xtilde

The transformed data

r

r

D

The eigenvalues of Q


Derivative of \sigma^2

Description

Derivative of \sigma^2(r) in the hierarchical model

Usage

gradSigma2OfR(Xtilde, r, D)

Arguments

Xtilde

The transformed data

r

r

D

The eigenvalues of Q


Shiny gadget for tree/taxonomy inspection

Description

Shiny gadget that allows users to visualize the scores of the taxa on the agpca axes, their positions on the phylogenetic tree, and their taxonomic assignments.

Usage

inspectTaxonomy(agpcafit, physeq, axes = c(1, 2), br.length = FALSE,
  height = 600)

Arguments

agpcafit

An agpca object, created either by the function adaptivegpca or by visualizeFullFamily.

physeq

A phyloseq object with a tree and a taxonomy table.

axes

The axes to plot, must be a vector of two whole numbers.

br.length

Plot the tree with the branch lengths?

height

The height, in pixels, of the plotting region.

Value

The function will open a browser window showing the tree and the locations of the taxa on the selected agpca axes. "Brushing" over the plot will highlight the positions of the selected taxa on the tree and list their taxonomic assignments. Clicking the "done" button will exit the app and return a data frame containing the positions of the selected taxa on the agpca axes, the taxonomic assignments of the selected taxa, and their names.

Examples

## Not run: 
data(AntibioticPhyloseq)
pp = processPhyloseq(AntibioticPhyloseq)
out.agpca = adaptivegpca(pp$X, pp$Q, k = 2)
treeInspect(out.agpca, AntibioticPhyloseq)

## End(Not run)

The likelihood at a given value of r and \sigma

Description

The likelihood at a given value of r and \sigma

Usage

likelihood(Xtilde, sigma, r, D)

Arguments

Xtilde

The transformed data

sigma

Overall scaling factor.

r

r

D

The eigenvalues of Q


The likelihood at a given value of r and the maximizing \sigma for that value of r.

Description

The likelihood at a given value of r and the maximizing \sigma for that value of r.

Usage

likelihoodR(Xtilde, r, D)

Arguments

Xtilde

The transformed data

r

r

D

The eigenvalues of Q


Likelihood of data in two-parameter model

Description

Likelihood of data in two-parameter model

Usage

likelihood_two_params(r1, r2, Xtilde, D)

Arguments

r1

Coefficient of Q

r2

Coefficient of Q^(-1) in the noise part.

Xtilde

The data projected onto the eigenvectors of Q.

D

The eigenvalues of Q

Value

The marginal likelihood of the data given r1 and r2.


Normalizes a matrix.

Description

Normalizes a count matrix X for correspondence analysis and returns the corresponding metric. The normalization for X is as follows: First the row sums of X are computed, giving the weights for each sample. These weights are stored in a matrix D, which defines an inner product on the columns of X. Then the vectors of counts stored in the rows of X are replaced with proportions, and the resulting matrix is centered according to the inner product defined by D. Both the centered data matrix and D are returned to the user.

Usage

normalizeMatrix(X)

Arguments

X

The matrix to be normalized.

Value

A list with the normalized matrix (Xtilde) and the row weights (D).


Plot an adaptivegpca object

Description

Plots the output from adaptivegpca, either a scree plot, the samples, or the variables.

Usage

## S3 method for class 'adaptivegpca'
plot(x, type = c("scree", "samples", "variables"),
  axes = c(1, 2), ...)

Arguments

x

An object of class adaptivegpca

type

What type of plot to make. scree will make a scree plot showing the eigenvalues, samples will plot the samples, and variables will plot the variables.

axes

Which axes to plot.

...

Not used.

Examples

data(AntibioticSmall)
out.agpca = adaptivegpca(AntibioticSmall$X, AntibioticSmall$Q, k = 2)
plot(out.agpca)
plot(out.agpca, type = "samples")
plot(out.agpca, type = "variables")

Print an adaptivegpca object

Description

Print an adaptivegpca object

Usage

## S3 method for class 'adaptivegpca'
print(x, ...)

Arguments

x

adaptivegpca object.

...

Not used.


Make the input matrices for adaptive gPCA

Description

Takes a phyloseq object and creates the matrices necessary to do adaptive gPCA.

Usage

processPhyloseq(physeq, ca = FALSE)

Arguments

physeq

A phyloseq object, from the phyloseq package.

ca

If TRUE, do the normalization as for correspondence analysis (transform counts to relative abundances, compute sample weights, center the relative abundances according to the sample weights). Otherwise, simply center the data.

Value

A list of the matrix to perform adaptive gPCA on (X), the species similarity matrix (Q), and the sample weights (weights).

Examples

data(AntibioticPhyloseq)
pp = processPhyloseq(AntibioticPhyloseq)

Value of \sigma^2 that maximizes the likelihood for a given value of r

Description

Value of \sigma^2 that maximizes the likelihood for a given value of r

Usage

sigma2OfR(Xtilde, r, D)

Arguments

Xtilde

The transformed data

r

r

D

The eigenvalues of Q


Variance along eigenvectors of Q

Description

Project the sample points stored in the rows of X along the eigenvectors of Q and find the variance along each of the projections.

Usage

varianceOnEvecs(X, Q)

Arguments

X

An n \times p data matrix, each row corresponding to a sample.

Q

A p \times p similarity matrix, either as a matrix or as its eigendecomposition (the output from eigen).

Value

A vector containing the variance of the samples along each of the eigenvectors of Q.

Examples

data(AntibioticSmall)
voe = varianceOnEvecs(AntibioticSmall$X, AntibioticSmall$Q)

Shiny gadget for adaptive gPCA

Description

Shiny gadget that shows the ordinations from an entire family of gPCAs and returns a gPCA object with the one selected by the user.

Usage

visualizeFullFamily(fullFamily, sample_data = NULL,
  sample_mapping = aes_string(x = "Axis1", y = "Axis2"),
  sample_facet = NULL, var_data = NULL, var_mapping = aes_string(x =
  "Axis1", y = "Axis2"), layout = c(2, 6))

Arguments

fullFamily

The output from gpcaFullFamily

sample_data

Optional data used for plotting the samples

sample_mapping

An aesthetic mapping to be passed to ggplot for plotting the samples

sample_facet

A ggplot faceting command used for faceting the samples.

var_data

Optional data used for plotting the variables

var_mapping

An aesthetic mapping to be passed to ggplot for plotting the variables

layout

A vector of length 2. The first number gives the number of columns (out of 12) for the sidebar, the second number gives the number of columns (out of 12) for the sample plot in the main panel.

Value

This function will open a 'shiny' app in a browser window. You can investigate the results for different values of r with this app. Once you press the 'done' button, the app will close and the function will return an R object containing the results for the value of r (the regularization parameter) that was chosen in the app. The returned object is a list containing the variable loadings on the principal axes (QV), the sample/row scores (U), and the fraction of the variance explained by each of the axes (vars).

Examples

## Not run: 
data(AntibioticPhyloseq)
pp = processPhyloseq(AntibioticPhyloseq)
out.ff = gpcaFullFamily(pp$X, Q = pp$Q, D = pp$D, k = 2)
out.agpca = visualizeFullFamily(out.ff,
    sample_data = sample_data(AntibioticPhyloseq),
    sample_mapping = aes(x = Axis1, y = Axis2, color = condition),
    var_data = tax_table(AntibioticPhyloseq),
    var_mapping = aes(x = Axis1, y = Axis2, color = Phylum))

## End(Not run)