Help for package PRANA

Type:

Package

Title:

Pseudo-Value Regression Approach for Network Analysis (PRANA)

Version:

1.0.6

Description:

A novel pseudo-value regression approach for the differential co-expression network analysis in expression data, which can incorporate additional clinical variables in the model. This is a direct regression modeling for the differential network analysis, and it is therefore computationally amenable for the most users. The full methodological details can be found in Ahn S et al (2023) <doi:10.1186/s12859-022-05123-w>.

License:

GPL-3

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.2.1

Depends:

R (≥ 3.5.0)

Imports:

dplyr, parallel, robustbase, stats, minet

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

Language:

en-US

VignetteBuilder:

knitr

Config/testthat/edition:

NeedsCompilation:

Packaged:

2024-09-22 19:26:01 UTC; seungjunahn

Author:

Seungjun Ahn

[cre, aut, trl], Somnath Datta [ctb, ths]

Maintainer:

Seungjun Ahn <seungjun.ahn@mountsinai.org>

Repository:

CRAN

Date/Publication:

2024-09-22 19:50:07 UTC

EBS

Description

A function to calculate the adjusted p-value for each gene (Datta and Datta, 2005)

Usage

EBS(pvo, alpha = 0.05, B = 500, h = 1)

Arguments

pvo

An object with p-values estimated from the user-provided expression data

alpha

a level of significance (default is 0.05)

B

size of bootstrapping (default is 500)

h

a bandwidth (default is 1)

Value

A vector of adjusted p-values for each gene.

References

Datta, S and Datta, S (2005). Empirical Bayes screening of many p-values with applications to microarray studies. Bioinformatics, 21(9), 1987-1994

PRANA

Description

A pseudo-value regression approach for differential network analysis that adjusts for additional covariates (PRANA).

Usage

PRANA(RNASeqdat, clindat, groupA, groupB)

Arguments

RNASeqdat

An RNA-Seq data with subjects in rows and genes in columns.

clindat

A data with clinical variables to be included in the regression (e.g., binary group variable indicating current smoking status, continuous age, ...)

groupA

Indices of the subjects in the first category (e.g., non-current smoker) of binary group variable.

groupB

Indices of the subjects in the second category (e.g., current smoker) of binary group variable.

Value

A list containing three data frame objects that summarize the results of PRANA. This includes beta coefficients, p-values, and adjusted p-values via the empirical Bayes approach for each predictor variables that are included in the regression model.

References

Ahn S, Grimes T, Datta S. A pseudo-value regression approach for differential network analysis of co-expression data. BMC Bioinformatics, 2023;24(1):8

Examples

data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
groupA = index_grpA, groupB = index_grpB)

adjpval

Description

A function to retrieve a table with adjusted p-values after running PRANA. The table includes all variables that were included in the pseudo-value regression model.

Usage

adjpval(PRANAres)

Arguments

PRANAres

An object called after running PRANA.

Value

A table that includes adjusted p-values for all variables included in the fitted model.

Examples

data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

# Use PRANA() function to perform the pseudo-value regression analysis.
PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
 groupA = index_grpA, groupB = index_grpB)

# Now, we want to keep the table with adjusted p-values only.
adjpval(PRANAres)

adjpval_specific_var

Description

A function to retrieve a vector of adjusted p-values after running PRANA.

Usage

adjpval_specific_var(adjptab, varname)

Arguments

adjptab

A table that includes adjusted p-values for a specific variable.

varname

Specify the name of the variable of interest.

Value

A vector of adjusted p-values for a single variable from the model.

Examples

data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

# Use PRANA() function to perform the pseudo-value regression analysis.
PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
groupA = index_grpA, groupB = index_grpB)

# Create an object to keep the table with adjusted p-values using adjpval() function.
adjpvaltab <- adjpval(PRANAres)

# Retrieve a vector of adjusted p-values for a single variable of interest.
adjpval_specific_var(adjptab = adjpvaltab, varname = "currentsmoking")

coeff

Description

A function to retrieve a table with coefficient estimates after running PRANA. The table includes all variables that were included in the pseudo-value regression model.

Usage

coeff(PRANAres)

Arguments

PRANAres

An object called after running PRANA.

Value

A table that includes coefficient estimates for all variables included in the fitted model.

Examples

#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

# Use PRANA() function to perform the pseudo-value regression analysis.
# Then, create an object called `res` to call results later.
PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
 groupA = index_grpA, groupB = index_grpB)

# Now, we want to keep the table with coefficient estimates only.
coeff(PRANAres)

coeff_specific_var

Description

A function to retrieve a vector of coefficient estimates for a specific variable of interest after running PRANA.

Usage

coeff_specific_var(coefftab, varname)

Arguments

coefftab

A table that includes adjusted p-values for a specific variable.

varname

Specify the name of the variable of interest.

Value

A vector of coefficient estimates for a single variable from the model.

Examples

#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

# Use PRANA() function to perform the pseudo-value regression analysis.
# Then, create an object called `res` to call results later.
PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
 groupA = index_grpA, groupB = index_grpB)

# Create an object to keep the table with coefficient estimates using coeff() function.
coefftab <- coeff(PRANAres)

# Lastly, we use coeff_specific_var function to retrieve
# adjusted p-values for a single variable of interest.
coeff_specific_var(coefftab = coefftab, varname = "currentsmoking")

Chronic Obstructive Pulmonary Disease (COPD) data

Description

A subset of the COPDGene study data, obtained from the Gene Expression Omnibus database. This contains expression and clinical covariates data. All of the 28 genes contained in this data are identified as COPD-related by Sakornsakolpat et al. Nat Genet, 2019.

Usage

data(combinedCOPDdat_RGO)

Format

combinedCOPDdat_RGO

References

Sakornsakolpat P, Prokopenko D, Lamontagne M, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet. 2019;51(3):494–505. (PubMed) Ahn S, Grimes T, Datta S. A pseudo-value regression approach for differential network analysis of co-expression data. BMC Bioinformatics, 2023;24(1):8. (PubMed)

Examples

data(combinedCOPDdat_RGO)
combinedCOPDdat_RGO

run_aracne

Description

A function to conduct co-expression analysis using ARACNE (Margolin et al. 2006). Uses the implementation from the minet package (Meyer et al. 2008). This function is from dnapath R package, which is archived in May 2024.

Usage

run_aracne(
  x,
  estimator = "spearman",
  disc = "none",
  nbins = NULL,
  eps = 0,
  ...
)

Arguments

x

A n by p matrix of gene expression data (n samples and p genes)

estimator

Argument is passed into minet::build.mim.

disc

Argument is passed into minet::build.mim.

nbins

Argument is passed into minet::build.mim.

eps

Argument is passed into minet::aracne.

...

Additional arguments are ignored.

Value

A p by p matrix of association scores.

References

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006). “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.” In BMC Bioinformatics, volume 7(1), S7. BioMed Central.

Meyer PE, Lafitte F, Bontempi G (2008). “minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks using Mutual Information.” BMC Bioinformatics, 9(1), 461.

sigDCGnames

Description

A function to retrieve the name of genes that are significantly differentially connected (DC). between two biological/clinical states (aka the main binary indicator) with the presence of additional covariate information.

Usage

sigDCGnames(adjptab, groupvar, alpha)

Arguments

adjptab

A table with adjusted p-values for all variables that were included in the pseudo-value regression model.

groupvar

Specify the name of binary indicator variable.

alpha

A level of significance (e.g. 0.05).

Value

Names of significantly DC genes (e.g. gene IDs) from PRANA. If you need both adjusted p-values and names, please use sigDCGtab() instead.

Examples

#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

# Use PRANA() function to perform the pseudo-value regression analysis.
# Then, create an object called PRANA_Results to call results.
PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
groupA = index_grpA, groupB = index_grpB)

# Next, we want to call the table with adjusted p-values only.
adjptab <- adjpval(PRANAres)

# Please specify the name of binary group indicator in sigDCGnames(groupvar = ).
sigDCGnames <- sigDCGnames(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05)
sigDCGnames

sigDCGtab

Description

A function to retrieve the data frame that are significantly differentially connected (DC). between two biological/clinical states (aka the main binary indicator) with the presence of additional covariate information.

Usage

sigDCGtab(adjptab, groupvar, alpha)

Arguments

adjptab

A table with adjusted p-values and names for the variable that the user specifies in the groupvar.

groupvar

Specify the name of binary indicator variable.

alpha

A level of significance (e.g. 0.05).

Value

Adjusted p-values and names of significantly DC genes (e.g. gene IDs) from PRANA.

Examples

#' data(combinedCOPDdat_RGO) # A complete data containing expression and clinical data.

# A gene expression data part of the downloaded data.
rnaseqdat = combinedCOPDdat_RGO[ , 8:ncol(combinedCOPDdat_RGO)]
rnaseqdat = as.data.frame(apply(rnaseqdat, 2, as.numeric))

# A clinical data with additional covariates sorted by current smoking groups:
# The first column is ID, so do not include.
phenodat = combinedCOPDdat_RGO[order(combinedCOPDdat_RGO$currentsmoking), 2:7]

# Indices of non-current smoker (namely Group A)
index_grpA = which(combinedCOPDdat_RGO$currentsmoking == 0)
# Indices of current smoker (namely Group B)
index_grpB = which(combinedCOPDdat_RGO$currentsmoking == 1)

# Use PRANA() function to perform the pseudo-value regression analysis.
# Then, create an object called PRANA_Results to call results.
PRANAres <- PRANA(RNASeqdat = rnaseqdat, clindat = phenodat,
 groupA = index_grpA, groupB = index_grpB)

# Next, we want to call the table with adjusted p-values.
adjptab <- adjpval(PRANAres)

# Please specify the name of variable in sigDCGtab(groupvar = ).
sigDCGtab <- sigDCGtab(adjptab = adjptab, groupvar = "currentsmoking", alpha = 0.05)
sigDCGtab

thetahats

Description

A function to compute the total connectivity of each gene from the association matrix.

Usage

thetahats(asso.matinput)

Arguments

asso.matinput

An association matrix that is estimated from the user-provided expression data is used as an input to compute the total connectivity of each gene.

Value

A vector containing total connectivity of each gene (i.e. continuous version of centrality measure of a network)