Help for package AutoPipe

Type:

Package

Title:

Automated Transcriptome Classifier Pipeline: Comprehensive Transcriptome Analysis

Version:

0.1.6

Author:

Karam Daka [cre, aut], Dieter Henrik Heiland [aut]

Maintainer:

Karam Daka <k.dacca@gmail.com>

Description:

An unsupervised, fully automated pipeline for transcriptome analysis, with a supervised option to identify characteristic genes from predefined subclasses. We rely on the 'pamr' <http://www.bioconductor.org/packages//2.7/bioc/html/pamr.html> clustering algorithm to cluster the data, and then draw a heatmap of the clusters with the most significant and the least significant genes according to the 'pamr' algorithm. This yields easy-to-grasp heatmaps that show, for each cluster, which genes define it most strongly.

License:

GPL-3

Encoding:

UTF-8

LazyData:

true

Imports:

cluster, pamr, siggenes, annotate, fgsea, org.Hs.eg.db, RColorBrewer, ConsensusClusterPlus, Rtsne, clusterProfiler, msigdbr

Depends:

R (≥ 3.5.0)

RoxygenNote:

6.1.1

NeedsCompilation:

Packaged:

2019-02-18 09:38:09 UTC; Karam

Repository:

CRAN

Date/Publication:

2019-02-27 17:00:36 UTC

Implemented t-distributed stochastic neighbor embedding

Description

This function runs t-distributed stochastic neighbor embedding (t-SNE) on the expression table for further use in AutoPipe.

Usage

AutoPipe_tSNE(me,perplexity=30,max_iter=500,groups_men)

Arguments

me

The path of the expression table

perplexity

numeric; Perplexity parameter

max_iter

integer; Number of iterations (default: 500)

groups_men

the data frame with the group clustering that the function Groups_Sup or top_supervised returns (second element of the returned list), with the data about each sample and its corresponding cluster.
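The embedding step can be sketched directly with the Rtsne package, which AutoPipe imports. This is a minimal sketch under stated assumptions, not AutoPipe's own implementation: the synthetic matrix expr, the helper safe_perplexity, and the choice to embed samples (columns) are all hypothetical. Rtsne rejects a perplexity larger than (number of points - 1) / 3, so with 48 samples a perplexity of 30 has to be reduced.

```r
# Hypothetical helper (not part of AutoPipe): largest perplexity that Rtsne
# accepts for n points, capped at the requested value.
safe_perplexity <- function(n, requested = 30) {
  min(requested, floor((n - 1) / 3))
}

set.seed(1)
# Synthetic expression matrix: 200 genes in rows, 48 samples in columns.
expr <- matrix(rnorm(200 * 48), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:48)))

perp <- safe_perplexity(ncol(expr))

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # Rtsne embeds rows, so transpose to put samples in rows.
  fit <- Rtsne::Rtsne(t(expr), dims = 2, perplexity = perp, max_iter = 500)
  # fit$Y holds one row of 2-D coordinates per sample.
  plot(fit$Y, xlab = "t-SNE 1", ylab = "t-SNE 2", pch = 19)
}
```

The points could then be colored by the cluster column of groups_men to compare the embedding with the clustering.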

Cluster the samples

Description

This function clusters the samples into x clusters.

Usage

Groups_Sup(me_TOP, me, number_of_k,TRw)

Arguments

me_TOP

the matrix with the n top genes, usually taken from the output of the function TopPAM

me

the original expression matrix. (with genes in rows and samples in columns).

number_of_k

the number of clusters

TRw

threshold for the elimination of samples with a Silhouette width lower than TRw. Default value is -1.

Examples


## load data
library(org.Hs.eg.db)
data(rna)
me_x=rna
res<-AutoPipe::TopPAM(me_x,max_clusters = 8, TOP=100)
me_TOP=res[[1]]
number_of_k=res[[3]]
File_genes=Groups_Sup(me_TOP, me=me_x, number_of_k,TRw=-1)
groups_men=File_genes[[2]]
me_x=File_genes[[1]]
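The role of TRw can be illustrated with the cluster package alone. The following is a minimal sketch, assuming PAM clustering on Euclidean distances and synthetic data; it is not the code that Groups_Sup runs, and the names expr, expr_kept, and groups are hypothetical.

```r
library(cluster)

set.seed(1)
# Synthetic data: 50 genes x 20 samples, two groups shifted apart.
expr <- matrix(rnorm(50 * 20), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("S", 1:20)))
expr[, 11:20] <- expr[, 11:20] + 3

k   <- 2   # plays the role of number_of_k
TRw <- 0   # plays the role of TRw; -1 keeps every sample

# Cluster samples (columns), so transpose to put samples in rows.
fit <- pam(t(expr), k = k)

# Silhouette widths, returned in the original sample order.
sil <- silhouette(fit$clustering, dist(t(expr)))

# Drop samples whose Silhouette width is lower than TRw.
keep      <- sil[, "sil_width"] >= TRw
expr_kept <- expr[, keep, drop = FALSE]
groups    <- data.frame(cluster   = fit$clustering[keep],
                        sil_width = sil[keep, "sil_width"],
                        row.names = colnames(expr_kept))
```

Because Silhouette widths lie between -1 and 1, the default TRw = -1 removes no samples.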

Produce a Heatmap using a Supervised clustering Algorithm

Description

This function produces a heatmap plot using a supervised clustering algorithm chosen by the user. The mean Silhouette width is plotted in the top right corner and the Silhouette width for each sample on top. On the right side of the plot, the n highest and lowest scoring genes for each cluster are added, and next to them the corresponding pathways (see Details).

Usage

Supervised_Cluster_Heatmap(groups_men, gene_matrix,
method="PAMR",TOP=1000,TOP_Cluster=150,
show_sil=FALSE,show_clin=FALSE,genes_to_print=5,
print_genes=FALSE,samples_data=NULL,colors="RdBu",
GSE=FALSE,topPaths=5,db="c2",plot_mean_sil=FALSE,stats_clust =NULL,threshold=2)

Arguments

groups_men

the data frame with the group clustering that the function Groups_Sup or top_supervised returns (second element of the returned list), with the data about each sample and its corresponding cluster.

gene_matrix

the matrix of n selected genes that the function Groups_Sup returns

method

the clustering method. The default is "PAMR", which uses the pamr library. Other methods are "SAM" and our own "EXReg" (see Details).

TOP

the number of top genes to take. The default value is 1000.

TOP_Cluster

a numeric variable for the number of genes to include in the clusters. Default is 150.

show_sil

a logical value that indicates if the function should show the Silhouette width for each sample. Default is FALSE.

show_clin

a logical value if TRUE the function will plot the clinical data provided by the user. Default value is FALSE.

genes_to_print

the number of genes to print for each cluster. The function adds, on the right side of the heatmap, the n highest expressed genes and the n lowest expressed genes for each cluster. Default value is 5.

print_genes

a logical value indicating whether to plot the TOP genes for each cluster. Default value is FALSE.

samples_data

the clinical data provided by the user to plot under the heatmap. It is plotted only if show_clin is TRUE. Default value is NULL. See Details for the format.

colors

the colors for the heatmap. The function uses RColorBrewer palettes.

GSE

a logical variable that indicates whether to plot the Gene Set Enrichment Analysis next to the heatmap. Default value is FALSE.

topPaths

a numerical value that sets how many pathways the Gene Set Enrichment plots should contain for each cluster. Default value is 5.

db

the database to be used for the GSE. Default value is "c2". The parameter can be one of the values: "c1", "c2", "c3", "c4", "c5", "c6", "c7", "h". See the Broad Institute GSEA webpage for further information on each dataset.

plot_mean_sil

a logical value. If TRUE, the function plots the mean Silhouette width or the gap statistic for each number of clusters.

stats_clust

a vector with the mean Silhouette widths or the gap statistic for each number of clusters. The first value should be for 2 clusters, the second for 3 clusters, and so on.

threshold

the threshold for the pam analysis. Default is 2.

Details

Sample data should be a data.frame with the sample names as rownames and the clinical traits as columns. Each trait must be a numeric variable.
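A table in this format can be assembled as follows. This is an illustrative sketch only: the traits sex, grade, and age and all their values are made up, and clinical_raw is a hypothetical name. The only requirements taken from the description above are sample names as rownames and numeric columns.

```r
# Sample names as they appear in the expression matrix columns.
samples <- c("BT_1008", "BT_1017", "BT_1025", "BT_1042")

# Made-up clinical traits, for illustration only.
clinical_raw <- data.frame(
  sex   = c("f", "m", "f", "m"),
  grade = c(1, 2, 1, 3),
  age   = c(54, 61, 47, 70),
  row.names = samples,
  stringsAsFactors = FALSE
)

# Encode non-numeric traits as numbers so that every column is numeric.
samples_data <- clinical_raw
samples_data$sex <- as.numeric(factor(samples_data$sex, levels = c("f", "m")))

# Check the format before passing it on.
stopifnot(all(vapply(samples_data, is.numeric, logical(1))))
```

Such a table would be passed as samples_data together with show_clin = TRUE.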

Examples


##load the org.Hs.eg Library
library(org.Hs.eg.db)
## load data
data(rna)
me_x=rna
## calculate the best number of clusters and the top genes
res<-AutoPipe::TopPAM(me_x,max_clusters = 6, TOP=100)
me_TOP=res[[1]]
number_of_k=res[[3]]
File_genes=Groups_Sup(me_TOP, me=me_x, number_of_k,TRw=-1)
groups_men=File_genes[[2]]
me_x=File_genes[[1]]
o_g<-Supervised_Cluster_Heatmap(groups_men = groups_men, gene_matrix=me_x,
    method="PAMR",show_sil=TRUE,print_genes=TRUE,threshold=0,
    TOP = 100,GSE=FALSE,plot_mean_sil=TRUE,stats_clust=res[[2]])

Compute Top genes

Description

This function computes the n=TOP genes and the best number of clusters.

Usage

TopPAM(me, max_clusters=15,TOP=1000,B=100,clusterboot=FALSE)

Arguments

me

a matrix with genes in rows and samples in columns

max_clusters

max. number of clusters to check

TOP

the number of genes to take.

B

integer, number of Monte Carlo (“bootstrap”) samples.

clusterboot

a logical value indicating whether to calculate the gap statistic and to bootstrap.

Details

We use the clusGap algorithm from the package cluster to calculate the gap statistic.

Value

A list with the following elements:

1. A matrix with the top genes.
2. A list of the mean Silhouette widths for each number of clusters.
3. The optimal number of clusters.
4. gap_st, the gap statistic of the clustering.
5. The best number of clusters according to the gap statistic.
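How a mean Silhouette width per number of clusters and a gap statistic can be obtained with the cluster package is sketched below. It assumes PAM as the clustering function and synthetic data with three groups; it is not TopPAM's actual code, and the names x, mean_sil, pam_cluster, best_k_sil, and best_k_gap are hypothetical.

```r
library(cluster)

set.seed(1)
# Synthetic data: 30 samples in rows, 10 features, three shifted groups.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 10),
           matrix(rnorm(100, mean = 4), ncol = 10),
           matrix(rnorm(100, mean = 8), ncol = 10))

max_clusters <- 6

# Mean Silhouette width for k = 2, ..., max_clusters.
mean_sil <- vapply(2:max_clusters, function(k) {
  pam(x, k = k)$silinfo$avg.width
}, numeric(1))
names(mean_sil) <- 2:max_clusters
best_k_sil <- as.integer(names(which.max(mean_sil)))

# Gap statistic with B Monte Carlo reference samples.
# clusGap expects a function that returns a list with a 'cluster' component.
pam_cluster <- function(x, k) list(cluster = pam(x, k, cluster.only = TRUE))
gap <- clusGap(x, FUNcluster = pam_cluster, K.max = max_clusters, B = 50)

# Pick the number of clusters from the gap values and their standard errors.
best_k_gap <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
                    method = "firstSEmax")
```

The vector mean_sil starts at 2 clusters, which matches the layout that stats_clust expects in Supervised_Cluster_Heatmap.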

Examples


##load the org.Hs.eg Library
library(org.Hs.eg.db)
## load data
data(rna)
me_x=rna
res<-AutoPipe::TopPAM(me_x,max_clusters = 8, TOP=100,clusterboot=FALSE)
me_TOP=res[[1]]
number_of_k=res[[3]]

Unsupervised Clustering

Description

A function for unsupervised Clustering of the data

Usage

UnSuperClassifier(data,clinical_data=NULL,thr=2,TOP_Cluster=150,TOP=100)

Arguments

data

the data for the clustering. Data should be in the following format: samples in columns and the genes in the rows (colnames and rownames accordingly). The rownames should be Entrez ID in order to plot a gene set enrichment analysis.

clinical_data

the clinical data provided by the user to plot under the heatmap. it will be plotted only if show_clin is TRUE. Default value is NULL. see details for format.

thr

the threshold for the PAMR algorithm. Default is 2.

TOP_Cluster

numeric; Number of genes in each cluster.

TOP

numeric; the number of TOP genes to take from the gene expression matrix (see the TOP argument of TopPAM).

Details

Sample data should be a data.frame with the sample names as rownames and the clinical traits as columns. Each trait must be a numeric variable.

Value

The function is an automated pipeline for clustering; it plots the cluster analysis for the gene set.

A function to run and plot a consensus clustering to validate the results

Description

This function calls the ConsensusClusterPlus function with the dataset and plots the heatmaps of the clustering for each number of clusters from 2 to max_clust.

Usage

cons_clust(data,max_clust,TOPgenes)

Arguments

data

this is the data for the ConsensusClusterPlus

max_clust

the max number of clusters that should be evaluated.

TOPgenes

the number of the top genes to choose for the clustering

Value

Plots all the heatmaps from ConsensusClusterPlus for the numbers of clusters from 2 to max_clust, and returns the same value as ConsensusClusterPlus.

Examples


data(rna)
cons_clust(rna,5,TOPgenes=50)

Input Expression File

Description

This function is used to upload a table into R for further use in the AutoPipe

Usage

read_expression_file(file, format = "csv", sep=";",gene_name="SYMBOL", Trans=FALSE)

Arguments

file

The path of the expression table

format

The format of the table "csv" or "txt"

sep

The separator of the input table

gene_name

Genes are given in "SYMBOL" or "ENTREZID"

Trans

logical; whether the matrix needs to be transposed (TRUE or FALSE)

Value

A data.frame with a gene expression matrix
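For orientation, an equivalent read can be sketched in base R. This is an assumption about the expected file layout (gene identifiers in the first column, one column per sample, ";" as separator), not the body of read_expression_file; the file content, gene rows, and the flag needs_transpose are made up for illustration.

```r
# Write a small semicolon-separated table to a temporary file (made-up values).
path <- tempfile(fileext = ".csv")
writeLines(c("gene;S1;S2;S3",
             "TP53;5.1;4.8;6.0",
             "NF2;2.3;2.9;1.7"), path)

# The first column holds the gene identifiers, so use it for the row names.
expr <- read.csv(path, sep = ";", row.names = 1, check.names = FALSE)

# If samples were stored in rows instead, transpose to genes x samples.
needs_transpose <- FALSE
if (needs_transpose) {
  expr <- as.data.frame(t(expr))
}
```

The result is a data.frame with genes in rows and samples in columns, which is the orientation the other functions in this package describe for their expression input.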

rna gene expression of 48 meningiomas

Description

A dataset containing the gene expression data of 48 meningioma tumors

Usage

rna

Format

A data frame with 200 rows and 48 variables:

BT_1008: sample BT_1008,
BT_1017: sample BT_1017,
BT_1025: sample BT_1025,
BT_1042: sample BT_1042,
BT_1050: sample BT_1050,
BT_1056: sample BT_1056,
BT_1065: sample BT_1065,
BT_1067: sample BT_1067,
BT_1072: sample BT_1072,
BT_1078: sample BT_1078,
BT_1082: sample BT_1082,
BT_1091: sample BT_1091,
BT_1094: sample BT_1094,
BT_1097: sample BT_1097,
BT_1115: sample BT_1115,
BT_605: sample BT_605,
BT_617: sample BT_617,
BT_619: sample BT_619,
BT_633: sample BT_633,
BT_634: sample BT_634,
BT_644: sample BT_644,
BT_654: sample BT_654,
BT_659: sample BT_659,
BT_690: sample BT_690,
BT_695: sample BT_695,
BT_700: sample BT_700,
BT_738: sample BT_738,
BT_751: sample BT_751,
BT_771: sample BT_771,
BT_797: sample BT_797,
BT_803: sample BT_803,
BT_808: sample BT_808,
BT_820: sample BT_820,
BT_837: sample BT_837,
BT_855: sample BT_855,
BT_862: sample BT_862,
BT_873: sample BT_873,
BT_882: sample BT_882,
BT_887: sample BT_887,
BT_900: sample BT_900,
BT_905: sample BT_905,
BT_907: sample BT_907,
BT_920: sample BT_920,
BT_944: sample BT_944,
BT_962: sample BT_962,
BT_963: sample BT_963,
BT_982: sample BT_982,
BT_990: sample BT_990,

...

A Function for Assisting Supervised Clustering

Description

When performing a supervised clustering, the user should run this function in order to get the best results.

Usage

top_supervised(me,TOP=1000,cluster_which,TRw=-1)

Arguments

me

the matrix of gene expressions. The columns should be the samples, with the sample names as colnames; the rownames should be the genes, ideally as ENTREZID.

TOP

the number of top genes to choose. Default is 1000.

cluster_which

a data frame with the supervised clustering arrangement of the samples. The data frame should have the sample names in the first column and the clustering in the second column.

TRw

the threshold for excluding samples with Silhouette width < TRw

Value

A list. The first element is the expression matrix, the second is the Silhouette width for each sample.
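The two-column layout described for cluster_which can be built as in the sketch below. The sample names and the group assignment are hypothetical; in practice the first column would come from colnames(me) and the second from the predefined subclasses.

```r
# Hypothetical sample names; in practice these would be colnames(me).
samples <- paste0("S", 1:6)

# First column: sample names. Second column: predefined cluster labels.
cluster_which <- data.frame(
  sample  = samples,
  cluster = rep(c(1, 2), each = 3),
  stringsAsFactors = FALSE
)

# Each sample should be listed exactly once.
stopifnot(!anyDuplicated(cluster_which$sample))

# Number of samples per predefined cluster.
table(cluster_which$cluster)
```

Note that the example in this section builds the same layout with cbind, which yields a character matrix rather than a data frame.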

Examples



library(org.Hs.eg.db)
data(rna)
cluster_which<-cbind(colnames(rna),c(rep(1,times=24),rep(2,times=24)))
me_x=rna
## select the top genes for the predefined clusters
res<-top_supervised(me_x,TOP = 100,cluster_which)
me_TOP=res[[1]]
number_of_k=2
groups_men=res[[2]]
me_x=me_TOP
colnames(me_x)
o_g<-Supervised_Cluster_Heatmap(groups_men = groups_men, gene_matrix=me_x,
                               method="PAMR",show_sil=TRUE,print_genes=TRUE,threshold = 0,
                               TOP = 100,GSE=FALSE,plot_mean_sil=FALSE,stats_clust=res[[2]],
                               samples_data = as.data.frame(groups_men[,1,drop=FALSE]))