% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/IntegratedMRF.R
\name{Combination}
\alias{Combination}
\title{Weights for combining predictions from different data subtypes using Least Squares Regression based on various error estimation techniques}
\usage{
Combination(finalX, finalY_train, Cell, finalY_train_cell, n_tree, m_feature,
  min_leaf, Serial, Confidence_Level)
}
\arguments{
\item{finalX}{List of matrices, where each matrix represents a specific data subtype (such as genomic characterizations for 
drug sensitivity prediction). Each subtype can have different types of features. For example, if there are three subtypes containing
 100, 200 and 250 features respectively, finalX will be a list containing 3 matrices of sizes M x 100, M x 200 and M x 250, 
 where M is the number of samples.}

\item{finalY_train}{An M x T matrix of output features for the training samples, where M is the number of samples and T is the number of output features. 
The dataset is assumed to contain no missing values. If there are missing values, an imputation method should be applied before using the function. 
A function 'Imputation' is included within the package.}

\item{Cell}{List containing the samples of each data subtype (the samples can be represented either numerically by indices or by names). 
For the example of 3 data subtypes, it will be a list containing 3 arrays, where each array holds the sample information for one data subtype.}

\item{finalY_train_cell}{Sample names of the output features for the training samples}

\item{n_tree}{Number of trees in the forest}

\item{m_feature}{Number of randomly selected features considered for a split in each regression tree node.}

\item{min_leaf}{Minimum number of samples in the leaf node}

\item{Serial}{List of all combinations of the different subtypes of a dataset (excluding the case where no subtype is selected). 
For example, if a dataset has 3 subtypes, then Serial is a list of size 2^3-1=7. The ordering of the seven sets will be 
[1 2 3], [1 2], [1 3], [2 3], [1], [2], [3].}

\item{Confidence_Level}{User-defined confidence level used to calculate the confidence interval}
}
\value{
List with the following components: 
\item{BSP_coeff}{Combination weights using the Bootstrap Error Model, where the indices are in list format. 
If the number of genomic characterizations or subtypes of the dataset is 5, then there will be 2^5-1=31 lists of weights}
\item{Resub_coeff}{Combination weights using the Re-substitution Error Model, where the indices are in list format. 
If the number of genomic characterizations or subtypes of the dataset is 5, then there will be 2^5-1=31 lists of weights}
\item{BSP632_coeff}{Combination weights using the 0.632 Bootstrap Error Model, where the indices are in list format. 
If the number of genomic characterizations or subtypes of the dataset is 5, then there will be 2^5-1=31 lists of weights}
\item{LOO_coeff}{Combination weights using the Leave-One-Out Error Model, where the indices are in list format. 
If the number of genomic characterizations or subtypes of the dataset is 5, then there will be 2^5-1=31 lists of weights}
\item{Error}{Matrix of Mean Absolute Error, Mean Square Error and correlation between actual and predicted responses, based 
on the Bootstrap, Re-substitution, 0.632 Bootstrap and Leave-one-out error estimation techniques, for the integrated model 
containing all the data subtypes}
\item{Confidence Interval}{Lower and upper limits of the confidence interval for the drug at the user-defined confidence level, calculated using the Jackknife-After-Bootstrap approach and returned in a list}
\item{BSP_error_all_mae}{Bootstrap Mean Absolute Errors (MAE) for all combinations of the dataset subtypes. Size C x R, where C is the number of 
combinations and R is the number of output responses. The combinations are listed in order of decreasing size: the first is the combination of all subtypes, 
followed by progressively smaller combinations. For example, if a dataset has 3 subtypes, then C is equal to 2^3-1=7 and the ordering of the combinations is 
[1 2 3], [1 2], [1 3], [2 3], [1], [2], [3].}
\item{Resub_error_all_mae}{Re-substitution Mean Absolute Errors (MAE) for all combinations of the dataset subtypes. Size C x R, where C is the number of 
combinations and R is the number of output responses. The combinations are listed in order of decreasing size: the first is the combination of all subtypes, 
followed by progressively smaller combinations. For example, if a dataset has 3 subtypes, then C is equal to 2^3-1=7 and the ordering of the combinations is 
[1 2 3], [1 2], [1 3], [2 3], [1], [2], [3].}
\item{BSP632_error_all_mae}{0.632 Bootstrap Mean Absolute Errors (MAE) for all combinations of the dataset subtypes. Size C x R, where C is the number of 
combinations and R is the number of output responses. The combinations are listed in order of decreasing size: the first is the combination of all subtypes, 
followed by progressively smaller combinations. For example, if a dataset has 3 subtypes, then C is equal to 2^3-1=7 and the ordering of the combinations is 
[1 2 3], [1 2], [1 3], [2 3], [1], [2], [3].}
\item{LOO_error_all_mae}{Leave-One-Out Mean Absolute Errors (MAE) for all combinations of the dataset subtypes. Size C x R, where C is the number of 
combinations and R is the number of output responses. The combinations are listed in order of decreasing size: the first is the combination of all subtypes, 
followed by progressively smaller combinations. For example, if a dataset has 3 subtypes, then C is equal to 2^3-1=7 and the ordering of the combinations is 
[1 2 3], [1 2], [1 3], [2 3], [1], [2], [3].}
The function also produces figures of the different error estimates in .tiff format.
}
\description{
Calculates combination weights for different combinations of dataset subtypes to generate an integrated Random Forest (RF) or Multivariate 
Random Forest (MRF) model, based on different error estimation options: 
Bootstrap, Re-substitution, 0.632 Bootstrap or Leave-one-out.
}
\details{
The function takes all the subtypes of the dataset in matrix format, together with the corresponding sample information.
All calculations use only the samples that are common to all the subtypes and the output training responses.
For example, suppose a dataset has 3 subtypes with different numbers of samples and features, and the sample indices of subtypes 1, 2, 3 and of the output feature matrix
are 1:10, 3:15, 5:16 and 5:11 respectively. Then the samples with indices 5:10 (common to all subtypes and the output feature matrix) are extracted from every subtype and from the output feature 
matrix and used for all calculations. 

For an M x N dataset, N bootstrap sampling sets are considered. For each bootstrap sampling set and each subtype, a Random Forest (RF) 
or Multivariate Random Forest (MRF) model is generated, which is used to calculate the prediction performance for the out-of-bag samples.  
The prediction performance for each subtype of the dataset is obtained by averaging over the different bootstrap training sets. 
The combination weights (regression coefficients) for each combination of subtypes are generated using Least Squares Regression on the 
individual subtype predictions and are used later to calculate the mean absolute error, mean square error and correlation coefficient between 
predicted and actual values.
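
As an illustrative sketch of the least squares step (the notation is assumed here and is not taken from the package source): let P be the M x K matrix whose columns hold the predictions of the K subtypes in a combination, and let y be the vector of observed responses. Assuming P has full column rank and no intercept term is used, the combination weights can be written as
\deqn{\hat{w} = \arg\min_{w} \| y - P w \|_2^2 = (P^\top P)^{-1} P^\top y}{w = argmin_w ||y - P w||^2 = (P' P)^(-1) P' y}
so that the integrated prediction for the combination is P w.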

For re-substitution error estimation with M cell lines, one model is generated for each subtype of the dataset, which is then used to 
calculate the errors and combination weights for different data subtype combinations.

For 0.632 Bootstrap error estimation, the predictions of the bootstrap and re-substitution error estimations are combined using 
0.632 x Bootstrap Error + 0.368 x Re-substitution Error. 
These prediction results are then used to compute the errors and combination weights for different data subtype combinations.
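
Written as an equation, with Err_BSP and Err_Resub used here as assumed shorthand for the bootstrap and re-substitution errors:
\deqn{Err_{0.632} = 0.632 \times Err_{BSP} + 0.368 \times Err_{Resub}}{Err_0.632 = 0.632 x Err_BSP + 0.368 x Err_Resub}
For instance, with hypothetical values Err_BSP = 0.50 and Err_Resub = 0.20 (chosen only for illustration), the estimate is 0.632 x 0.50 + 0.368 x 0.20 = 0.316 + 0.0736 = 0.3896.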

The confidence interval is calculated using the Jackknife-After-Bootstrap approach and the prediction results of the bootstrap error estimation.

For leave-one-out error estimation using M cell lines, M models are generated for each subtype of dataset, which are then used to 
calculate the errors and combination weights for different data subtype combinations.
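
The number of subtype combinations used throughout follows from counting the non-empty subsets of K subtypes (K is notation assumed here for the number of subtypes):
\deqn{\sum_{p=1}^{K} {K \choose p} = 2^K - 1}{sum_{p=1}^{K} choose(K, p) = 2^K - 1}
For K = 3 this gives 3 + 3 + 1 = 7, and for K = 5 it gives 5 + 10 + 10 + 5 + 1 = 31.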
}
\examples{
#library(IntegratedPredictionUsingRandomForest)
#n_tree=10
#m_feature=5
#min_leaf=3
#Cell=NULL
#Expression=NULL
#finalX=NULL
#library(openxlsx)
#for (i in 1:5){#5=number_of_subtypes_in_dataset
#     Genome=read.xlsx("Subtype_filename.xlsx")
#     Expression[[i]]=Genome[complete.cases(Genome),]#remove all feature vector with NaN
#     Cell[[i]]=colnames(Expression[[i]], do.NULL = TRUE, prefix = "col")[-1]
#     # Taking the cell line names for that subtype of dataset
#     finalX[[i]]=matrix(as.numeric(t(Expression[[i]])[-1,]),nrow=length(Cell[[i]]))
#     #Input Matrix(MxN), with M number of samples and N number of features
#}
#Drug_Sen_train <- read.xlsx("Output_Response_FileName.xlsx", colNames = TRUE)
#finalY_train=matrix(NA,nrow=nrow(Drug_Sen_train),ncol=3)#initialize before filling column by column
#for (j in 1:3){#3=Number_of_output_Response
#     XX=matrix(Drug_Sen_train[,Column_of_the_Response_for_Prediction],ncol=1)
#     finalY_train[,j]=matrix(Imputation(XX),ncol=1)
#}
#finalY_train_cell=Drug_Sen_train[,1]
#Serial=NULL
#library(caTools)
#for (p in length(Cell):1){
#       nk=combs(1:5,p)
#       sk=length(Serial)
#       for (q in 1:dim(nk)[1]){
#               Serial[[sk+q]]=nk[q, ]
#       }
#}
## Combination Index using Different Error Estimation Method
## Confidence_Level is user defined and must be set before this call
#Result=Combination(finalX,finalY_train,Cell,finalY_train_cell,n_tree,m_feature,min_leaf,Serial,Confidence_Level)
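## Illustrative sketch, not part of the original example: build the list of
## subtype combinations for 3 subtypes with base R's combn(), assumed here as
## a stand-in for caTools::combs(). K and Serial_demo are hypothetical names.
K <- 3
Serial_demo <- list()
for (p in K:1) {
  # combn(..., simplify = FALSE) returns one list element per combination
  Serial_demo <- c(Serial_demo, combn(seq_len(K), p, simplify = FALSE))
}
length(Serial_demo)  # number of non-empty combinations, 2^K - 1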
}
\references{
Wan, Qian, and Ranadip Pal. "An ensemble based top performing approach for 
NCI-DREAM drug sensitivity prediction challenge." PloS one 9.6 (2014): e101183.
}

