\encoding{UTF-8}
\name{pdat_}
\alias{pdat_}
\alias{pdat_CRC}
\alias{pdat_pancreatic}
\alias{pdat_hypoxia}
\alias{pdat_osmotic}
\title{Get Protein Data}
\description{
  Get data on protein expression and chemical composition.
}

\usage{
  pdat_CRC(dataset = NULL, basis = "QEC")
}

\arguments{
  \item{dataset}{character, specifies which dataset to retrieve}
  \item{basis}{character, keyword for basis species to use}
}

\details{
The \code{pdat_} functions calculate chemical compositional metrics (using \code{\link{protcomp}}) for relatively up- and down-expressed proteins reported in proteomic experiments.

Use \code{pdat_CRC} to retrieve data for protein expression in colorectal cancer, \code{pdat_pancreatic} for data on pancreatic cancer, \code{pdat_hypoxia} for data on hypoxia or 3D culture, and \code{pdat_osmotic} for data on hyperosmotic stress.
The functions get relative expression data from the CSV files stored in \code{extdata/expression/}, with subdirectories corresponding to the names of the functions.
Some of the functions also retrieve amino acid compositions of non-human proteins from the files in \code{extdata/aa/}.

If \code{dataset} is \code{NULL}, the return value gives the names of all datasets that can be retrieved using the function.
Provide one of these names as the \code{dataset} argument to retrieve the data.
Each dataset name identifies the study (publication) where the data were reported; it is constructed by combining the first letters of the family names of the first three or four authors with the 2-digit year of publication.
This coincides with the key-generation scheme used by some bibliography managers.
The same abbreviation is used to name the CSV file containing the data.
If more than one dataset is available from a single study (for example, for relative protein expression in different stages of cancer), \code{dataset} is suffixed by an underscore followed by a short abbreviation indicating the particular dataset.

Tables listing mean compositional differences between up- and down-expressed proteins for each dataset are saved in \code{extdata/summary/}.
These files were created using the second example below.
}

\value{
A list with the following components:
\item{dataset}{name of the dataset}
\item{basis}{basis species used for the calculations}
\item{description}{descriptive text}
\item{pcomp}{compositional data generated by \code{\link{protcomp}}}
\item{up2}{logical vector with length equal to the number of proteins; \code{TRUE} if the protein is up-expressed in group 2 compared to group 1 (e.g. cancer compared to normal), \code{FALSE} otherwise}
\item{names}{gene names for the proteins, if available}
}

\seealso{
  \code{\link{get_pdat}}
}

\examples{
library(CHNOSZ)
pdat_CRC()
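
# A minimal sketch of the dataset naming scheme described in Details.
# make_key() is a hypothetical helper, not part of this package, and the
# author names below are placeholders rather than a real publication.
# Assumption: the key uses the first letters of up to four family names.
make_key <- function(authors, year, suffix = NULL) {
  # keep at most the first four authors
  authors <- authors[seq_len(min(4, length(authors)))]
  # first letter of each family name
  initials <- paste(substr(authors, 1, 1), collapse = "")
  # last two digits of a four-digit year of publication
  yy <- substr(as.character(year), 3, 4)
  key <- paste0(initials, yy)
  # optional suffix for studies that provide more than one dataset
  if(!is.null(suffix)) key <- paste(key, suffix, sep = "_")
  key
}
make_key(c("Smith", "Jones", "Brown", "Taylor", "Lee"), 2015)
make_key(c("Smith", "Jones", "Brown"), 2008, suffix = "stage2")
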
pdat_CRC("JKMF10")  # same result as get_pdat("JKMF10")
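
# A short sketch of inspecting the returned list, assuming only the
# components described in the Value section (description, up2, names)
pdat <- pdat_CRC("JKMF10")
# descriptive text for this dataset
pdat$description
# counts of down-expressed (FALSE) and up-expressed (TRUE) proteins
table(pdat$up2)
# gene names of the up-expressed proteins, if names are available
head(pdat$names[pdat$up2])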

\dontrun{
# how the extdata/summary/summary_*.csv files were made
for(what in c("CRC", "pancreatic", "hypoxia", "osmotic")) {
  pdat_fun <- paste0("pdat_", what)
  datasets <- get(pdat_fun)()
  comptab <- lapply_canprot(datasets, function(dataset) {
    pdat <- get_pdat(dataset, pdat_fun)
    get_comptab(pdat)
  }, varlist = "pdat_fun")
  # write summary table
  comptab <- do.call(rbind, comptab)
  comptab <- cbind(set = c(letters, LETTERS)[1:nrow(comptab)], comptab)
  comptab[, 6:15] <- signif(comptab[, 6:15], 4)
  filename <- paste0("summary_", what, ".csv")
  write.csv(comptab, filename, row.names = FALSE, quote = 3)
}
}
}

\concept{Protein expression}
