% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/map_peptides_on_structure.R
\name{map_peptides_on_structure}
\alias{map_peptides_on_structure}
\title{Maps peptides onto a PDB structure or AlphaFold prediction}
\usage{
map_peptides_on_structure(
  peptide_data,
  uniprot_id,
  pdb_id,
  chain,
  auth_seq_id,
  map_value,
  baseline_map_value = NULL,
  file_format = ".cif",
  alphafold_version = "v6",
  scale_per_structure = TRUE,
  export_location = NULL,
  structure_file = NULL,
  show_progress = TRUE
)
}
\arguments{
\item{peptide_data}{a data frame that contains the input columns to this function. If structure
or prediction files should be fetched automatically, please provide column names to the following
arguments: \strong{uniprot_id}, \strong{pdb_id}, \strong{chain}, \strong{auth_seq_id},
\strong{map_value}. If no PDB structure is available for a protein, the \code{pdb_id} and \code{chain}
columns should contain NA at these positions. If a structure or prediction file is provided in the
\code{structure_file} argument, this data frame should only contain information associated with
the provided structure. In case of a user-provided structure, column names should be provided to
the following arguments: \strong{uniprot_id}, \strong{chain}, \strong{auth_seq_id}, \strong{map_value}.}

\item{uniprot_id}{a character column in the \code{peptide_data} data frame that contains UniProt
identifiers for a corresponding peptide, protein region or amino acid.}

\item{pdb_id}{a character column in the \code{peptide_data} data frame that contains PDB
identifiers for structures in which a corresponding peptide, protein region or amino acid is found.
If a protein prediction should be fetched from AlphaFold, this column should contain NA. This
column is not required if a structure or prediction file is provided in the \code{structure_file}
argument.}

\item{chain}{a character column in the \code{peptide_data} data frame that contains the name of
the chain from the PDB structure in which the peptide, protein region or amino acid is found.
If a protein prediction should be fetched from AlphaFold, this column should contain NA. If an
AlphaFold prediction is provided to the \code{structure_file} argument, the chain should be
provided as usual (all AlphaFold predictions have only chain A). \strong{Important:} please provide
the author-defined chain definitions for both ".cif" and ".pdb" files. When the output of the
\code{find_peptide_in_structure} function is used as the input for this function, this
corresponds to the \code{auth_asym_id} column.}

\item{auth_seq_id}{a character (or numeric) column in the \code{peptide_data} data frame
that contains semicolon-separated positions of peptides, protein regions or amino acids in the
corresponding PDB structure or AlphaFold prediction. Can be \code{NA} for rows that should not be mapped.
This information can be obtained from the \code{find_peptide_in_structure} function. The corresponding
column in the output is called \code{auth_seq_id}. In case of AlphaFold predictions, UniProt positions
should be used. If single positions and not stretches of amino acids are provided, the column
can be numeric and does not need to contain the semicolon separator.}

\item{map_value}{a numeric column in the \code{peptide_data} data frame that contains a value
associated with each peptide, protein region or amino acid. If one start to end position pair
has multiple different map values, the maximum will be used. This value will be displayed as a
colour gradient when mapped onto the structure. The value can for example be the fold change,
p-value or score associated with each peptide, protein region or amino acid (selection). If
the selections should be displayed with just one colour, the value in this column should be
the same for every selection. For the mapping, values are scaled between 50 and 100. Regions
in the structure that do not map to any selection receive a value of 0. If an amino acid position
is associated with multiple mapped values, e.g. from different peptides, the maximum mapped
value will be displayed.}

\item{baseline_map_value}{optional, a numeric value defining the baseline of the
\code{map_value}. If, for a given structure or protein or the whole dataset (\code{scale_per_structure = FALSE}),
all mapped values are equal to this baseline, the scaled values are set to 50, which is the lower
bound of the scaling range. If not provided, constant values are scaled to 100, which is the upper bound.}

\item{file_format}{a character vector containing the file format of the structure that will be
fetched from the database for the PDB identifiers provided in the \code{pdb_id} column. This
can be either ".cif" or ".pdb". The default is \code{".cif"}. We recommend using ".cif" files
since a ".cif" file is available for every structure, whereas a ".pdb" file is not.
Fetching and mapping onto ".cif" files takes longer than for ".pdb" files. If a structure file
is provided in the \code{structure_file} argument, the file format is detected automatically
and does not need to be provided.}

\item{alphafold_version}{a character value that specifies the AlphaFold version that should be
used. This is regularly updated by the database. We always try to make the current version the
default version. Available versions can be found here: https://ftp.ebi.ac.uk/pub/databases/alphafold/}

\item{scale_per_structure}{a logical value that specifies if scaling should be performed for
each structure independently (TRUE) or over the whole data set (FALSE). The default is TRUE,
which scales the scores of each structure independently so that each structure has a score
range from 50 to 100.}

\item{export_location}{optional, a character argument specifying the path to the location in
which the fetched and altered structure files should be saved. If left empty, they will be
saved in the current working directory. The location should be provided in the following
format "folderA/folderB".}

\item{structure_file}{optional, a character argument specifying the path to the location and
name of a structure file in ".cif" or ".pdb" format. If a structure is provided, the \code{peptide_data}
data frame should only contain mapping information for this structure.}

\item{show_progress}{a logical, if \code{show_progress = TRUE}, a progress bar will be shown
(default is TRUE).}
}
\value{
The function exports a modified ".pdb" or ".cif" structure file in which the B-factors are
replaced with the values from the \code{map_value} column, scaled to the range 50-100.
}
\description{
Peptides are mapped onto PDB structures or AlphaFold predictions based on their positions.
This is accomplished by replacing the B-factor information in the structure file with
values that allow highlighting of peptides, protein regions or amino acids when the structure
is coloured by B-factor. In addition to simply highlighting peptides, protein regions or amino
acids, a continuous variable such as fold changes associated with them can be mapped onto the
structure as a colour gradient.
}
\examples{
\donttest{
\dontshow{
.old_wd <- setwd(tempdir())
}
# Load libraries
library(dplyr)

# Create example data
peptide_data <- data.frame(
  uniprot_id = c("P0A8T7", "P0A8T7", "P60906"),
  peptide_sequence = c(
    "SGIVSFGKETKGKRRLVITPVDGSDPYEEMIPKWRQLNV",
    "NVFEGERVER",
    "AIGEVTDVVEKE"
  ),
  start = c(1160, 1197, 55),
  end = c(1198, 1206, 66),
  map_value = c(70, 100, 100)
)

# Find peptide positions in structures
positions_structure <- find_peptide_in_structure(
  peptide_data = peptide_data,
  peptide = peptide_sequence,
  start = start,
  end = end,
  uniprot_id = uniprot_id,
  retain_columns = c(map_value)) \%>\%
  filter(pdb_ids \%in\% c("6UU2", "2EL9"))

# Map peptides on structures
# You can set the preferred output location
# with the export_location argument. Here the
# files are saved in the working directory.
map_peptides_on_structure(
  peptide_data = positions_structure,
  uniprot_id = uniprot_id,
  pdb_id = pdb_ids,
  chain = auth_asym_id,
  auth_seq_id = auth_seq_id,
  map_value = map_value,
  file_format = ".pdb",
  export_location = getwd()
)
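
# Illustrative sketch only, not the package's internal code: one plausible
# way to rescale map values linearly to the 50-100 range described for the
# map_value argument. The helper name `rescale_50_100` is hypothetical, and
# linear min-max scaling is an assumption.
rescale_50_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant input: return the upper bound, as documented for the case
    # in which no baseline_map_value is provided
    return(rep(100, length(x)))
  }
  # Map the minimum to 50 and the maximum to 100
  50 + (x - rng[1]) / (rng[2] - rng[1]) * 50
}

# Apply the sketch to the map values of the example data
rescale_50_100(c(70, 100, 100))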

\dontshow{
setwd(.old_wd)
}
}
}
