% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ID_functions.R
\name{FindDelMH}
\alias{FindDelMH}
\title{Return the length of microhomology at a deletion}
\usage{
FindDelMH(context, deleted.seq, pos, trace = 0, warn.cryptic = TRUE)
}
\arguments{
\item{context}{The deleted sequence plus ample surrounding
sequence on each side (at least as long as \code{deleted.seq}).}

\item{deleted.seq}{The deleted sequence in \code{context}.}

\item{pos}{The 1-based start position of \code{deleted.seq} in \code{context}.}

\item{trace}{If greater than 0, generate messages
showing how the computation is carried out.}

\item{warn.cryptic}{If \code{TRUE}, generate a warning
if there is a cryptic repeat (see the example).}
}
\value{
The length of the maximum microhomology of \code{deleted.seq}
  in \code{context}, or -1 if the deletion is a cryptic repeat (see Details).
}
\description{
Return the length of microhomology at a deletion
}
\details{
This function is primarily for internal use, but we export it
to document the underlying logic.

Example:

\code{GGCTAGTT} aligned to \code{GGCTAGAACTAGTT} with
a deletion represented as:
\preformatted{

GGCTAGAACTAGTT
GG------CTAGTT GGCTAGTT GG[CTAGAA]CTAGTT
                           ----   ----
}

Presumed repair mechanism leading to this:

\preformatted{
  ....
GGCTAGAACTAGTT
CCGATCTTGATCAA

=>

  ....
GGCTAG      TT
CC      GATCAA
        ....

=>

GGCTAGTT
CCGATCAA

}

Variant-caller software can represent the
same deletion in several
different, but completely equivalent, ways.

\preformatted{

GGC------TAGTT GGCTAGTT GGC[TAGAAC]TAGTT
                          * ---  * ---

GGCT------AGTT GGCTAGTT GGCT[AGAACT]AGTT
                          ** --  ** --

GGCTA------GTT GGCTAGTT GGCTA[GAACTA]GTT
                          *** -  *** -

GGCTAG------TT GGCTAGTT GGCTAG[AACTAG]TT
                          ****   ****
}
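
As a rough check of this equivalence, the sketch below deletes 6 bases at
each of the start positions 3 to 7 of \code{GGCTAGAACTAGTT} (the first
representation in this section plus the four shown above) and confirms
that every choice leaves the same sequence. \code{ApplyDeletion} is a
hypothetical helper written for this illustration; it is not part of the
package.

\preformatted{
# Hypothetical helper (not an ICAMS function): remove `len` bases
# starting at 1-based position `pos` from `context`.
ApplyDeletion <- function(context, pos, len) {
  paste0(substr(context, 1, pos - 1),
         substr(context, pos + len, nchar(context)))
}

context <- "GGCTAGAACTAGTT"

# One result per representation: start positions 3, 4, 5, 6 and 7.
results <- vapply(3:7,
                  function(pos) ApplyDeletion(context, pos, 6),
                  character(1))

unique(results)  # "GGCTAGTT"
}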

This function finds:

\enumerate{

\item The maximum match of undeleted sequence to the left
of the deletion that is
identical to the right end of the deleted sequence, and

\item The maximum match of undeleted sequence to the right
of the deletion that
is identical to the left end of the deleted sequence.
}

The microhomology sequence is the concatenation of items
(1) and (2).
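
The two matches can be sketched in a few lines of R. This is a minimal
illustration of items (1) and (2) only, not the package's implementation:
\code{SketchDelMH} is a hypothetical name, and the sketch does not handle
cryptic repeats, tracing, or warnings.

\preformatted{
# Hypothetical sketch (not the ICAMS implementation).
SketchDelMH <- function(context, deleted.seq, pos) {
  n <- nchar(deleted.seq)
  stopifnot(substr(context, pos, pos + n - 1) == deleted.seq)

  left.flank  <- substr(context, 1, pos - 1)
  right.flank <- substr(context, pos + n, nchar(context))
  len.left    <- nchar(left.flank)

  # Item (1): grow the match leftward while the end of the left flank
  # equals the right end of the deleted sequence.
  left.len <- 0
  while (left.len < min(n, len.left) &&
         substr(left.flank, len.left - left.len, len.left) ==
           substr(deleted.seq, n - left.len, n)) {
    left.len <- left.len + 1
  }

  # Item (2): grow the match rightward while the start of the right
  # flank equals the left end of the deleted sequence.
  right.len <- 0
  while (right.len < min(n, nchar(right.flank)) &&
         substr(right.flank, 1, right.len + 1) ==
           substr(deleted.seq, 1, right.len + 1)) {
    right.len <- right.len + 1
  }

  list(left = left.len, right = right.len, total = left.len + right.len)
}

# GGC[TAGAAC]TAGTT: 1 base matches on the left, 3 on the right.
SketchDelMH("GGCTAGAACTAGTT", "TAGAAC", 4)$total            # 4

# Same call arguments as the first example in the Examples section.
SketchDelMH("GGAGAGGCTAGAACTAGTTAAAAA", "CTAGAA", 8)$total  # 4
}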

\strong{Warning}\cr
A deletion in a \emph{repeat} can also be represented
in several different ways. A deletion in a repeat
is abstractly equivalent to a deletion with microhomology that
spans the entire deleted sequence. For example:

\preformatted{
GACTAGCTAGTT
GACTA----GTT GACTAGTT GACTA[GCTA]GTT
                        *** -*** -
}

is really a deletion in a repeat, as these equivalent representations show:

\preformatted{
GACTAG----TT GACTAGTT GACTAG[CTAG]TT
                        **** ----

GACT----AGTT GACTAGTT GACT[AGCT]AGTT
                        ** --** --
}

\strong{This function only flags these
"cryptic repeats" by returning -1; it does not determine
the extent of the repeat.}
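
One way to picture the flag is sketched below. It assumes that a deletion
counts as a cryptic repeat when the left and right matches together cover
at least the whole deleted sequence; that criterion is inferred from the
examples in this section and may differ from the exact test the function
uses. \code{CommonPrefixLen}, \code{Rev}, and \code{IsCrypticRepeatSketch}
are hypothetical helpers, not package functions.

\preformatted{
# Hypothetical helper: number of leading positions where x and y agree.
CommonPrefixLen <- function(x, y) {
  k <- min(nchar(x), nchar(y))
  if (k == 0) return(0)
  a <- strsplit(substr(x, 1, k), "")[[1]]
  b <- strsplit(substr(y, 1, k), "")[[1]]
  mismatch <- which(a != b)
  if (length(mismatch) == 0) k else mismatch[1] - 1
}

# Hypothetical helper: reverse a string.
Rev <- function(x) paste(rev(strsplit(x, "")[[1]]), collapse = "")

# Assumed criterion: left match + right match >= deletion length.
IsCrypticRepeatSketch <- function(context, deleted.seq, pos) {
  n <- nchar(deleted.seq)
  left.flank  <- substr(context, 1, pos - 1)
  right.flank <- substr(context, pos + n, nchar(context))
  # Matching suffixes is the same as matching prefixes of the reversals.
  left.len  <- CommonPrefixLen(Rev(left.flank), Rev(deleted.seq))
  right.len <- CommonPrefixLen(right.flank, deleted.seq)
  left.len + right.len >= n
}

IsCrypticRepeatSketch("GACTAGCTAGTT", "GCTA", 6)      # TRUE
IsCrypticRepeatSketch("GGCTAGAACTAGTT", "CTAGAA", 3)  # FALSE
}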
}
\section{ID classification}{

See \url{https://github.com/steverozen/ICAMS/blob/v3.0.9-branch/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx}
for additional information on ID (small insertions and deletions) mutation
classification.

See the documentation for \code{\link{Canonicalize1Del}}, which first handles
deletions in homopolymers, then handles deletions in simple repeats with
longer repeat units (e.g. \code{CACACACA}; see
\code{\link{FindMaxRepeatDel}}), and, if the deletion is not in a simple
repeat, looks for microhomology (see \code{\link{FindDelMH}}).

See the code for the unexported function \code{\link{CanonicalizeID}}
and the functions it calls for the handling of insertions.
}

\examples{
# GGAGAGG[CTAGAA]CTAGTTAAAAA
#         ----   ----
FindDelMH("GGAGAGGCTAGAACTAGTTAAAAA", "CTAGAA", 8, trace = 0)  # 4

# A cryptic repeat
# 
# TAAATTATTTATTAATTTATTG
# TAAATTA----TTAATTTATTG = TAAATTATTAATTTATTG
# 
# equivalent to
#
# TAAATTATTTATTAATTTATTG
# TAAAT----TATTAATTTATTG = TAAATTATTAATTTATTG 
# 
# and
#
# TAAATTATTTATTAATTTATTG
# TAAA----TTATTAATTTATTG = TAAATTATTAATTTATTG  

FindDelMH("TAAATTATTTATTAATTTATTG", "TTTA", 8, warn.cryptic = FALSE) # -1
}
