\name{phyloFit}
\alias{phyloFit}
\title{Fit a Phylogenetic Model to an Alignment}
\usage{
  phyloFit(msa, tree = NULL, subst.mod = "REV",
    init.mod = NULL, no.opt = c("backgd"),
    init.backgd.from.data = ifelse(is.null(init.mod), TRUE, FALSE),
    features = NULL, scale.only = FALSE,
    scale.subtree = NULL, nrates = NULL, alpha = 1,
    rate.constants = NULL, selection = NULL,
    init.random = FALSE, init.parsimony = FALSE,
    clock = FALSE, EM = FALSE, max.EM.its = NULL,
    precision = "HIGH", ninf.sites = 50, quiet = FALSE,
    bound = NULL, log.file = FALSE)
}
\arguments{
  \item{msa}{An alignment object.  May be altered if passed
  in as a pointer to C memory (see Note).}

  \item{tree}{A character string containing a Newick
  formatted tree defining the topology.  Required if the
  number of species is greater than 3, unless
  \code{init.mod} is specified.  The topology must be
  rooted, although the root is ignored if the substitution
  model is reversible.}

  \item{subst.mod}{The substitution model to use.  Some
  possible models include "REV", "JC69", "K80", "F81",
  "HKY85", "R2", "U2".  Run \code{subst.mods()} for a full
  list; some models are experimental.}

  \item{init.mod}{An object of class \code{tm} used to
  initialize the model.}

  \item{no.opt}{A character vector indicating which
  parameters not to optimize; these are instead held
  constant at their initial values.  By default, the
  equilibrium frequencies ("backgd") are not optimized; use
  \code{no.opt=NULL} to optimize them as well.  Other
  parameters that may be given here are "ratematrix" for
  the entire rate matrix, "kappa" for models with a
  transition/transversion ratio, "branches" to hold all
  branch lengths constant, "ratevar" for the rate variation
  parameters, "scale" for the tree scaling factor, and
  "scale_sub" for the subtree scaling factor.  This argument
  does not apply to parameters of a lineage-specific model
  created with \code{add.ls.mod}, though such parameters
  can be held constant by using the appropriate arguments
  when that model is created.  See \code{\link{add.ls.mod}}
  for more details about lineage-specific models.}

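  \item{init.backgd.from.data}{A logical value; can be
  \code{FALSE} only if \code{init.mod} is provided.  If
  \code{TRUE}, use the observed base frequencies in the data
  to initialize the equilibrium frequencies; otherwise use
  the values from \code{init.mod}.  The default is
  \code{TRUE} when \code{init.mod} is \code{NULL} and
  \code{FALSE} otherwise, so the \code{init.mod} values are
  used whenever they are available.}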

  \item{features}{An object of class \code{feat}.  If
  given, a separate model is estimated for each feature
  type.}

  \item{scale.only}{A logical value. If \code{TRUE},
  estimate only the scale of the tree.  Branches will be
  held at initial values.  Useful in conjunction with
  init.mod.}

  \item{scale.subtree}{A character string giving the name
  of a node in the tree.  This option implies
  \code{scale.only=TRUE}.  If given, estimate separate scale
  factors for the subtree beneath the named node and for
  the rest of the tree.  The branch leading to the subtree
  is included in the subtree.}

  \item{nrates}{An integer.  The number of rate categories
  to use.  Specifying a value greater than one causes the
  discrete gamma model for rate variation to be used,
  unless \code{rate.constants} is also given.  The default
  value \code{NULL} implies a single rate category.}

  \item{alpha}{A numeric value > 0, for use with "nrates".
  Initial value for alpha, the shape parameter of the gamma
  distribution.}

  \item{rate.constants}{A numeric vector.  Implies
  \code{nrates = length(rate.constants)}.  Also implies
  \code{EM=TRUE}.  Uses a non-parametric mixture model for
  rates, instead of a gamma distribution. The weight
  associated with each rate will be estimated.  alpha may
  still be used to initialize these weights.}

  \item{selection}{A numeric value.  If provided, include a
  selection parameter in the model, using this value as its
  initial value.  If \code{NULL}, selection is not used
  unless \code{init.mod} is provided and describes a model
  with selection.  The selection parameter \eqn{s} scales
  the rate matrix by \eqn{s/(1-e^{-s})}{s/(1-exp(-s))}.
  Selection is applied after the rate matrix is scaled so
  that the expected number of substitutions per unit time
  is 1.  When using codon models, selection scales only
  nonsynonymous substitutions.}

  \item{init.random}{A logical value.  If \code{TRUE},
  parameters will be initialized randomly.}

  \item{init.parsimony}{A logical value.  If \code{TRUE},
  initial branch lengths are estimated from parsimony
  counts for the alignment.  Currently this works only for
  models of order 0.}

  \item{clock}{A logical value.  If \code{TRUE}, assume a
  molecular clock in estimation.}

  \item{EM}{A logical value.  If \code{TRUE}, the model is
  fit using EM rather than the default BFGS quasi-Newton
  algorithm.  Not available for all models/options.}

  \item{max.EM.its}{An integer value; only applies if
  \code{EM==TRUE}.  The maximum number of EM iterations to
  perform.  The EM algorithm may quit earlier if other
  convergence criteria are met.}

  \item{precision}{A character string, one of "HIGH",
  "MED", or "LOW", giving the level of precision to use
  when estimating model parameters.  This affects the
  convergence criteria of the iterative algorithms: higher
  precision means more iterations and longer execution
  time.}

  \item{ninf.sites}{An integer.  Require at least this many
  "informative" sites in order to estimate a model.  An
  informative site is an alignment column with at least two
  characters that are neither gaps nor missing data.}

  \item{quiet}{A logical value.  If \code{TRUE}, do not
  report progress to screen.}

  \item{bound}{A character vector defining boundaries for
  parameters (see the Parameter boundaries section below).}

  \item{log.file}{If \code{TRUE}, write a log of the
  optimization to the screen.  If a character string, write
  the log to the named file.  Otherwise write no
  optimization log.}
}
\value{
  An object of class \code{tm} (tree model) or, if several
  models are estimated (as happens when \code{features} is
  given), a list of objects of class \code{tm}.
}
\description{
  Fits a phylogenetic model to an alignment by maximum
  likelihood.
}
\note{
  If the \code{msa} or \code{features} objects are passed
  in as pointers to C memory, they may be altered by this
  function.  Use \code{copy.msa(msa)} or
  \code{copy.feat(features)} to avoid this behavior.
}
\section{Parameter boundaries}{
  Boundaries can be set for some parameters using the
  \code{bound} argument, which should be a character vector
  in which each element defines the boundaries for a single
  parameter.  The format is best explained by example.  A
  value of \code{c("scale[0,1]", "scale_sub[1,]", "kappa[,3]")}
  keeps the tree scale between 0 and 1, the subtree scale
  between 1 and infinity, and kappa between 0 and 3.  The
  blank upper bound for "scale_sub" and the blank lower
  bound for "kappa" mean that no boundary is set there, so
  the parameter's normal default boundary is used instead
  (most parameters are bounded by 0 and infinity).  Most of
  the parameters listed in the description of
  \code{no.opt} can also have their boundaries set in this
  way.
}
\examples{
exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
files <- c("ENr334-100k.maf", "ENr334-100k.fa", "gencode.ENr334-100k.gff", "rev.mod")
unzip(exampleArchive, files)
m <- read.msa("ENr334-100k.maf")
mod <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)")
mod
phyloFit(m, init.mod=mod)
likelihood.msa(m, mod)
mod$likelihood
print(mod$likelihood, digits=10)
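# Refit the same tree with EM instead of the default BFGS optimizer,
# capping the number of EM iterations.  This is a sketch that assumes EM
# is available for the REV model used here (it is not for every model);
# mod.em is only a local name for illustration.  Its log likelihood can
# be compared with mod$likelihood above.
mod.em <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)",
                   EM=TRUE, max.EM.its=20, quiet=TRUE)
mod.em$likelihood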
f <- read.feat("gencode.ENr334-100k.gff")
mod <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)",
                features=f, quiet=TRUE)
names(mod)
mod$other
mod[["5'flank"]]
phyloFit(m, init.mod=mod$AR, nrates=3, alpha=4.0)
phyloFit(m, init.mod=mod$AR, rate.constants=c(10, 5, 1))
# background frequencies options

# this should use the background frequencies from the initial mod
phyloFit(m, init.mod=mod$AR, quiet=TRUE)$backgd
mod$AR$backgd

# this should use the background frequencies from the data
phyloFit(m, init.mod=mod$AR, init.backgd.from.data=TRUE, quiet=TRUE)$backgd
mod$AR$backgd

# this should optimize the background frequencies
phyloFit(m, init.mod=mod$AR, no.opt=NULL, quiet=TRUE)$backgd
mod$AR$backgd
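
# Hold all branch lengths at their values in mod$AR (and, as by default,
# the equilibrium frequencies), so that only the rate matrix is
# re-estimated.  A sketch; mod.fixed is only a local name for
# illustration.  The fitted tree can be compared with mod$AR$tree.
mod.fixed <- phyloFit(m, init.mod=mod$AR, no.opt=c("backgd", "branches"),
                      quiet=TRUE)
mod.fixed$tree
mod$AR$tree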

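# Estimate only an overall scale factor for the tree in mod$AR, and use
# the bound argument to keep that scale between 0.5 and 2 (see the
# Parameter boundaries section).  The bounds are arbitrary values chosen
# for illustration, and mod.scaled is only a local name.
mod.scaled <- phyloFit(m, init.mod=mod$AR, scale.only=TRUE,
                       bound="scale[0.5,2]", quiet=TRUE)
mod.scaled$tree
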
unlink(files)
}
\author{
  Melissa J. Hubisz and Adam Siepel
}
\keyword{features}
\keyword{msa}
\keyword{tm}
\keyword{trees}

