\name{TheilSen}
\alias{MannKendall}
\alias{TheilSen}
\title{Tests for trends using Theil-Sen estimates}
\usage{
TheilSen(mydata, pollutant = "nox", deseason = FALSE, type = "default",
avg.time = "month", statistic = "mean", percentile = NA, data.thresh = 0,
alpha = 0.05, dec.place = 2, xlab = "year", lab.frac = 0.99, lab.cex = 0.8,
x.relation = "same", y.relation = "same", data.col = "cornflowerblue",
line.col = "red", text.col = "darkgreen", cols = NULL, auto.text = TRUE,
autocor = FALSE, slope.percent = FALSE, date.breaks = 7,...)

      MannKendall(mydata,...)
}
\arguments{
  \item{mydata}{A data frame containing the field
  \code{date} and at least one other parameter for which a
  trend test is required; typically (but not necessarily) a
  pollutant.}

  \item{pollutant}{The parameter for which a trend test is
  required.  Mandatory.}

  \item{deseason}{Should the data be deseasonalised first?
  If \code{TRUE} the function \code{stl} is used (seasonal
  trend decomposition using loess). Note that if
  \code{TRUE} missing data are first linearly interpolated
  because \code{stl} cannot handle missing data.}

  \item{type}{\code{type} determines how the data are split
  i.e. conditioned, and then plotted. The default will
  produce a single plot using the entire data. Type can be
  one of the built-in types as detailed in \code{cutData}
  e.g. \dQuote{season}, \dQuote{year}, \dQuote{weekday} and
  so on. For example, \code{type = "season"} will produce
  four plots --- one for each season.

  It is also possible to choose \code{type} as another
  variable in the data frame. If that variable is numeric,
  then the data will be split into four quantiles (if
  possible) and labelled accordingly. If type is an
  existing character or factor variable, then those
  categories/levels will be used directly. This offers
  great flexibility for understanding the variation of
  different variables and how they depend on one another.

  Type can be up to length two e.g. \code{type = c("season",
  "weekday")} will produce a grid of plots split by season and
  day of the week. Note, when two types are provided the
  first forms the columns and the second the rows.}

  \item{avg.time}{Can be \dQuote{month} (the default),
  \dQuote{season} or \dQuote{year}. Determines the time
  over which data should be averaged. Note that for
  \dQuote{year}, six or more years are required. For
  \dQuote{season} the data are split up into spring: March,
  April, May etc. Note that December is considered as
  belonging to winter of the following year.}

  \item{statistic}{Statistic used for calculating monthly
  values. Default is \dQuote{mean}, but can also be
  \dQuote{percentile}. See \code{timeAverage} for more
  details.}

  \item{percentile}{Single percentile value to use if
  \code{statistic = "percentile"} is chosen.}

  \item{data.thresh}{The data capture threshold to use (\%)
  when aggregating the data using \code{avg.time}. A value
  of zero means that all available data will be used in a
  particular period regardless of the number of values
  available. Conversely, a value of 100 will mean that all
  data will need to be present for the average to be
  calculated, else it is recorded as \code{NA}.}

  \item{alpha}{Significance level used for the confidence
  interval of the slope. The default is 0.05. To show 99\%
  confidence intervals for the value of the trend, choose
  \code{alpha = 0.01} etc.}

  \item{dec.place}{The number of decimal places to display
  the trend estimate at. The default is 2.}

  \item{xlab}{x-axis label, by default \code{"year"}.}

  \item{lab.frac}{Fraction along the y-axis that the trend
  information should be printed at, default 0.99.}

  \item{lab.cex}{Size of text for trend information.}

  \item{x.relation}{This determines how the x-axis scale is
  plotted. \dQuote{same} ensures all panels use the same
  scale and \dQuote{free} will use panel-specific scales.
  The latter is a useful setting when plotting data with
  very different values.}

  \item{y.relation}{This determines how the y-axis scale is
  plotted. \dQuote{same} ensures all panels use the same
  scale and \dQuote{free} will use panel-specific scales.
  The latter is a useful setting when plotting data with
  very different values.}

  \item{data.col}{Colour name for the data}

  \item{line.col}{Colour name for the slope and uncertainty
  estimates}

  \item{text.col}{Colour name for the slope/uncertainty
  numeric estimates}

  \item{cols}{Predefined colour scheme, currently only
  enabled for \code{"greyscale"}.}

  \item{auto.text}{Either \code{TRUE} (default) or
  \code{FALSE}. If \code{TRUE} titles and axis labels will
  automatically try and format pollutant names and units
  properly e.g.  by subscripting the \sQuote{2} in NO2.}

  \item{autocor}{Should autocorrelation be considered in
  the trend uncertainty estimates? The default is
  \code{FALSE}. Generally, accounting for autocorrelation
  increases the uncertainty of the trend estimate ---
  sometimes by a large amount.}

  \item{slope.percent}{Should the slope and the slope
  uncertainties be expressed as a percentage change per
  year? The default is \code{FALSE} and the slope is
  expressed as an average units/year change e.g. ppb.
  Percentage changes can often be confusing and should be
  clearly defined.  Here the percentage change is expressed
  as 100 * (C.end/C.start - 1) / (end.year - start.year),
  where C.start is the concentration at the start date and
  C.end is the concentration at the end date.
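
  Written as a display equation, with \eqn{p} standing for the
  percentage change per year (\eqn{p} is a label introduced here
  for illustration, not an openair name):

  \deqn{p = 100 \times \frac{C_{end}/C_{start} - 1}{y_{end} - y_{start}}}{p = 100 * (C.end/C.start - 1) / (end.year - start.year)}

  For instance, with C.start = 100, C.end = 75 and
  (end.year - start.year) = 5:

  \deqn{p = 100 \times \frac{75/100 - 1}{5} = -5}{p = 100 * (75/100 - 1) / 5 = -5}

  i.e. a reduction of 5 percent per year.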

  For \code{avg.time = "year"}, (end.year - start.year) will
  be the total number of years minus 1. For example, given a
  concentration in year 1 of 100 units and a percentage
  reduction of 5\%/yr, after 5 years there will be 75 units
  but the actual time span will be 6 years, i.e. year 1 is
  used as a reference year. Things are slightly different
  for monthly values, e.g. \code{avg.time = "month"}, which
  will use the total number of months as the basis of the
  time span and is therefore able to deal with partial
  years. There can therefore be slight differences in the
  \%/yr trend estimate, depending on whether monthly or
  annual values are considered.}

  \item{date.breaks}{Number of major x-axis intervals to
  use. The function will try and choose a sensible number
  of dates/times as well as formatting the date/time
  appropriately to the range being considered.  This does
  not always work as desired automatically. The user can
  therefore increase or decrease the number of intervals by
  adjusting the value of \code{date.breaks} up or down.}

  \item{...}{Other graphical parameters passed onto
  \code{cutData} and \code{lattice::xyplot}. For example,
  \code{TheilSen} passes the option \code{hemisphere =
  "southern"} on to \code{cutData} to provide southern
  (rather than default northern) hemisphere handling of
  \code{type = "season"}.  Similarly, common axis and title
  labelling options (such as \code{xlab}, \code{ylab},
  \code{main}) are passed to \code{xyplot} via
  \code{quickText} to handle routine formatting.}
}
\value{
As well as generating the plot itself, \code{TheilSen} also
returns an object of class \dQuote{openair}. The object includes
three main components: \code{call}, the command used to
generate the plot; \code{data}, the data frame of
summarised information used to make the plot; and
\code{plot}, the plot itself. If retained, e.g. using
\code{output <- TheilSen(mydata, "nox")}, this output can
be used to recover the data, reproduce or rework the
original plot or undertake further analysis.

An openair output can be manipulated using a number of
generic operations, including \code{print}, \code{plot} and
\code{summary}. See \code{\link{openair.generics}} for
further details.

The \code{data} component of the \code{TheilSen} output
includes two subsets: \code{main.data}, the monthly data,
and \code{res2}, the trend statistics. For \code{output <-
TheilSen(mydata, "nox")}, these can be extracted as
\code{output$data$main.data} and \code{output$data$res2},
respectively.

Note: In the case of the intercept, it is assumed the
y-axis crosses the x-axis on 1/1/1970.
}
\description{
Theil-Sen slope estimates and tests for trend.
}
\details{
The \code{TheilSen} function provides a collection of
functions to analyse trends in air pollution data. The
Mann-Kendall test is a commonly used test in environmental
sciences to detect the presence of a trend. It is often
used with the Theil-Sen (or just Sen) estimate of slope.
See references.  The \code{TheilSen} function is flexible
in the sense that it can be applied to data in many ways
e.g. by day of the week, hour of day and wind direction.
This flexibility makes it much easier to draw inferences
from data e.g. why is there a strong downward trend in
concentration from one wind sector and not another, or why
trends on one day of the week or a certain time of day are
unexpected.

For data that are strongly seasonal, perhaps from a
background site, or a pollutant such as ozone, it will be
important to deseasonalise the data (using the option
\code{deseason = TRUE}). Similarly, for data that increase,
then decrease, or show sharp changes it may be better to
use \code{\link{smoothTrend}}.

Note that since version 0.5-11 openair also uses Theil-Sen to
derive the p values. This is to ensure there is
consistency between the calculated p value and other trend
parameters i.e. slope estimates and uncertainties. This
change may slightly affect some of the p values
previously given by openair because the p values are now
calculated using bootstrap resampling by default and
previously they were not. However, users can still, for the
moment, call the \code{TheilSen} function using
\code{MannKendall}.

Note that the symbols shown next to each trend estimate
relate to how statistically significant the trend estimate
is: p < 0.001 = ***, p < 0.01 = **, p < 0.05 = * and
p < 0.1 = +.

Some of the code used in \code{TheilSen} is based on that
from Rand Wilcox \url{http://www-rcf.usc.edu/~rwilcox/}.
This mostly relates to the Theil-Sen slope estimates and
uncertainties. The basic function has been adapted to take
account of auto-correlated data using block bootstrap
simulations if \code{autocor = TRUE} (Kunsch, 1989). We
follow the suggestion of Kunsch (1989) of setting the block
length to \eqn{n^{1/3}}{n^(1/3)}, where n is the length of
the time series.
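
As a purely illustrative figure (the series length is an
assumption, not a property of any particular data set), a
series of n = 120 monthly means would give a block length of

\deqn{120^{1/3} \approx 4.93}{120^(1/3) ~= 4.93}

i.e. blocks of roughly five consecutive months. How this
value is rounded to a whole number of observations is not
specified here.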

The slope estimate and confidence intervals in the slope
are plotted and numerical information presented.
}
\examples{
# load example data from package
data(mydata)

# trend plot for nox
TheilSen(mydata, pollutant = "nox")

# trend plot for ozone with alpha = 0.01 i.e. uncertainty in slope shown at
# the 99 \% confidence interval

\dontrun{TheilSen(mydata, pollutant = "o3", ylab = "o3 (ppb)", alpha = 0.01)}

# trend plot by each of 8 wind sectors
\dontrun{TheilSen(mydata, pollutant = "o3", type = "wd", ylab = "o3 (ppb)")}
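
# illustrative sketch (not run): trend plot by season, one panel per season;
# the second call assumes southern hemisphere data and relies on the
# hemisphere option being passed on to cutData as described under the
# ... argument
\dontrun{TheilSen(mydata, pollutant = "o3", type = "season", ylab = "o3 (ppb)")}
\dontrun{TheilSen(mydata, pollutant = "o3", type = "season",
  hemisphere = "southern", ylab = "o3 (ppb)")}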

# and for a subset of data (from year 2000 onwards)
\dontrun{TheilSen(select.by.date(mydata, year = 2000:2005), pollutant = "o3", ylab = "o3 (ppb)")}
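
# illustrative sketches (not run) of other documented options,
# using only arguments described above

# deseasonalise strongly seasonal data such as ozone before estimating the trend
\dontrun{TheilSen(mydata, pollutant = "o3", deseason = TRUE, ylab = "o3 (ppb)")}

# account for autocorrelation in the trend uncertainty (block bootstrap)
\dontrun{TheilSen(mydata, pollutant = "nox", autocor = TRUE)}

# express the slope and its uncertainties as a percentage change per year
\dontrun{TheilSen(mydata, pollutant = "nox", slope.percent = TRUE)}

# keep the returned openair object and extract its data components
# (the name output is only an illustrative choice)
\dontrun{
output <- TheilSen(mydata, pollutant = "nox")
head(output$data$main.data)  # monthly data used in the plot
output$data$res2             # trend statistics
}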
}
\author{
David Carslaw with some trend code from Rand Wilcox
}
\references{
Helsel, D., Hirsch, R., 2002. Statistical methods in water
resources. US Geological Survey.
\url{http://pubs.usgs.gov/twri/twri4a3/}. Note that this is
a very good resource for statistics as applied to
environmental data.

Hirsch, R. M., Slack, J. R., Smith, R. A., 1982. Techniques
of trend analysis for monthly water-quality data. Water
Resources Research 18 (1), 107-121.

Kunsch, H. R., 1989. The jackknife and the bootstrap for
general stationary observations. Annals of Statistics 17
(3), 1217-1241.

Sen, P. K., 1968. Estimates of regression coefficient based
on Kendall's tau. Journal of the American Statistical
Association 63(324).

Theil, H., 1950. A rank invariant method of linear and
polynomial regression analysis, i, ii, iii. Proceedings of
the Koninklijke Nederlandse Akademie Wetenschappen, Series
A - Mathematical Sciences 53, 386-392, 521-525, 1397-1412.

See also several of the Air Quality Expert Group (AQEG)
reports for the use of similar tests applied to UK/European
air quality data:
\url{http://www.defra.gov.uk/ENVIRONMENT/airquality/panels/aqeg/}.
}
\seealso{
See \code{\link{smoothTrend}} for a flexible approach to
estimating trends using nonparametric regression. The
\code{smoothTrend} function is suitable for cases where
trends are not monotonic and is probably better for
exploring the shape of trends.
}
\keyword{methods}

