% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extremevalue.R
\name{KRDetect.outliers.EV}
\alias{KRDetect.outliers.EV}
\title{Identification of outliers using extreme value theory}
\usage{
KRDetect.outliers.EV(x, perform.smoothing = TRUE,
  bandwidth.type = "local", bandwidth.value = NULL,
  extremal.index.min = NULL, extremal.index.max = NULL,
  block.length = round(sqrt(length(na.omit(x)))), threshold.min = NULL,
  threshold.max = NULL, return.period = 120)
}
\arguments{
\item{x}{a numeric vector of observations.}

\item{perform.smoothing}{a logical value specifying whether the data are smoothed. If \code{TRUE} (default), smoothing is performed.}

\item{bandwidth.type}{a character string specifying the type of bandwidth, must be \code{"local"} (default) or \code{"global"}.}

\item{bandwidth.value}{a local bandwidth array (for \code{bandwidth.type = "local"}) or a global bandwidth value (for \code{bandwidth.type = "global"}) used for kernel regression estimation. If \code{bandwidth.value = NULL} (default), a data-adaptive local plug-in bandwidth (Herrmann, 1997) is used for \code{bandwidth.type = "local"}, and a data-adaptive global plug-in bandwidth (Gasser et al., 1991) is used for \code{bandwidth.type = "global"}.}

\item{extremal.index.min}{a numeric value giving the extremal index used to identify outliers with extremely low values. If \code{extremal.index.min = NULL} (default), the extremal index is estimated using the method of Gomes (1993).}

\item{extremal.index.max}{a numeric value giving the extremal index used to identify outliers with extremely high values. If \code{extremal.index.max = NULL} (default), the extremal index is estimated using the method of Gomes (1993).}

\item{block.length}{a numeric value giving the block length used to estimate the extremal index. Default is \code{round(sqrt(length(na.omit(x))))}, i.e. the square root of the number of non-missing observations, rounded to the nearest integer.}

\item{threshold.min}{a threshold for residuals with low values, used to estimate the parameters of the Generalized Pareto distribution. If \code{threshold.min = NULL} (default), the threshold is estimated as the 90\% quantile of the smoothing residuals.}

\item{threshold.max}{a threshold for residuals with high values, used to estimate the parameters of the Generalized Pareto distribution. If \code{threshold.max = NULL} (default), the threshold is estimated as the 90\% quantile of the smoothing residuals.}

\item{return.period}{a positive numeric value giving the return period. Default is \code{return.period = 120}, which means that observations whose values are exceeded on average once every 120 observations are detected as outliers.}
}
\value{
A list is returned with elements:
\item{method.type}{a character string giving the type of method used for outlier identification}
\item{x}{a numeric vector of observations}
\item{index}{a numeric vector of index design points assigned to individual observations}
\item{smoothed}{a numeric vector of estimates of the kernel regression function (smoothed data)}
\item{sigma_u.min}{a numeric value giving the scale parameter of the Generalized Pareto distribution used to identify outliers with extremely low values}
\item{sigma_u.max}{a numeric value giving the scale parameter of the Generalized Pareto distribution used to identify outliers with extremely high values}
\item{xi.min}{a numeric value giving the shape parameter of the Generalized Pareto distribution used to identify outliers with extremely low values}
\item{xi.max}{a numeric value giving the shape parameter of the Generalized Pareto distribution used to identify outliers with extremely high values}
\item{lambda_u.min}{a numeric value giving the relative frequency of threshold exceedances used to identify outliers with extremely low values}
\item{lambda_u.max}{a numeric value giving the relative frequency of threshold exceedances used to identify outliers with extremely high values}
\item{extremal.index.min}{a numeric value giving the extremal index used to identify outliers with extremely low values}
\item{extremal.index.max}{a numeric value giving the extremal index used to identify outliers with extremely high values}
\item{threshold.min}{a numeric value giving the threshold used in the peaks-over-threshold model to identify outliers with extremely low values}
\item{threshold.max}{a numeric value giving the threshold used in the peaks-over-threshold model to identify outliers with extremely high values}
\item{return.level.min}{a numeric value giving the return level used to identify outliers with extremely low values}
\item{return.level.max}{a numeric value giving the return level used to identify outliers with extremely high values}
\item{outlier.min}{a logical vector specifying the identified outliers with extremely low values. \code{TRUE} means that the corresponding observation of \code{x} is detected as an outlier}
\item{outlier.max}{a logical vector specifying the identified outliers with extremely high values. \code{TRUE} means that the corresponding observation of \code{x} is detected as an outlier}
\item{outlier}{a logical vector specifying the identified outliers with either extremely low or extremely high values. \code{TRUE} means that the corresponding observation of \code{x} is detected as an outlier}
}
\description{
Identifies outliers in environmental data using a semiparametric method based on kernel smoothing and extreme value theory (Holesovsky et al., 2018). Outliers are identified as observations whose values are exceeded on average once within a user-specified return period.
}
\details{
This function identifies outliers in time series using a two-step procedure (Holesovsky et al., 2018): the data are first smoothed by kernel regression, and the high-threshold exceedances of the smoothing residuals are then modelled using extreme value theory.
Both outliers with extremely high values and outliers with extremely low values are identified.
The key parameter of the method is the return period (\code{return.period}), which defines the criterion for outlier detection.
Outliers with extremely high values are detected as observations whose values are exceeded on average once every \code{return.period} observations; outliers with extremely low values are identified analogously.
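
Under the standard peaks-over-threshold model for dependent series (Coles, 2001), the return level for a return period of \eqn{m} observations (\code{return.period}) can be sketched in terms of the fitted parameters as
\deqn{x_m = u + \frac{\sigma_u}{\xi}\left[(m \lambda_u \theta)^{\xi} - 1\right], \quad \xi \neq 0,}{x_m = u + (sigma_u / xi) * ((m * lambda_u * theta)^xi - 1), xi != 0,}
with \eqn{x_m = u + \sigma_u \log(m \lambda_u \theta)}{x_m = u + sigma_u * log(m * lambda_u * theta)} for \eqn{\xi = 0}{xi = 0}. Here \eqn{u} is the threshold, \eqn{\sigma_u}{sigma_u} and \eqn{\xi}{xi} are the scale and shape parameters of the Generalized Pareto distribution, \eqn{\lambda_u}{lambda_u} is the relative frequency of threshold exceedances and \eqn{\theta}{theta} is the extremal index. For outliers with extremely high values these correspond to the components \code{threshold.max}, \code{sigma_u.max}, \code{xi.max}, \code{lambda_u.max} and \code{extremal.index.max} of the returned list. This textbook form is given as an aid to interpretation; that the implementation computes \code{return.level.max} (and, analogously, \code{return.level.min}) exactly this way is an assumption.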
}
\examples{
data("mydata", package = "openair")
x = mydata$o3[format(mydata$date, "\%m \%Y") == "12 2002"]
result = KRDetect.outliers.EV(x)
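# Sketch: inspect the fitted model using components listed in the Value
# section, i.e. the return levels for low and high residuals and the number
# of observations flagged as outliers (missing values, if any, are skipped)
c(result$return.level.min, result$return.level.max)
sum(result$outlier, na.rm = TRUE)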
KRDetect.outliers.plot(result)
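# Sketch with illustrative (not recommended) settings: a data-adaptive global
# plug-in bandwidth and a longer return period of 240 observations, which
# makes the detection criterion stricter than the default return.period = 120
result.global = KRDetect.outliers.EV(x, bandwidth.type = "global",
                                     return.period = 240)
# indices of observations flagged as outliers (low or high)
which(result.global$outlier)
KRDetect.outliers.plot(result.global)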
}
\references{
Holesovsky J, Campulova M, Michalek J (2018). Semiparametric Outlier Detection in Nonstationary Times Series: Case Study for Atmospheric Pollution in Brno, Czech Republic. Atmospheric Pollution Research, 9(1).

Gasser T, Kneip A, Koehler W (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86, 643-652. https://doi.org/10.2307/2290393

Herrmann E (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics, 6, 35-54.

Herrmann E, Maechler M (2013). lokern: Kernel Regression Smoothing with Local or Global Plug-in Bandwidth. R package version 1.1-5, URL http://CRAN.R-project.org/package=lokern.

Gomes M (1993). On the estimation of parameter of rare events in environmental time series. In Statistics for the Environment, volume 2 of Water Related Issues, pp. 225-241. Wiley.

Heffernan JE, Stephenson AG (2016). ismev: An Introduction to Statistical Modeling of Extreme Values. R package version 1.41, URL http://CRAN.R-project.org/package=ismev.

Coles S (2001). An Introduction to Statistical Modeling of Extreme Values. London: Springer. ISBN 1-85233-459-2.

Pickands J (1975). Statistical inference using extreme order statistics. The Annals of Statistics, 3(1), 119-131.
}
