% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cleanTS.R
\name{cleanTS}
\alias{cleanTS}
\title{Clean univariate time-series data}
\usage{
cleanTS(
  data,
  date_format,
  imp_methods = c("na_interpolation", "na_locf", "na_ma", "na_kalman"),
  time = NULL,
  value = NULL,
  replace_outliers = TRUE
)
}
\arguments{
\item{data}{A data frame containing the input data. By default, the first
column is assumed to contain the timestamps and the second column the
observations. If that is not the case, or if the data frame has more than
two columns, specify the names of the time and value columns using the
\code{time} and \code{value} arguments.}

\item{date_format}{Format of the timestamps in the data, written in the
\pkg{lubridate} format notation described
\href{https://lubridate.tidyverse.org/reference/parse_date_time.html#details}{here}.
Multiple formats can be supplied as a character vector.}

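\item{imp_methods}{A character vector naming the imputation methods to
compare. The defaults are functions from the \pkg{imputeTS} package. The
best-performing methods are selected and used for imputation, and
user-defined functions can also be included (see Details).}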

\item{time}{Optional name of the column in \code{data} to be used as the
time column.}

\item{value}{Optional name of the column in \code{data} to be used as the
value column.}

\item{replace_outliers}{Logical; if \code{TRUE}, detected outliers are
replaced with \code{NA} and imputed using the methods in \code{imp_methods}.}
}
\value{
A \code{cleanTS} object which contains:
\itemize{
\item Cleaned data
\item Missing timestamps
\item Duplicate timestamps
\item Imputation methods used
\item MCAR and MAR imputation errors
\item Outliers
\item Outlier imputation errors (only if \code{replace_outliers = TRUE})
}
}
\description{
\code{cleanTS()} is the main function of the package and creates a
\emph{cleanTS} object. It performs all of the data cleaning tasks, such as
converting the timestamps to a proper format, imputing missing values, and
handling outliers. It is a wrapper function that calls the package's
internal functions to perform each cleaning task.
}
\details{
The first task is to check the input time series data for structural and
data-type errors. Since the functions require univariate time series data,
the number of columns in the input data is checked. By default, the first
column is taken to be the time column and the second column the
observations; if the \code{time} and \code{value} arguments are given, those
columns are used instead. The time column is converted to a POSIX date-time
object, the value column is converted to numeric, and the columns are renamed
to \code{time} and \code{value}. The data is then converted to a
\emph{data.table} object.

The data is then passed to other functions that check for missing and
duplicate timestamps. If duplicate timestamps are found, their observations
are compared. If the observations are identical, only one copy is kept. If
they differ, the correct value cannot be determined, so the observation is
set to \code{NA}.

Next, the data is passed to a function that finds and handles missing
observations. The methods given in the \code{imp_methods} argument are
compared, and the best-performing ones are selected. Values that are missing
completely at random (MCAR) and missing at random (MAR) are handled
separately. After the best methods are found, imputation is performed using
those methods. The user can also pass user-defined functions for comparison.
A user-defined function should have the same structure as the default
functions: it should take a numeric vector containing missing values as
input and return a numeric vector of the same length, without missing
values, as output.

Once the missing values are handled, the data is checked for outliers. If
\code{replace_outliers = TRUE}, the outliers are replaced by \code{NA} and
imputed using the same procedure as for missing values.

Finally, a \emph{cleanTS} object is created and returned. It contains the
cleaned data, the missing timestamps, the duplicate timestamps, the
imputation methods, the MCAR and MAR imputation errors, and the outliers. If
the outliers were replaced, the errors of those imputations are also
included.
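
The duplicate-timestamp rule described above can be illustrated with a
short \pkg{data.table} sketch. The helper \code{resolve_duplicates()} and the
toy data are hypothetical and are not the package's internal code; the sketch
only assumes that the data already has \code{time} and \code{value} columns,
as it does after the initial conversion step.

\preformatted{
library(data.table)

# Hypothetical helper (not part of cleanTS): for each timestamp, keep the
# value if all duplicates agree, otherwise set it to NA.
resolve_duplicates <- function(dt) {
  dt[, .(value = if (uniqueN(value) == 1L) value[1L] else NA_real_),
     by = time]
}

dt <- data.table(
  time  = as.POSIXct(c("2020-01-01", "2020-01-01", "2020-01-02",
                       "2020-01-02", "2020-01-03"), tz = "UTC"),
  value = c(5, 5, 3, 4, 7)
)

# 2020-01-01 keeps 5, 2020-01-02 becomes NA, 2020-01-03 keeps 7
resolve_duplicates(dt)
}

Grouping with \code{by = time} collapses each timestamp to a single row, so
agreeing duplicates keep their shared value while conflicting ones become
\code{NA} and are later filled by the imputation step.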
}
\examples{
# Convert the sunspot.month time series to a data frame
data <- timetk::tk_tbl(sunspot.month)
print(data)
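
# Optional check of the date format: in lubridate notation, "my" means
# month followed by year, which matches index values such as "Jan 1749".
# lubridate::parse_date_time() can be used to test a date_format value
# before passing it to cleanTS().
lubridate::parse_date_time("Jan 1749", orders = "my")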

# Randomly insert missing values to simulate gaps in the observations
set.seed(10)
ind <- sample(nrow(data), 100)
data$value[ind] <- NA

# Perform cleaning
cts <- cleanTS(data, date_format = "my", time = "index", value = "value")
print(cts)
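
# A user-defined imputation method must take a numeric vector containing
# missing values and return a numeric vector of the same length with no
# missing values. impute_mean() is a hypothetical sketch of that contract,
# not a function provided by the package.
impute_mean <- function(x) {
  # Replace every NA with the mean of the observed values
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

y <- impute_mean(c(1, NA, 3, NA, 5))
stopifnot(length(y) == 5, !anyNA(y))
print(y)  # 1 3 3 3 5

\dontrun{
# Assumption: user-defined methods are passed by name in imp_methods,
# in the same way as the default imputeTS methods.
cts2 <- cleanTS(data, date_format = "my", time = "index", value = "value",
                imp_methods = c("na_interpolation", "impute_mean"))
}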
}
