Type: | Package |
Title: | Smoothing Long-Memory Time Series |
Version: | 2.0.1 |
Description: | The nonparametric trend and its derivatives in equidistant time series (TS) with long-memory errors can be estimated. The estimation is conducted via local polynomial regression using an automatically selected bandwidth obtained by a built-in iterative plug-in algorithm or a bandwidth fixed by the user. The smoothing methods of the package are described in Letmathe, S., Beran, J. and Feng, Y., (2023) <doi:10.1080/03610926.2023.2276049>. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | fracdiff, stats, utils, smoots, graphics, grDevices, Rcpp, future, furrr, ggplot2 |
Depends: | R (≥ 2.10) |
LinkingTo: | Rcpp, RcppArmadillo |
URL: | https://wiwi.uni-paderborn.de/en/dep4/feng/ |
Acknowledgments: | This work was supported by the German DFG project GZ-FE-1500-2-1. |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | yes |
Packaged: | 2024-05-07 10:12:01 UTC; Dominik Schulz |
Author: | Yuanhua Feng [aut] (Paderborn University, Germany), Jan Beran [aut] (University of Konstanz, Germany), Sebastian Letmathe [aut] (Paderborn University, Germany), Dominik Schulz [aut, cre] (Paderborn University, Germany) |
Maintainer: | Dominik Schulz <dominik.schulz@uni-paderborn.de> |
Repository: | CRAN |
Date/Publication: | 2024-05-07 10:40:03 UTC |
esemifar: A package for data-driven nonparametric estimation of the trend and its derivatives in equidistant time series.
Description
The esemifar
package provides different applicable functions for the
estimation of the trend or its derivatives in equidistant time series.
The main functions include an automated bandwidth selection method for time
series with long-memory errors.
An alternative method to create an ESEMIFAR estimation object stitched together from a nonparametric and a parametric part.
Usage
esemifar(nonpar_model, par_model)
Arguments
nonpar_model |
an estimation object returned by |
par_model |
an estimation object returned by |
Details
The main function tsmoothlm
already returns a fully estimated
ESEMIFAR model. In some instances, alternative specifications of the
nonparametric and parametric model parts are needed, for which
tsmoothlm
with its automated estimation algorithm does not
provide sufficient flexibility. Therefore, this function allows to stitch
together a nonparametric model part returned by gsmooth
and
a FARIMA part for the residuals obtained via fracdiff
.
The resulting object can then be used for forecasting.
Value
The function returns a list of class "esemifar"
with elements
nonpar_model
and par_model
.
Functions (version 1.0.0)
The esemifar
functions are either meant for calculating nonparametric
estimates of the trend of a time series or its derivatives.
dsmoothlm
is a function that calculates the derivatives of the
trend after obtaining the optimal bandwidth by an iterative plug-in
algorithm.
tsmoothlm
is the central function of the package. It allows
the user to conduct a local polynomial regression of the trend based on
an optimal bandwidth that is obtained by an iterative plug-in algorithm.
Inflation rate (and other factors) can be manually
and individually adjusted as arguments in the function
(see also: tsmoothlm
).
critMatlm
is a quick tool for the calculation of information criteria
for FARIMA(p,d,q
) models with different order combinations p
and
q
. The function returns a matrix with the obtained values of the
selected criterion for the different combinations of p
and q
(see also: critMatlm
).
Datasets
The package includes two datasets: airLDN
(see also:
airLDN
) with daily observations of individual air pollutants
from 2014 to 2020 and gdpG7
(see also: gdpG7
) that has
data concerning the quarterly G7 GDP between Q1 1962 and Q4 2019.
License
The package is distributed under the General Public License v3 ([GPL-3](https://tldrlegal.com/license/gnu-general-public-license-v3-(gpl-3))).
Author(s)
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/Sebastian Letmathe,
Author (and Original Package Creator and Maintainer)Dominik Schulz,
Author and Current Package Creator and Maintainer
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
References
Beran, J. and Y. Feng (2002a). Iterative plug-in algorithms for SEMIFAR models - definition, convergence, and asymptotic properties. Journal of Computational and Graphical Statistics 11(3), 690-713.
Beran, J. and Feng, Y. (2002b). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Beran, J. and Feng, Y. (2002c). SEMIFAR models - a semiparametric approach to modelling trends, longrange dependence and nonstationarity. Computational Statistics & Data Analysis 40(2), 393-419.
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
Examples
lgdp <- log(esemifar::gdpG7$gdp)
nonpar <- gsmooth(lgdp, b = 0.15)
res <- nonpar$res
par <- fracdiff::fracdiff(res, nar = 1, nma = 1)
model <- esemifar(nonpar_model = nonpar, par_model = par)
model
Daily Observations of the Air Quality Index of London (Britain)
Description
A dataset that contains daily observations of individual air pollutants from 2014 to 2020.
Usage
airLDN
Format
A data frame with 2552 rows and 8 variables:
- date
the observation date
- pm25
particle matter with an aerodynamic diameter smaller than 2.5
\mu
m- pm10
particle matter with an aerodynamic diameter smaller than 10
\mu
m- o3
ozone or trioxygen
- no2
nitrogen dioxide
- so2
sulphur dioxide
- co
carbon monoxide
- AQI
composite air quality index
Source
The data can be obtained from the World Air Quality Project.
AR Representation of an ARMA Model
Description
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
Usage
arma_to_ar(ar = numeric(0), ma = numeric(0), max_i = 1000)
Arguments
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
max_i |
the maximum index up until which to return the coefficient series. |
Details
Consider the ARMA model
X_t = ar_1 X_{t-1} + ... + ar_p X_{t-p}+ma_1 e_{t-1}+...+ma_q e_{t-q}+e_t,
where e_t
are the innovations. ar_i
, i=1, ..., p
, are the AR-coefficients to pass to the
argument ar
, ma_j
, j = 1, ..., q
, are the MA-coefficients
to pass to the argument ma
.The function then returns the coefficients
from the corresponding infinite-order AR-representation
-e_t = c_0 X_t + c_1 X_{t-1}+c_2 X_{t-2} + c_3 X_{t-3} + ...,
where c_l
, l = 0, 1, 2, ...
, are the coefficients. Following this
notation, c_0 = -1
by definition.
Value
A numeric vector is returned.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
arma_to_ar(ar = 0.75, ma = 0.5, max_i = 100)
MA Representation of an ARMA Model
Description
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
Usage
arma_to_ma(ar = numeric(0), ma = numeric(0), max_i = 1000)
Arguments
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
max_i |
the maximum index up until which to return the coefficient series. |
Details
Consider the ARMA model
X_t = ar_1 X_{t-1} + ... + ar_p X_{t-p}+ma_1 e_{t-1}+...+ma_q e_{t-q}+e_t,
where e_t
are the innovations. ar_i
, i=1, ..., p
, are the AR-coefficients to pass to the
argument ar
, ma_j
, j = 1, ..., q
, are the MA-coefficients
to pass to the argument ma
.The function then returns the coefficients
from the corresponding infinite-order MA-representation
X_t = c_0 e_t + c_1 e_{t-1}+c_2 e_{t-2} + c_3 e_{t-3} + ...,
where c_l
, l = 0, 1, 2, ...
, are the coefficients. Following this
notation, c_0 = 1
by definition.
Value
A numeric vector is returned.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
arma_to_ma(ar = 0.75, ma = 0.5, max_i = 100)
FARIMA Order Selection Matrix
Description
An information criterion is calculated for different orders of a fractionally integrated autoregressive-moving-average (FARIMA) model.
Usage
critMatlm(X, p.max = 5, q.max = 5, criterion = c("bic", "aic"))
Arguments
X |
a numeric vector that contains the observed time series ordered from past to present; the series is assumed to follow an FARIMA process. |
p.max |
an integer value |
q.max |
an integer value |
criterion |
a character value that defines the information criterion
that will be calculated; the Bayesian Information Criterion ( |
Details
The series passed to X
is assumed to follow an
FARIMA(p,d,q
) model. A p.max + 1
by q.max + 1
matrix is
calculated for this series. More precisely, the criterion chosen via the
argument criterion
is calculated for all combinations of orders
p = 0, 1, ..., p_{max}
and
q = 0, 1, ..., q_{max}
.
Within the function, two information criteria are supported: the Bayesian Information Criterion (BIC) and Akaike's Information Criterion (AIC). The AIC is given by
AIC_{p,q} := \ln(\hat{\sigma}_{p,q}^{2}) + \frac{2(p+q)}{n},
where \hat{\sigma}_{p,q}^{2}
is the estimated
innovation variance, p
and q
are the ARMA orders and n
is
the number of observations.
The BIC, on the other hand, is defined by
BIC_{p,q} := k \ln(n) - 2\ln(\hat{L})
with k
being the number of estimated parameters and
\hat{L}
being the estimated Log-Likelihood. Since the parameter
k
only differs with respect to the orders p
and q
for all
estimated models, the term k \ln(n)
is reduced to
(p + q) \ln(n)
within the function. Exemplary,
if the mean of the series is estimated as well, it is usually considered
within the parameter k
when calculating the BIC.
However, since the mean is estimated for all models, not considering this
estimated parameter within the calculation of the BIC will reduce all BIC
values by the same amount of \ln(n)
. Therefore, the selection
via this simplified criterion is still valid, if the number of the estimated
parameters only differs with respect to p
and q
between the
models that the BIC is obtained for.
The optimal orders are considered to be the ones which minimize either the BIC or the AIC. The use of the BIC is however recommended, because the BIC is consistent, whereas the AIC is not.
NOTE:
Within this function, the fracdiff
function of the
fracdiff
package is used throughout for
the estimation of FARIMA models.
Value
The function returns a p.max + 1
by q.max + 1
matrix, where the
rows represent the AR orders from p = 0
to p = p_{max}
and the columns represent the MA orders from q = 0
to
q = q_{max}
. The values within the matrix are the values of
the previously selected information criterion for the different combinations
of p
and q
.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
# Simulate a FARIMA(2, 0.3 ,1) process
set.seed(1)
X.sim <- fracdiff::fracdiff.sim(n = 1000, ar = c(1.2, -0.71), ma = -0.46,
d = 0.3)$series
# Application of the function with BIC-criterion
BIC_mat <- critMatlm(X.sim)
BIC_mat
# determining the optimal order
smoots::optOrd(BIC_mat)
Filter Coefficients of the Fractional Differencing Operator
Description
Output is with positive signs on the left-hand side of the equation.
Usage
d_to_coef(d, max_i = 1000)
Arguments
d |
the fractional differencing coefficient. |
max_i |
the maximum index up until which to return the coefficient series. |
Details
Consider the FARIMA model
(1-B)^d Y_t = ar_1 X_{t-1} + ... + ar_p X_{t-p}+ma_1 e_{t-1}+...+ma_q e_{t-q}+e_t,
where e_t
are the innovations and where X_t=(1-B)^d Y_t
.
d
is the fractional differencing
coefficient.
The fractional differencing operator (1-B)^d
can alternatively be expressed
as an infinite coefficient series, so that
(1-B)^d=\sum_{l=0}^{\infty}b_l B^k,
where B
is the backshift operator and where b_l
, l=0,1,2,...
,
are the coefficients. Note that b_0=1
by definition.
The function returns the series of coefficients \{b_l, l =0,1,2,...\}
.
Value
A numeric vector is returned.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
d_to_coef(d = 0.3, max_i = 100)
Data-driven Local Polynomial for the Trend's Derivatives in Equidistant Time Series
Description
This function runs through an iterative process in order to find the optimal bandwidth for the nonparametric estimation of the first or second derivative of the trend in an equidistant time series (with long-memory errors) and subsequently employs the obtained bandwidth via local polynomial regression.
Usage
dsmoothlm(
y,
d = c(1, 2),
pmin = c(0, 1, 2, 3, 4, 5),
pmax = c(0, 1, 2, 3, 4, 5),
qmin = c(0, 1, 2, 3, 4, 5),
qmax = c(0, 1, 2, 3, 4, 5),
mu = c(0, 1, 2, 3),
mu.p = c(0, 1, 2, 3),
pp = c(1, 3),
bStart.p = 0.15,
InfR.p = c("Opt", "Nai", "Var")
)
Arguments
y |
a numeric vector that contains the time series ordered from past to present. | ||||||||||
d |
an integer | ||||||||||
pmin |
an integer value | ||||||||||
pmax |
an integer value | ||||||||||
qmin |
an integer value | ||||||||||
qmax |
an integer value | ||||||||||
mu |
an integer | ||||||||||
mu.p |
an integer
| ||||||||||
pp |
an integer | ||||||||||
bStart.p |
a numeric object that indicates the starting value of the
bandwidth for the iterative process to obtain initial estimates for | ||||||||||
InfR.p |
a character object that represents the inflation
rate in the form |
Details
The trend is estimated based on the additive nonparametric regression model for an equidistant time series
y_t = m(x_t) + \epsilon_t,
where y_t
is the observed time series, x_t
is the rescaled time
on the interval [0, 1]
, m(x_t)
is a smooth and deterministic
trend function and \epsilon_t
are stationary errors with
E(\epsilon_t) = 0
and is assumed to follow a FARIMA(p, d, q
)
model (see also Beran and Feng, 2002).
The iterative-plug-in (IPI) algorithm, which numerically minimizes the Asymptotic Mean Squared Error (AMISE), is based on the proposal of Beran and Feng (2002a).
The variance factor c_f
, the long memory parameter d
and the
starting bandwidth b0
are first obtained from a pilot-estimation of
the time series' nonparametric trend (\nu = 0
) with polynomial order
p_p
. The estimate is then plugged into the iterative procedure for
estimating the first or second derivative (\nu = 1
or \nu = 2
).
For further details on the asymptotic theory or the algorithm, we refer the
user to Letmathe, Beran and Feng (2023).
The function itself is applicable in the following way: Based on a data input
y
, an order of polynomial pp
for the variance factor estimation
procedure, a starting value for the relative bandwidth bStart.p
in the
variance factor estimation procedure and a kernel function defined by the
smoothness parameter mu
, an optimal bandwidth is numerically calculated
for the trend's derivative of order d
. In fact, aside from the input
vector y
, every argument has a default setting that can be adjusted for
the individual case. However, it is recommended to initially use the default
values for the estimation of the
first derivative and adjust the argument d
to d = 2
for the
estimation of the second derivative.
The initial bandwidth does not affect the resulting optimal bandwidth in
theory. However in practice, local minima of the AMISE can influence the
results. For more specific information on the input arguments consult the
section Arguments.
After the bandwidth estimation, the nonparametric derivative of the series is calculated with respect to the obtained optimal bandwidth by means of a local polynomial regression. The output object is then a list that contains, among other components, the original time series, the estimates of the derivative and the estimated optimal bandwidth.
The default print method for this function delivers key numbers such as
the iteration steps and the generated optimal bandwidth rounded to the fourth
decimal. The exact numbers and results such as the estimated nonparametric
trend series are saved within the output object and can be addressed via the
$
sign.
Value
The function returns a list with different components:
- b0
the optimal bandwidth chosen by the IPI-algorithm.
- bStart.p
the starting bandwidth for the nonparametric trend estimation that leads to the initial estimates; input argument.
- cf0
the estimated variance factor.
- InfR.p
the inflation rate setting.
- iterations
the bandwidths of the single iterations steps
- mu.p
the smoothness parameter of the second order kernel; input argument.
- n
the number of observations.
- niterations
the total number of iterations until convergence.
- orig
the original input series; input argument.
- p
the order of polynomial for the local polynomial regression used within derivative estimation procedure.
- pp
the order of polynomial for the local polynomial regression used in the pilot estimation; input argument.
- v
the considered order of the trend's derivative; input argument
d
.- ws
the weighting system matrix used within the local polynomial regression; this matrix is a condensed version of a complete weighting system matrix; in each row of
ws
, the weights for conducting the smoothing procedure at a specific observation time point can be found; the first[nb + 0.5]
rows, wheren
corresponds to the number of observations,b
is the bandwidth considered for smoothing and[.]
denotes the integer part, contain the weights at the[nb + 0.5]
left-hand boundary points; the weights in row[nb + 0.5] + 1
are representative for the estimation at all interior points and the remaining rows contain the weights for the right-hand boundary points; each row has exactly2[nb + 0.5] + 1
elements, more specifically the weights for observations of the nearest2[nb + 0.5] + 1
time points; moreover, the weights are normalized, i.e. the weights are obtained under consideration of the time pointsx_t = t/n
, wheret = 1, 2, ..., n
.- ye
the nonparametric estimates of the derivative.
Author(s)
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/Sebastian Letmathe (Scientific Employee) (Department of Economics, Paderborn University),
Package Creator and MaintainerDominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
References
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
Examples
# Logarithm of test data
test_data <- gdpG7
y <- log(test_data$gdp)
n <- length(y)
t <- seq(from = 1962, to = 2020, length.out = n)
# Applied dsmooth function for the trend's first derivative
result_d <- dsmoothlm(y, d = 1, pp = 1, pmax = 1, qmax = 1, InfR.p = "Opt")
estim <- result_d$ye
# Plot of the results
plot(t, estim, xlab = "Year", ylab = "First derivative", type = "l",
main = paste0("Estimated first derivative of the trend for log-quarterly ",
"G7-GDP, Q1 1962 - Q4 2019"), cex.axis = 0.8, cex.main = 0.8,
cex.lab = 0.8, bty = "n")
# Print result
result_d
# The main function "dsmoothlm"------------------------------------------
AR Representation of a FARIMA Model
Description
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
Usage
farima_to_ar(ar = numeric(0), ma = numeric(0), d = 0, max_i = 1000)
Arguments
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
d |
the fractional differencing coefficient. |
max_i |
the maximum index up until which to return the coefficient series. |
Details
Consider the FARIMA model
(1-B)^d Y_t = ar_1 X_{t-1} + ... + ar_p X_{t-p}+ma_1 e_{t-1}+...+ma_q e_{t-q}+e_t,
where e_t
are the innovations and where X_t=(1-B)^d Y_t
.
ar_i
, i=1, ..., p
, are the AR-coefficients to pass to the
argument ar
, ma_j
, j = 1, ..., q
, are the MA-coefficients
to pass to the argument ma
. d
is the fractional differencing
coefficient. The function then returns the coefficients
from the corresponding infinite-order AR-representation
-e_t = c_0 Y_t + c_1 Y_{t-1}+c_2 Y_{t-2} + c_3 Y_{t-3} + ...,
where c_l
, l = 0, 1, 2, ...
, are the coefficients. Following this
notation, c_0 = -1
by definition.
Value
A numeric vector is returned.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
farima_to_ar(ar = 0.75, ma = 0.5, d = 0.3, max_i = 100)
MA Representation of a FARIMA Model
Description
Output has representation with positive signs (on the right-hand side of the equation); inputs are both with positive signs (on right-hand side of equation).
Usage
farima_to_ma(ar = numeric(0), ma = numeric(0), d = 0, max_i = 1000)
Arguments
ar |
the AR-coefficient series ordered by lag. |
ma |
the MA-coefficient series ordered by lag. |
d |
the fractional differencing coefficient. |
max_i |
the maximum index up until which to return the coefficient series. |
Details
Consider the FARIMA model
(1-B)^d Y_t = ar_1 X_{t-1} + ... + ar_p X_{t-p}+ma_1 e_{t-1}+...+ma_q e_{t-q}+e_t,
where e_t
are the innovations and where X_t=(1-B)^d Y_t
.
ar_i
, i=1, ..., p
, are the AR-coefficients to pass to the
argument ar
, ma_j
, j = 1, ..., q
, are the MA-coefficients
to pass to the argument ma
. d
is the fractional differencing coefficient.
The function then returns the coefficients
from the corresponding infinite-order AR-representation
Y_t = c_0 e_t + c_1 e_{t-1}+c_2 e_{t-2} + c_3 e_{t-3} + ...,
where c_l
, l = 0, 1, 2, ...
, are the coefficients. Following this
notation, c_0 = 1
by definition.
Value
A numeric vector is returned.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
farima_to_ma(ar = 0.75, ma = 0.5, d = 0.3, max_i = 100)
Extract Model Fitted Values
Description
Generic function which extracts fitted values from a esemifar
class
object. Both fitted
and fitted.values
can be called.
Usage
## S3 method for class 'esemifar'
fitted(object, ...)
Arguments
object |
an object from the |
... |
included for consistency with the generic function. |
Value
Fitted values extracted from a esemifar
class object.
Author(s)
Sebastian Letmathe (Scientific Employee) (Department of Economics, Paderborn University),
Quarterly G7 GDP, Q1 1962 to Q4 2019
Description
A dataset that contains the (seasonally adjusted) Gross Domestic Product of the G7 nations from the first quarter of 1962 to the fourth quarter of 2019
Usage
gdpG7
Format
A data frame with 232 rows and 3 variables:
- year
the observation year
- quarter
the observation quarter in the given year
- gdp
the volume Index of the Gross Domestic Product of the G7 nations
Source
The data was obtained from the Organization for Economic Co-operation and Development (OECD)
https://data.oecd.org/gdp/quarterly-gdp.htm#indicator-chart
Estimation of Trends and their Derivatives via Local Polynomial Regression
Description
This function is an R function for estimating the trend function and its derivatives in an equidistant time series with local polynomial regression and a fixed bandwidth given beforehand.
Usage
gsmooth(y, v = 0, p = v + 1, mu = 1, b = 0.15, bb = c(0, 1))
Arguments
y |
a numeric vector that contains the time series data ordered from past to present. | |||||||||||||||||||||||||
v |
an integer
| |||||||||||||||||||||||||
p |
an integer Exemplary for
| |||||||||||||||||||||||||
mu |
an integer
| |||||||||||||||||||||||||
b |
a real number | |||||||||||||||||||||||||
bb |
can be set to
|
Details
The trend or its derivatives are estimated based on the additive nonparametric regression model for an equidistant time series
y_t = m(x_t) + \epsilon_t,
where y_t
is the observed time series, x_t
is the rescaled time
on the interval [0, 1]
, m(x_t)
is a smooth and deterministic
trend function and \epsilon_t
are stationary errors with
E(\epsilon_t) = 0
(see also Beran and Feng, 2002).
This function is part of the package smoots
and is used in
the field of analyzing equidistant time series data. It applies the local
polynomial regression method to the input data with an arbitrarily
selectable bandwidth. By these means, the trend as well as its derivatives
can be estimated nonparametrically, even though the result will strongly
depend on the bandwidth given beforehand as an input.
NOTE:
The estimates are obtained with regard to the rescaled time points on the
interval [0, 1]
. Thus, if \nu > 0
, the estimates might not
reflect the values for the actual time points. To rescale the estimates, we
refer the user to the rescale
function of the smoots
package.
With package version 1.1.0, this function implements C++ code by means
of the Rcpp
and
RcppArmadillo
packages for
better performance.
Value
The output object is a list with different components:
- b
the chosen (relative) bandwidth; input argument.
- bb
the chosen bandwidth option at the boundaries; input argument.
- mu
the chosen smoothness parameter for the second order kernel; input argument.
- n
the number of observations.
- orig
the original input series; input argument.
- p
the chosen order of polynomial; input argument.
- res
a vector with the estimated residual series; is set to
NULL
forv > 0
.- v
the order of derivative; input argument.
- ws
the weighting system matrix used within the local polynomial regression; this matrix is a condensed version of a complete weighting system matrix; in each row of
ws
, the weights for conducting the smoothing procedure at a specific observation time point can be found; the first[nb + 0.5]
rows, wheren
corresponds to the number of observations,b
is the bandwidth considered for smoothing and[.]
denotes the integer part, contain the weights at the[nb + 0.5]
left-hand boundary points; the weights in row[nb + 0.5] + 1
are representative for the estimation at all interior points and the remaining rows contain the weights for the right-hand boundary points; each row has exactly2[nb + 0.5] + 1
elements, more specifically the weights for observations of the nearest2[nb + 0.5] + 1
time points; moreover, the weights are normalized, i.e. the weights are obtained under consideration of the time pointsx_t = t/n
, wheret = 1, 2, ..., n
.- ye
a vector with the estimates of the selected nonparametric order of derivative on the rescaled time interval
[0, 1]
.
Author(s)
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
Package Creator and Maintainer
References
Beran, J. and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.
Feng, Y., Gries, T., Letmathe, S. and Schulz, D. (2019). The smoots package in R for semiparametric modeling of trend stationary time series. Discussion Paper. Paderborn University. Unpublished.
Examples
# Logarithm of test data
y <- log(esemifar::gdpG7$gdp)
# Applied gsmooth function for the trend with two different bandwidths
results1 <- gsmooth(y, v = 0, p = 1, mu = 1, b = 0.28, bb = 1)
results2 <- gsmooth(y, v = 0, p = 1, mu = 1, b = 0.11, bb = 1)
trend1 <- results1$ye
trend2 <- results2$ye
# Plot of the results
t <- seq(from = 1962, to = 2019.75, by = 0.25)
plot(t, y, type = "l", xlab = "Year", ylab = "log(US-GDP)", bty = "n",
lwd = 2,
main = "Estimated trend for log-quarterly US-GDP, Q1 1947 - Q2 2019")
points(t, trend1, type = "l", col = "red", lwd = 1)
points(t, trend2, type = "l", col = "blue", lwd = 1)
legend("bottomright", legend = c("Trend (b = 0.28)", "Trend (b = 0.11)"),
fill = c("red", "blue"), cex = 0.6)
title(sub = expression(italic("Figure 1")), col.sub = "gray47",
cex.sub = 0.6, adj = 0)
Plot Method for the Package 'esemifar'
Description
This function regulates how objects created by the package esemifar
are
plotted.
Usage
## S3 method for class 'esemifar'
plot(x, t = NULL, rescale = TRUE, which = NULL, ...)
Arguments
x |
an input object of class |
t |
an optional vector with time points that will be considered for
the x-axis within the plot; is set to NULL by default and uses a vector
|
rescale |
a single logical value; is set to |
which |
one of 1 (observations), 2 (trend), 3 (residuals) or 4
(observations with trend); the default |
... |
additional arguments of the standard plot method. |
Value
None
Author(s)
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
Plot Method for Class "esemifar_fc"
Description
Create basic R plots for forecasting objects of class "esemifar_fc"
.
Usage
## S3 method for class 'esemifar_fc'
plot(x, y = NULL, t = NULL, ...)
Arguments
x |
an object of class |
y |
currently without use; for compatibility only. |
t |
a numeric vector with series of time points for observations and forecasts. |
... |
further arguments of |
Details
This is a plot method to visualize the forecasting results for an ESEMIFAR model. Common plot arguments can be implemented to change the appearance.
Value
This method returns NULL
.
Author(s)
Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
Author and Package Creator
Examples
lgdp <- log(esemifar::gdpG7$gdp)
est <- tsmoothlm(lgdp, pmax = 1, qmax = 1)
# Under normality
fc <- predict(est, n.ahead = 10, method = "norm", expo = TRUE)
plot(fc)
ESEMIFAR Prediction Method
Description
Point and interval forecasts (under the normality assumption or via a bootstrap) for fitted ESEMIFAR models.
Usage
## S3 method for class 'esemifar'
predict(
object,
n.ahead = 5,
alpha = c(0.95, 0.99),
method = c("norm", "boot"),
bootMethod = c("simple", "advanced"),
npaths = 5000,
quant.type = 8,
boot_progress = TRUE,
expo = FALSE,
trend_extrap = c("linear", "constant"),
future = TRUE,
num_cores = future::availableCores() - 1,
...
)
Arguments
object |
|
n.ahead |
a single numeric value that represents the forecasting horizon. |
alpha |
a numeric vector with confidence levels for the forecasting
intervals; the default |
method |
whether to obtain the forecasting intervals under the
normality assumption ( |
bootMethod |
only for |
npaths |
only for |
quant.type |
only for |
boot_progress |
only for |
expo |
whether to exponentiate all results at the end. |
trend_extrap |
how to extrapolate the estimated trend into the future:
linearly ( |
future |
only for |
num_cores |
only for |
... |
no purpose; for compatibility only. |
Details
Produce point and interval forecasts based on ESEMIFAR models. Throughout, the infinite-order AR-representation of the parametric FARIMA part is considered to produce point forecasts and future paths of the series. The trend is usually extrapolated linearly (or constantly as an alternative).
Value
The function returns a list of class "esemifar"
with elements
nonpar_model
and par_model
.
A list with various elements is returned.
- obs
the observed series.
- mean
the point forecasts.
- lower
the lower bounds of the forecasting intervals.
- upper
the upper bounds of the forecasting intervals.
- model
the fitted ESEMIFAR model object.
- level
the confidence levels for the forecasting intervals.
Author(s)
Dominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
Examples
lgdp <- log(esemifar::gdpG7$gdp)
est <- tsmoothlm(lgdp, pmax = 1, qmax = 1)
# Under normality
fc <- predict(est, n.ahead = 10, method = "norm", expo = TRUE)
fc$mean
fc$lower
fc$upper
Print Method for the Package 'esemifar'
Description
This function regulates how objects created by the package esemifar
are
printed.
Usage
## S3 method for class 'esemifar'
print(x, ...)
Arguments
x |
an input object of class |
... |
included for compatibility; additional arguments will however not affect the output. |
Value
None
Author(s)
Dominik Schulz (Scientific employee) (Department of Economics, Paderborn University),
Extract Model Residuals
Description
Generic function which extracts model residuals from a esemifar
class
object. Both residuals
and its abbreviated form resid
can be called.
Usage
## S3 method for class 'esemifar'
residuals(object, ...)
Arguments
object |
an object from the |
... |
included for consistency with the generic function. |
Value
Residuals extracted from a esemifar
class object.
Author(s)
Sebastian Letmathe (Scientific Employee) (Department of Economics, Paderborn University),
Advanced Data-driven Nonparametric Regression for the Trend in Equidistant Time Series
Description
This function runs an iterative plug-in algorithm to find the optimal bandwidth for the estimation of the nonparametric trend in equidistant time series (with long-memory errors) and then employs the resulting bandwidth via either local polynomial or kernel regression.
Usage
tsmoothlm(
y,
pmin = c(0, 1, 2, 3, 4, 5),
pmax = c(0, 1, 2, 3, 4, 5),
qmin = c(0, 1, 2, 3, 4, 5),
qmax = c(0, 1, 2, 3, 4, 5),
p = c(1, 3),
mu = c(0, 1, 2, 3),
InfR = c("Opt", "Nai", "Var"),
bStart = 0.15,
bb = c(0, 1),
cb = 0.05,
method = c("lpr", "kr")
)
Arguments
y |
a numeric vector that contains the time series ordered from past to present. | ||||||||||
pmin |
an integer value | ||||||||||
pmax |
an integer value | ||||||||||
qmin |
an integer value | ||||||||||
qmax |
an integer value | ||||||||||
p |
an integer | ||||||||||
mu |
an integer
| ||||||||||
InfR |
a character object that represents the inflation
rate in the form
| ||||||||||
bStart |
a numeric object that indicates the starting value of the
bandwidth for the iterative process; should be | ||||||||||
bb |
can be set to
| ||||||||||
cb |
a numeric value that indicates the percentage of omitted
observations on each side of the observation period for the automated
bandwidth selection; is set to | ||||||||||
method |
the final smoothing approach; |
Details
The trend is estimated based on the additive nonparametric regression model for an equidistant time series
y_t = m(x_t) + \epsilon_t,
where y_t
is the observed time series, x_t
is the rescaled time
on the interval [0, 1]
, m(x_t)
is a smooth and deterministic
trend function and \epsilon_t
are stationary errors with
E(\epsilon_t) = 0
and is assumed to follow a FARIMA(p, d, q
)
model (see also Beran and Feng, 2002a, Beran and Feng, 2002b and Beran
and Feng, 2002c).
The iterative-plug-in (IPI) algorithm, which numerically minimizes the Asymptotic Mean Squared Error (AMISE), is based on the proposal of Beran and Feng (2002a).
The function calculates suitable estimates for c_f
, the variance
factor, and I[m^{(k)}]
over different iterations. In each
iteration, a bandwidth is obtained in accordance with the AMISE that once
more serves as an input for the following iteration. The process repeats
until either convergence or the 40th iteration is reached. For further
details on the asymptotic theory or the algorithm, please see Letmathe et
al., 2023.
To apply the function, the following arguments are needed: a data input
y
, an order of polynomial p
, a kernel weighting function
defined by the smoothness parameter mu
, an inflation rate setting
InfR
(see also Beran and Feng, 2002b), a starting value for the
relative bandwidth bStart
, a
boundary method bb
, a boundary cut-off percentage cb
and a
final smoothing method method
. In fact, aside from the input vector
y
, every argument has a default setting that can be adjusted for the
individual case. Theoretically, the initial bandwidth does not affect the
selected optimal bandwidth. However, in practice local minima of the AMISE
might exist and influence the selected bandwidth. Therefore, the default
setting is bStart = 0.15
. In the rare
case of a clearly unsuitable optimal bandwidth, a starting bandwidth that
differs from the default value is a first possible approach to obtain a
better result. Other argument adjustments can be tried as well. For more
specific information on the input arguments consult the section
Arguments.
When applying the function, an optimal bandwidth is obtained based on a
strongly modified version of the IPI algorithm of Beran and Feng (2002a). In
a second step, the nonparametric trend of the series is calculated with
respect to the chosen bandwidth and the selected regression method (lpf
or kr
). Please note that method = "lpf"
is strongly recommended
by the authors. Moreover, it is notable that p
is automatically set to
1
for method = "kr"
. The output object is then a list that
contains, among other components, the original time series, the estimated
trend values and the series without the trend.
The default print method for this function delivers only key numbers such as
the iteration steps and the generated optimal bandwidth rounded to the fourth
decimal. The exact numbers and results such as the estimated nonparametric
trend series are saved within the output object and can be addressed via the
$
sign.
Value
The function returns a list with different components:
- FARIMA.BIC
the Bayesian Information Criterion of the optimal FARIMA(
p,d,q
) model.- cb
the percentage of omitted observations on each side of the observation period.
- b0
the optimal bandwidth chosen by the IPI-algorithm.
- bb
the boundary bandwidth method used within the IPI; always equal to 1.
- bStart
the starting value of the (relative) bandwidth; input argument.
- cf0
the estimated variance factor; in contrast to the definitions given in the Details section, this object actually contains an estimated value of
2\pi c_f
, i.e. it corresponds to the estimated sum of autocovariances.- d.BIC
the long-memory parameter of the optimal FARIMA(
p,d,q
) model.- FARMA.BIC
the model fit of the selected FARIMA(
p,d,q
model.- I2
the estimated value of
I[m^{(k)}]
.- InfR
the setting for the inflation rate according to the chosen algorithm.
- iterations
the bandwidths of the single iterations steps
- mu
the smoothness parameter of the second order kernel; input argument.
- n
the number of observations.
- niterations
the total number of iterations until convergence.
- orig
the original input series; input argument.
- p.BIC
the order p of the optimal FARIMA(
p,d,q
) model.- p
the order of polynomial used in the IPI-algorithm; also used for the final smoothing, if
method = "lpr"
; input argument.- q.BIC
the order
q
of the optimal FARIMA(p,d,q
) model.- res
the estimated residual series.
- v
the considered order of derivative of the trend; is always zero for this function.
- ws
the weighting system matrix used within the local polynomial regression; this matrix is a condensed version of a complete weighting system matrix; in each row of
ws
, the weights for conducting the smoothing procedure at a specific observation time point can be found; the first[nb + 0.5]
rows, wheren
corresponds to the number of observations,b
is the bandwidth considered for smoothing and[.]
denotes the integer part, contain the weights at the[nb + 0.5]
left-hand boundary points; the weights in row[nb + 0.5] + 1
are representative for the estimation at all interior points and the remaining rows contain the weights for the right-hand boundary points; each row has exactly2[nb + 0.5] + 1
elements, more specifically the weights for observations of the nearest2[nb + 0.5] + 1
time points; moreover, the weights are normalized, i.e. the weights are obtained under consideration of the time pointsx_t = t/n
, wheret = 1, 2, ..., n
.- ye
the nonparametric estimates of the trend.
Author(s)
Yuanhua Feng (Department of Economics, Paderborn University),
Author of the Algorithms
Website: https://wiwi.uni-paderborn.de/en/dep4/feng/Sebastian Letmathe (Scientific Employee) (Department of Economics, Paderborn University),
Package Creator and MaintainerDominik Schulz (Scientific Employee) (Department of Economics, Paderborn University),
Author
References
Beran, J. and Y. Feng (2002a). Iterative plug-in algorithms for SEMIFAR models - definition, convergence, and asymptotic properties. Journal of Computational and Graphical Statistics 11(3), 690-713.
Beran, J. and Feng, Y. (2002b). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.
Beran, J. and Feng, Y. (2002c). SEMIFAR models - a semiparametric approach to modelling trends, longrange dependence and nonstationarity. Computational Statistics & Data Analysis 40(2), 393-419.
Letmathe, S., Beran, J. and Feng, Y. (2023). An extended exponential SEMIFAR model with application in R. Communications in Statistics - Theory and Methods: 1-13.
Examples
### Example 1: G7-GDP ###
# Logarithm of test data
# -> the logarithm of the data is assumed to follow the additive model
test_data <- gdpG7
y <- log(test_data$gdp)
n <- length(y)
# Applied tsmooth function for the trend
result <- tsmoothlm(y, p = 1, pmax = 1, qmax = 1, InfR = "Opt")
trend1 <- result$ye
# Plot of the results
t <- seq(from = 1962, to = 2020, length.out = n)
plot(t, y, type = "l", xlab = "Year", ylab = "log(G7-GDP)", bty = "n",
lwd = 1, lty = 3,
main = "Estimated trend for log-quarterly G7-GDP, Q1 1962 - Q4 2019")
points(t, trend1, type = "l", col = "red", lwd = 1)
title(sub = expression(italic("Figure 1")), col.sub = "gray47",
cex.sub = 0.6, adj = 0)
result