Type: | Package |
Title: | Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data |
Version: | 1.4.4 |
Description: | Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) <doi:10.1093/aje/kwv020>. |
License: | MIT + file LICENSE |
LazyData: | TRUE |
Depends: | R (≥ 3.4.0) |
Imports: | pbapply (≥ 1.3), Rcpp (≥ 0.12.0), Rdpack (≥ 0.8), Matrix |
LinkingTo: | Rcpp, RcppEigen |
Suggests: | MortalityLaws (≥ 1.5.0), knitr (≥ 1.20), rmarkdown (≥ 1.10), testthat (≥ 2.0.0) |
RdMacros: | Rdpack |
URL: | https://github.com/mpascariu/ungroup |
BugReports: | https://github.com/mpascariu/ungroup/issues |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.0 |
NeedsCompilation: | yes |
Packaged: | 2024-01-29 15:00:04 UTC; mpascariu |
Author: | Marius D. Pascariu
|
Maintainer: | Marius D. Pascariu <rpascariu@outlook.com> |
Repository: | CRAN |
Date/Publication: | 2024-01-31 13:00:02 UTC |
ungroup: Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data
Description
Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) doi:10.1093/aje/kwv020.
Details
To learn more about the package, start with the vignettes:
browseVignettes(package = "ungroup")
Author(s)
Maintainer: Marius D. Pascariu rpascariu@outlook.com (ORCID)
Authors:
Silvia Rizzi srizzi@health.sdu.dk
Jonas Schoeley (ORCID)
Maciej J. Danko Danko@demogr.mpg.de (ORCID)
References
Currie ID, Durban M, Eilers PH (2004).
“Smoothing and forecasting mortality rates.”
Statistical modelling, 4(4), 279–298.
Eilers PH (2007).
“Ill-posed problems with counts, the composite link model and penalized likelihood.”
Statistical Modelling, 7(3), 239-254.
doi:10.1177/1471082X0700700302.
Hastie TJ, Tibshirani RJ (1990).
“Generalized additive models.”
Monographs on Statistics and Applied Probability, 43.
Human Mortality Database (2018).
“University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018.”
https://www.mortality.org.
Pascariu MD (2018).
MortalityLaws: Parametric Mortality Models, Life Tables and HMD.
R package version 1.6.0, https://github.com/mpascariu/MortalityLaws.
Rizzi S, Gampe J, Eilers PHC (2015).
“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”
American Journal of Epidemiology, 182(2), 138-147.
doi:10.1093/aje/kwv020.
Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019).
“How to estimate mortality trends from grouped vital statistics.”
International Journal of Epidemiology, 48(2), 571–582.
doi:10.1093/ije/dyy183.
Rizzi S, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Vaupel JW, Lindahl-Jacobsen R (2016).
“Comparison of non-parametric methods for ungrouping coarsely aggregated data.”
BMC medical research methodology, 16(1), 59.
doi:10.1186/s12874-016-0157-8.
Thompson R, Baker RJ (1981).
“Composite link functions in generalized linear models.”
Applied Statistics, 125–131.
See Also
Useful links:
PCLM Akaike Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for
one or several fitted model objects for which a log-likelihood value
can be obtained, according to the formula
-2 \mbox{log-likelihood} + k n_{par}
,
where n_{par}
represents the number of parameters in the
fitted model, and k = 2
for the usual AIC, or
k = \log(n)
(n
being the number of observations) for the so-called BIC or SBC
(Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm'
AIC(object, ..., k = 2)
Arguments
object |
a fitted model object for which there exists a
|
... |
optionally more fitted model objects. |
k |
numeric, the penalty per parameter to be used; the
default |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package
stats4): however methods should be defined for the
log-likelihood function logLik
rather than these
functions: the action of their default methods is to call logLik
on all the supplied objects and assemble the results. Note that in
several common cases logLik
does not return the value at
the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an
additive constant. Different constants have conventionally been used
for different purposes and so extractAIC
and AIC
may give different values (and do for models of class "lm"
: see
the help for extractAIC
). Particular care is needed
when comparing fits of different classes (with, for example, a
comparison of a Poisson and gamma GLM being meaningless since one has
a discrete response, the other continuous).
BIC
is defined as
AIC(object, ..., k = log(nobs(object)))
.
This needs the number of observations to be known: the default method
looks first for a "nobs"
attribute on the return value from the
logLik
method, then tries the nobs
generic, and if neither succeed returns BIC as NA
.
Value
If just one object is provided, a numeric value with the corresponding
AIC (or BIC, or ..., depending on k
).
If multiple objects are provided, a data.frame
with rows
corresponding to the objects and columns representing the number of
parameters in the model (df
) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
PCLM-2D Akaike Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for
one or several fitted model objects for which a log-likelihood value
can be obtained, according to the formula
-2 \mbox{log-likelihood} + k n_{par}
,
where n_{par}
represents the number of parameters in the
fitted model, and k = 2
for the usual AIC, or
k = \log(n)
(n
being the number of observations) for the so-called BIC or SBC
(Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm2D'
AIC(object, ..., k = 2)
Arguments
object |
a fitted model object for which there exists a
|
... |
optionally more fitted model objects. |
k |
numeric, the penalty per parameter to be used; the
default |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package
stats4): however methods should be defined for the
log-likelihood function logLik
rather than these
functions: the action of their default methods is to call logLik
on all the supplied objects and assemble the results. Note that in
several common cases logLik
does not return the value at
the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an
additive constant. Different constants have conventionally been used
for different purposes and so extractAIC
and AIC
may give different values (and do for models of class "lm"
: see
the help for extractAIC
). Particular care is needed
when comparing fits of different classes (with, for example, a
comparison of a Poisson and gamma GLM being meaningless since one has
a discrete response, the other continuous).
BIC
is defined as
AIC(object, ..., k = log(nobs(object)))
.
This needs the number of observations to be known: the default method
looks first for a "nobs"
attribute on the return value from the
logLik
method, then tries the nobs
generic, and if neither succeed returns BIC as NA
.
Value
If just one object is provided, a numeric value with the corresponding
AIC (or BIC, or ..., depending on k
).
If multiple objects are provided, a data.frame
with rows
corresponding to the objects and columns representing the number of
parameters in the model (df
) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
PCLM Bayes Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for
one or several fitted model objects for which a log-likelihood value
can be obtained, according to the formula
-2 \mbox{log-likelihood} + k n_{par}
,
where n_{par}
represents the number of parameters in the
fitted model, and k = 2
for the usual AIC, or
k = \log(n)
(n
being the number of observations) for the so-called BIC or SBC
(Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm'
BIC(object, ...)
Arguments
object |
a fitted model object for which there exists a
|
... |
optionally more fitted model objects. |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package
stats4): however methods should be defined for the
log-likelihood function logLik
rather than these
functions: the action of their default methods is to call logLik
on all the supplied objects and assemble the results. Note that in
several common cases logLik
does not return the value at
the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an
additive constant. Different constants have conventionally been used
for different purposes and so extractAIC
and AIC
may give different values (and do for models of class "lm"
: see
the help for extractAIC
). Particular care is needed
when comparing fits of different classes (with, for example, a
comparison of a Poisson and gamma GLM being meaningless since one has
a discrete response, the other continuous).
BIC
is defined as
AIC(object, ..., k = log(nobs(object)))
.
This needs the number of observations to be known: the default method
looks first for a "nobs"
attribute on the return value from the
logLik
method, then tries the nobs
generic, and if neither succeed returns BIC as NA
.
Value
If just one object is provided, a numeric value with the corresponding
AIC (or BIC, or ..., depending on k
).
If multiple objects are provided, a data.frame
with rows
corresponding to the objects and columns representing the number of
parameters in the model (df
) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
PCLM-2D Akaike Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for
one or several fitted model objects for which a log-likelihood value
can be obtained, according to the formula
-2 \mbox{log-likelihood} + k n_{par}
,
where n_{par}
represents the number of parameters in the
fitted model, and k = 2
for the usual AIC, or
k = \log(n)
(n
being the number of observations) for the so-called BIC or SBC
(Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm2D'
BIC(object, ..., k = 2)
Arguments
object |
a fitted model object for which there exists a
|
... |
optionally more fitted model objects. |
k |
numeric, the penalty per parameter to be used; the
default |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package
stats4): however methods should be defined for the
log-likelihood function logLik
rather than these
functions: the action of their default methods is to call logLik
on all the supplied objects and assemble the results. Note that in
several common cases logLik
does not return the value at
the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an
additive constant. Different constants have conventionally been used
for different purposes and so extractAIC
and AIC
may give different values (and do for models of class "lm"
: see
the help for extractAIC
). Particular care is needed
when comparing fits of different classes (with, for example, a
comparison of a Poisson and gamma GLM being meaningless since one has
a discrete response, the other continuous).
BIC
is defined as
AIC(object, ..., k = log(nobs(object)))
.
This needs the number of observations to be known: the default method
looks first for a "nobs"
attribute on the return value from the
logLik
method, then tries the nobs
generic, and if neither succeed returns BIC as NA
.
Value
If just one object is provided, a numeric value with the corresponding
AIC (or BIC, or ..., depending on k
).
If multiple objects are provided, a data.frame
with rows
corresponding to the objects and columns representing the number of
parameters in the model (df
) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
Construct B-spline basis
Description
This is an internal function of package MortalitySmooth which creates equally-spaced B-splines basis over an abscissa of data.
Usage
MortSmooth_bbase(x, xl, xr, ndx, deg)
Arguments
x |
vector for the abscissa of data; |
xl |
left boundary; |
xr |
right boundary; |
ndx |
number of internal knots minus one or number of internal intervals; |
deg |
degree of the splines. |
Details
The function reproduce an algorithm presented by Eilers and Marx (2010) using differences of truncated power functions (see MortSmooth_tpower). The final matrix has a single B-spline for each of the [ndx + deg] columns. The number of rows is equal to the length of x.
The function differs from bs in the package splines since it automatically constructed B-splines with identical shape. This would allow a simple interpretation of coefficients and application of simple differencing.
Value
A matrix containing equally-spaced B-splines of degree deg along x for each column.
Author(s)
Carlo G Camarda
Truncated p-th Power Function
Description
This is an internal function of package MortalitySmooth which constructs a truncated p-th power function along an abscissa within the function MortSmooth_bbase
Usage
MortSmooth_tpower(x, tt, p)
Arguments
x |
for the abscissa of data; |
tt |
vector of truncation points; |
p |
degree of the power |
Details
Internal function used in MortSmooth_bbase. The vector t contains the knots. The simplest system of truncated power functions uses p = 0 and it consists of step functions with jumps of size 1 at the truncation points t.
Author(s)
Carlo G Camarda
Construct B-spline basis This is an internal function which constructs B-spline basis to be used in pclm estimation
Description
Construct B-spline basis This is an internal function which constructs B-spline basis to be used in pclm estimation
Usage
build_B_spline_basis(X, Y, kr, deg, diff, type)
Arguments
X |
vector with ages |
Y |
vector with years |
kr |
Knot ratio. Number of internal intervals used for defining 1 knot in
B-spline basis construction. See |
deg |
Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
diff |
An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
type |
Type of PCLM model. Options: |
See Also
Build Composition Matrices
Description
Build Composition Matrices
Usage
build_C_matrix(x, y, nlast, offset, out.step, type)
Arguments
x |
Vector containing the starting values of the input intervals/bins.
For example: if we have 3 bins |
y |
Vector with counts to be ungrouped. It must have the same dimension
as |
nlast |
Length of the last interval. In the example above |
offset |
Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
type |
Type of PCLM model. Options: |
Construct Penalty Matrix
Description
Construct Penalty Matrix
Usage
build_P_matrix(BA, BY, lambda, type)
Arguments
BA |
B-spline basis object for age axis |
BY |
B-spline basis object for year axis |
lambda |
Smoothing parameter to be used in pclm estimation.
If |
type |
Type of PCLM model. Options: |
Compute Standard Errors and Confidence Intervals
Description
Compute Standard Errors and Confidence Intervals
Usage
compute_standard_errors(B, QmQ, QmQP)
Arguments
B |
B-spline basis matrix |
QmQ |
Object generated by pclm.fit |
QmQP |
Object generated by pclm.fit |
Auxiliary for Controlling pclm
Fitting
Description
Auxiliary for Controlling pclm
Fitting
Usage
control.pclm(lambda = NA,
kr = 2,
deg = 3,
int.lambda = c(0.1, 1e+5),
diff = 2,
opt.method = c("BIC", "AIC"),
max.iter = 1e+3,
tol = 1e-3)
Arguments
lambda |
Smoothing parameter to be used in pclm estimation.
If |
kr |
Knot ratio. Number of internal intervals used for defining 1 knot in
B-spline basis construction. See |
deg |
Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
int.lambda |
If |
diff |
An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
opt.method |
Selection criterion of the model.
Possible values are |
max.iter |
Maximal number of iterations used in fitting procedure. |
tol |
Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin. |
Value
A list with exactly eight control parameters.
See Also
Examples
control.pclm()
Auxiliary for Controlling pclm2D
Fitting
Description
Auxiliary for Controlling pclm2D
Fitting
Usage
control.pclm2D(lambda = c(1, 1),
kr = 7,
deg = 3,
int.lambda = c(0.1, 1e+3),
diff = 2,
opt.method = c("BIC", "AIC"),
max.iter = 1e+3,
tol = 1e-3)
Arguments
lambda |
Smoothing parameter to be used in pclm estimation.
If |
kr |
Knot ratio. Number of internal intervals used for defining 1 knot in
B-spline basis construction. See |
deg |
Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
int.lambda |
If |
diff |
An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
opt.method |
Selection criterion of the model.
Possible values are |
max.iter |
Maximal number of iterations used in fitting procedure. |
tol |
Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin. |
Value
A list with exactly eight control parameters.
See Also
Examples
control.pclm2D()
Create an additional bin with a small value at the end. Improves convergence.
Description
Create an additional bin with a small value at the end. Improves convergence.
Usage
create.artificial.bin(i, vy = 1, vo = 1.01)
Arguments
i |
A list of input values corresponding to pclm or pclm2D; |
vy |
Numerical value of the bin created for |
vo |
Numerical values of the bin created for |
Delete from results the last group added artificially in pclm and pclm2D
Description
Delete from results the last group added artificially in pclm and pclm2D
Usage
delete.artificial.bin(M)
Arguments
M |
A pclm.fit object |
Extract Fractional Part of a Number
Description
Extract Fractional Part of a Number
Usage
frac(x)
Arguments
x |
A numeric value, vector or matrix |
Map groups and borders
Description
We assume no missing values between the bins
Usage
map.bins(x, nlast, out.step)
Arguments
x |
Vector containing the starting values of the input intervals/bins.
For example: if we have 3 bins |
nlast |
Length of the last interval. In the example above |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
Objective function
Description
Objective function
Usage
ofun(L, I, type)
Arguments
L |
lambda.hat |
I |
Input object from pclm function |
type |
Type of PCLM model. Options: |
Optimize Smoothing Parameters
This function optimize searches of lambda, kr
and deg
.
See control.pclm
to see what is their meaning.
The optimization process works in steps. Simultaneous optimization was
tested and found inefficient.
Description
Optimize Smoothing Parameters
This function optimize searches of lambda, kr
and deg
.
See control.pclm
to see what is their meaning.
The optimization process works in steps. Simultaneous optimization was
tested and found inefficient.
Usage
optimize_par(I, type)
Arguments
I |
Input object from pclm function |
type |
Type of PCLM model. Options: |
Univariate Penalized Composite Link Model (PCLM)
Description
Fit univariate penalized composite link model (PCLM) to ungroup binned count data, e.g. age-at-death distributions grouped in age classes.
Usage
pclm(
x,
y,
nlast,
offset = NULL,
out.step = 1,
ci.level = 95,
verbose = FALSE,
control = list()
)
Arguments
x |
Vector containing the starting values of the input intervals/bins.
For example: if we have 3 bins |
y |
Vector with counts to be ungrouped. It must have the same dimension
as |
nlast |
Length of the last interval. In the example above |
offset |
Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
ci.level |
Level of significance for computing confidence intervals.
Default: |
verbose |
Logical value. Indicates whether a progress bar should be
shown or not.
Default: |
control |
List with additional parameters:
|
Details
The PCLM method is based on the composite link model, which extends standard generalized linear models. It implements the idea that the observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the distribution of expected means on a fine resolution and has to be estimated from the aggregated data. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing Bayesian or Akaike's Information Criterion.
Value
The output is a list with the following components:
input |
A list with arguments provided in input. Saved for convenience. |
fitted |
The fitted values of the PCLM model. |
ci |
Confidence intervals around fitted values. |
goodness.of.fit |
A list containing goodness of fit measures: standard errors, AIC and BIC. |
smoothPar |
Estimated smoothing parameters: |
bins.definition |
Additional values to identify the bins limits and location in input and output objects. |
deep |
A list of objects created in the fitting process. Useful in diagnosis of possible issues. |
call |
An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments. |
References
Rizzi S, Gampe J, Eilers PHC (2015). “Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.” American Journal of Epidemiology, 182(2), 138-147. doi:10.1093/aje/kwv020.
See Also
Examples
# Data
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,
1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
offset <- c(114, 440, 509, 492, 628, 618, 576, 580, 634, 657,
631, 584, 573, 619, 530, 384, 303, 245, 249) * 1000
nlast <- 26 # the size of the last interval
# Example 1 ----------------------
M1 <- pclm(x, y, nlast)
ls(M1)
summary(M1)
fitted(M1)
plot(M1)
# Example 2 ----------------------
# ungroup even in smaller intervals
M2 <- pclm(x, y, nlast, out.step = 0.5)
head(fitted(M1))
plot(M1, type = "s")
# Note, in example 1 we are estimating intervals of length 1. In example 2
# we are estimating intervals of length 0.5 using the same aggregate data.
# Example 3 ----------------------
# Do not optimise smoothing parameters; choose your own. Faster.
M3 <- pclm(x, y, nlast, out.step = 0.5,
control = list(lambda = 100, kr = 10, deg = 10))
plot(M3)
summary(M2)
summary(M3) # not the smallest BIC here, but sometimes is not important.
# Example 4 -----------------------
# Grouped x & grouped offset (estimate death rates)
M4 <- pclm(x, y, nlast, offset)
plot(M4, type = "s")
# Example 5 -----------------------
# Grouped x & ungrouped offset (estimate death rates)
ungroupped_Ex <- pclm(x, y = offset, nlast, offset = NULL)$fitted # ungroupped offset data
M5 <- pclm(x, y, nlast, offset = ungroupped_Ex)
Compute Confidence Intervals for PCLM output
Description
Compute Confidence Intervals for PCLM output
Usage
pclm.confidence(fit, out.step, y, SE, ci.level, type, offset)
Arguments
fit |
Fitted values from pclm.fit |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
y |
Vector with counts to be ungrouped. It must have the same dimension
as |
SE |
Standard Errors |
ci.level |
Level of significance for computing confidence intervals.
Default: |
type |
Type of PCLM model. Options: |
offset |
Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
Compute Confidence Intervals for estimated distribution
Description
Compute Confidence Intervals for estimated distribution
Usage
pclm.confidence.dx(X)
Arguments
X |
List of inputs from pclm.confidence function |
Compute Confidence Intervals for estimated hazard
Description
Compute Confidence Intervals for estimated hazard
Usage
pclm.confidence.mx(X)
Arguments
X |
List of inputs from pclm.confidence function |
Fit PCLM Models
Description
This is an internal function used to estimate PCLM model. It is used by
pclm
and pclm2D
functions
Usage
pclm.fit(
x,
y,
nlast,
offset,
out.step,
verbose,
lambda,
kr,
deg,
diff,
max.iter,
tol,
type
)
Arguments
x |
Vector containing the starting values of the input intervals/bins.
For example: if we have 3 bins |
y |
Vector with counts to be ungrouped. It must have the same dimension
as |
nlast |
Length of the last interval. In the example above |
offset |
Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
verbose |
Logical value. Indicates whether a progress bar should be
shown or not.
Default: |
lambda |
Smoothing parameter to be used in pclm estimation.
If |
kr |
Knot ratio. Number of internal intervals used for defining 1 knot in
B-spline basis construction. See |
deg |
Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
diff |
An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
max.iter |
Maximal number of iterations used in fitting procedure. |
tol |
Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin. |
type |
Type of PCLM model. Options: |
Validate input values
Description
Validate input values
Usage
pclm.input.check(X, pclm.type)
Arguments
X |
A list with input arguments provided in |
Two-Dimensional Penalized Composite Link Model (PCLM-2D)
Description
Fit two-dimensional penalized composite link model (PCLM-2D), e.g. simultaneous ungrouping of age-at-death distributions grouped in age classes for adjacent years. The PCLM can be extended to a two-dimensional regression problem. This is particularly suitable for mortality analysis when mortality surfaces are to be estimated to capture both age-specific trajectories of coarsely grouped distributions and time trends (Rizzi et al. 2019).
Usage
pclm2D(
x,
y,
nlast,
offset = NULL,
out.step = 1,
ci.level = 95,
verbose = TRUE,
control = list()
)
Arguments
x |
Vector containing the starting values of the input intervals/bins.
For example: if we have 3 bins |
y |
|
nlast |
Length of the last interval. In the example above |
offset |
Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
ci.level |
Level of significance for computing confidence intervals.
Default: |
verbose |
Logical value. Indicates whether a progress bar should be
shown or not. Default: |
control |
List with additional parameters:
|
Value
The output is a list with the following components:
input |
A list with arguments provided in input. Saved for convenience. |
fitted |
The fitted values of the PCLM model. |
ci |
Confidence intervals around fitted values. |
goodness.of.fit |
A list containing goodness of fit measures: standard errors, AIC and BIC. |
smoothPar |
Estimated smoothing parameters: |
bins.definition |
Additional values to identify the bins limits and location in input and output objects. |
deep |
A list of objects created in the fitting process. Useful in diagnosis of possible issues. |
call |
An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments. |
References
Rizzi S, Gampe J, Eilers PHC (2015).
“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”
American Journal of Epidemiology, 182(2), 138-147.
doi:10.1093/aje/kwv020.
Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019).
“How to estimate mortality trends from grouped vital statistics.”
International Journal of Epidemiology, 48(2), 571–582.
doi:10.1093/ije/dyy183.
See Also
Examples
# Input data
Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex
# Aggregate data to be ungrouped in the examples below
# Select a 10y data frame
x <- c(0, 1, seq(5, 85, by = 5))
nlast <- 26
n <- c(diff(x), nlast)
group <- rep(x, n)
y <- aggregate(Dx, by = list(group), FUN = "sum")[, 2:10]
offset <- aggregate(Ex, by = list(group), FUN = "sum")[, 2:10]
# Example 1 ----------------------
# Fit model and ungroup data using PCLM-2D
P1 <- pclm2D(x, y, nlast)
summary(P1)
# Plot fitted values
plot(P1)
# Plot input data
plot(P1, "observed")
# NOTE: pclm2D does not search for optimal smoothing parameters by default
# (like pclm does) because it is more time consuming. If optimization is
# required set lambda = c(NA, NA):
P1 <- pclm2D(x, y, nlast, control = list(lambda = c(NA, NA)))
# Example 2 ----------------------
# Ungroup and build a mortality surface
P2 <- pclm2D(x, y, nlast, offset)
summary(P2)
plot(P2, type = "observed")
plot(P2, type = "fitted")
plot(P2, type = "fitted", colors = c("blue", "red"))
Generic Plot for pclm Class
Description
Generic Plot for pclm Class
Usage
## S3 method for class 'pclm'
plot(x, xlab, ylab, ylim, type, lwd, col, legend, legend.position, ...)
Arguments
x |
An object of class |
xlab |
a label for the x axis, defaults to a description of |
ylab |
a label for the y axis, defaults to a description of |
ylim |
the y limits of the plot. |
type |
1-character string giving the type of plot desired. The following values are possible, for details, see plot: "p" for points, "l" for lines, "b" for both points and lines, "c" for empty points joined by lines, "o" for overplotted points and lines, "s" and "S" for stair steps and "h" for histogram-like vertical lines. Finally, "n" does not produce any points or lines. |
lwd |
Line width, a positive number, defaulting to 2. |
col |
Three colours to be used in the plot for observed values, fitted values and confidence intervals. |
legend |
a character or expression vector
of length |
legend.position |
Legend position, or the x and y co-ordinates to be used to position the legend. |
... |
other graphical parameters (see par for more details). |
See Also
Examples
# See complete examples in pclm help page
Generic Plot for pclm2D Class
Description
The generic plot for a pclm2D
object is constructed using
persp
method.
Usage
## S3 method for class 'pclm2D'
plot(
x,
type = c("fitted", "observed"),
colors = c("#b6e3db", "#e5d9c2", "#b5ba61", "#725428"),
nbcol = 25,
xlab = "x",
ylab = "y",
zlab = "values",
phi = 30,
theta = 210,
border = "grey50",
ticktype = "simple",
...
)
Arguments
x |
an object of class |
type |
chart type. Defines which data are plotted, |
colors |
colors to interpolate; must be a valid argument to
|
nbcol |
dimension of the color palette. Number of colors. Default: 25. |
xlab , ylab , zlab |
titles for the axes. N.B. These must be character strings; expressions are not accepted. Numbers will be coerced to character strings. |
theta , phi |
angles defining the viewing direction.
|
border |
the color of the line drawn around the surface facets.
The default, |
ticktype |
character: |
... |
any other argument to be passed to
|
See Also
Examples
# See complete examples in pclm2D help page
Print for pclm method
Description
Print for pclm method
Usage
## S3 method for class 'pclm'
print(x, ...)
Arguments
x |
An object of class |
... |
further arguments passed to or from other methods. |
Print method for pclm2D
Description
Print method for pclm2D
Usage
## S3 method for class 'pclm2D'
print(x, ...)
Arguments
x |
An object of class |
... |
further arguments passed to or from other methods. |
Print for summary.pclm method
Description
Print for summary.pclm method
Usage
## S3 method for class 'summary.pclm'
print(x, ...)
Arguments
x |
An object of class |
... |
further arguments passed to or from other methods. |
Print method for summary.pclm2D
Description
Print method for summary.pclm2D
Usage
## S3 method for class 'summary.pclm2D'
print(x, ...)
Arguments
x |
An object of class |
... |
further arguments passed to or from other methods. |
Print function for ungroup.data
Description
Print function for ungroup.data
Usage
## S3 method for class 'ungroup.data'
print(x, ...)
Arguments
x |
An |
... |
further arguments passed to or from other methods. |
Extract PCLM Deviance Residuals
Description
Extract PCLM Deviance Residuals
Usage
## S3 method for class 'pclm'
residuals(object, ...)
Arguments
object |
an object for which the extraction of model residuals is meaningful. |
... |
other arguments. |
Value
Residuals extracted from the object object
.
Examples
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,
1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
M1 <- pclm(x, y, nlast = 26)
residuals(M1)
Extract PCLM-2D Deviance Residuals
Description
Extract PCLM-2D Deviance Residuals
Usage
## S3 method for class 'pclm2D'
residuals(object, ...)
Arguments
object |
an object for which the extraction of model residuals is meaningful. |
... |
other arguments. |
Value
Residuals extracted from the object object
.
Examples
Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex
# Aggregate data to ungroup it in the example below
x <- c(0, 1, seq(5, 85, by = 5))
nlast <- 26
n <- c(diff(x), nlast)
group <- rep(x, n)
y <- aggregate(Dx, by = list(group), FUN = "sum")[, -1]
# Example
P1 <- pclm2D(x, y, nlast)
residuals(P1)
Sequence function with last value
Description
Sequence function with last value
Usage
seqlast(from, to, by)
Arguments
from , to |
the starting and (maximal) end values of the
sequence. Of length |
by |
number: increment of the sequence. |
Suggest values of out.step
that do not
require an adjustment of nlast
Description
Suggest values of out.step
that do not
require an adjustment of nlast
Usage
suggest.valid.out.step(len, increment = 0.01)
Arguments
len |
Interval length |
increment |
Increment |
Summary for pclm method
Description
Summary for pclm method
Usage
## S3 method for class 'pclm'
summary(object, ...)
Arguments
object |
an object for which a summary is desired. |
... |
additional arguments affecting the summary produced. |
Summary method for pclm2D
Generic function used to produce result summaries of the results produced
by pclm2D
.
Description
Summary method for pclm2D
Generic function used to produce result summaries of the results produced
by pclm2D
.
Usage
## S3 method for class 'pclm2D'
summary(object, ...)
Arguments
object |
an object for which a summary is desired. |
... |
additional arguments affecting the summary produced. |
Test Dataset in the Package
Description
Dataset containing death counts (Dx) and exposures (Ex) by age for a certain population between 1980 and 2014. The data-set is provided for testing purposes only and might be altered and outdated. Download actual demographic data free of charge from Human Mortality Database (2018). Once a username and a password is created on the website the MortalityLaws R package can be used to extract data in R format.
Usage
ungroup.data
Format
An object of class ungroup.data
of length 2.
Source
References
Human Mortality Database (2018).
“University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018.”
https://www.mortality.org.
Pascariu MD (2018).
MortalityLaws: Parametric Mortality Models, Life Tables and HMD.
R package version 1.6.0, https://github.com/mpascariu/MortalityLaws.
See Also
Check if nlast
needs to be adjusted in order to accommodate
out.step
Description
Check if nlast
needs to be adjusted in order to accommodate
out.step
Usage
validate.nlast(x, nlast, out.step)
Arguments
x |
Vector containing the starting values of the input intervals/bins.
For example: if we have 3 bins |
nlast |
Length of the last interval. In the example above |
out.step |
Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |