Type: Package
Title: Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data
Version: 1.4.4
Description: Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are thus estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, obscures much of the information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of its modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) <doi:10.1093/aje/kwv020>.
License: MIT + file LICENSE
LazyData: TRUE
Depends: R (≥ 3.4.0)
Imports: pbapply (≥ 1.3), Rcpp (≥ 0.12.0), Rdpack (≥ 0.8), Matrix
LinkingTo: Rcpp, RcppEigen
Suggests: MortalityLaws (≥ 1.5.0), knitr (≥ 1.20), rmarkdown (≥ 1.10), testthat (≥ 2.0.0)
RdMacros: Rdpack
URL: https://github.com/mpascariu/ungroup
BugReports: https://github.com/mpascariu/ungroup/issues
VignetteBuilder: knitr
Encoding: UTF-8
RoxygenNote: 7.3.0
NeedsCompilation: yes
Packaged: 2024-01-29 15:00:04 UTC; mpascariu
Author: Marius D. Pascariu [aut, cre], Silvia Rizzi [aut], Jonas Schoeley [aut], Maciej J. Danko [aut]
Maintainer: Marius D. Pascariu <rpascariu@outlook.com>
Repository: CRAN
Date/Publication: 2024-01-31 13:00:02 UTC

ungroup: Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data

Description

Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are thus estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, obscures much of the information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of its modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) doi:10.1093/aje/kwv020.

Details

To learn more about the package, start with the vignettes: browseVignettes(package = "ungroup")

Author(s)

Maintainer: Marius D. Pascariu <rpascariu@outlook.com> (ORCID)

Authors:

  • Silvia Rizzi

  • Jonas Schoeley (ORCID)

  • Maciej J. Danko (ORCID)

References

Currie ID, Durban M, Eilers PH (2004). “Smoothing and forecasting mortality rates.” Statistical Modelling, 4(4), 279–298.

Eilers PH (2007). “Ill-posed problems with counts, the composite link model and penalized likelihood.” Statistical Modelling, 7(3), 239-254. doi:10.1177/1471082X0700700302.

Hastie TJ, Tibshirani RJ (1990). “Generalized additive models.” Monographs on Statistics and Applied Probability, 43.

Human Mortality Database (2018). University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018. https://www.mortality.org.

Pascariu MD (2018). MortalityLaws: Parametric Mortality Models, Life Tables and HMD. R package version 1.6.0, https://github.com/mpascariu/MortalityLaws.

Rizzi S, Gampe J, Eilers PHC (2015). “Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.” American Journal of Epidemiology, 182(2), 138-147. doi:10.1093/aje/kwv020.

Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019). “How to estimate mortality trends from grouped vital statistics.” International Journal of Epidemiology, 48(2), 571–582. doi:10.1093/ije/dyy183.

Rizzi S, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Vaupel JW, Lindahl-Jacobsen R (2016). “Comparison of non-parametric methods for ungrouping coarsely aggregated data.” BMC medical research methodology, 16(1), 59. doi:10.1186/s12874-016-0157-8.

Thompson R, Baker RJ (1981). “Composite link functions in generalized linear models.” Applied Statistics, 30(2), 125–131.

See Also

Useful links:

  • https://github.com/mpascariu/ungroup

  • Report bugs at https://github.com/mpascariu/ungroup/issues

PCLM Akaike Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm'
AIC(object, ..., k = 2)

Arguments

object

a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood, or an object inheriting from class logLik.

...

optionally more fitted model objects.

k

numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC.

Details

When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.

These are generic functions (with S4 generics defined in package stats4): however methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).

BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeed returns BIC as NA.

Value

If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).

If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
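
For instance, AIC and BIC can be used to compare alternative pclm fits of the same grouped counts. The sketch below reuses the mortality data from the pclm examples; the control values are illustrative and the resulting AIC/BIC numbers depend on the data:

x  <- c(0, 1, seq(5, 85, by = 5))
y  <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,
        1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
M1 <- pclm(x, y, nlast = 26)                                  # smoothing parameters optimized
M2 <- pclm(x, y, nlast = 26,
           control = list(lambda = 100, kr = 10, deg = 10))   # fixed smoothing parameters
AIC(M1); AIC(M2)   # smaller values indicate a better fit-complexity trade-off
BIC(M1); BIC(M2)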


PCLM-2D Akaike Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm2D'
AIC(object, ..., k = 2)

Arguments

object

a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood, or an object inheriting from class logLik.

...

optionally more fitted model objects.

k

numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC.

Details

When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.

These are generic functions (with S4 generics defined in package stats4): however methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).

BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeed returns BIC as NA.

Value

If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).

If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.


PCLM Bayes Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm'
BIC(object, ...)

Arguments

object

a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood, or an object inheriting from class logLik.

...

optionally more fitted model objects.

Details

When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.

These are generic functions (with S4 generics defined in package stats4): however methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).

BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeed returns BIC as NA.

Value

If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).

If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.


PCLM-2D Bayes Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm2D'
BIC(object, ...)

Arguments

object

a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood, or an object inheriting from class logLik.

...

optionally more fitted model objects.

Details

When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.

These are generic functions (with S4 generics defined in package stats4): however methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).

BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeed returns BIC as NA.

Value

If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).

If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.


Construct B-spline basis

Description

This is an internal function of package MortalitySmooth which creates an equally-spaced B-spline basis over the abscissa of the data.

Usage

MortSmooth_bbase(x, xl, xr, ndx, deg)

Arguments

x

vector for the abscissa of data;

xl

left boundary;

xr

right boundary;

ndx

number of internal knots minus one or number of internal intervals;

deg

degree of the splines.

Details

The function reproduces an algorithm presented by Eilers and Marx (2010) using differences of truncated power functions (see MortSmooth_tpower). The final matrix has a single B-spline for each of the [ndx + deg] columns. The number of rows is equal to the length of x.

The function differs from bs in the package splines in that it automatically constructs B-splines with identical shape. This allows a simple interpretation of the coefficients and the application of simple differencing.

Value

A matrix with one equally-spaced B-spline of degree deg per column, evaluated along x.

Author(s)

Carlo G Camarda
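
A small sketch illustrating the dimensions described above (MortSmooth_bbase is an internal helper, so it may need to be called as ungroup:::MortSmooth_bbase if it is not exported):

x <- seq(0, 1, length.out = 100)
B <- MortSmooth_bbase(x, xl = 0, xr = 1, ndx = 10, deg = 3)
dim(B)   # 100 rows (the length of x) and ndx + deg = 13 columns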


Truncated p-th Power Function

Description

This is an internal function of package MortalitySmooth which constructs a truncated p-th power function along an abscissa. It is used within the function MortSmooth_bbase.

Usage

MortSmooth_tpower(x, tt, p)

Arguments

x

vector for the abscissa of data;

tt

vector of truncation points;

p

degree of the power.

Details

Internal function used in MortSmooth_bbase. The vector tt contains the knots. The simplest system of truncated power functions uses p = 0 and consists of step functions with jumps of size 1 at the truncation points tt.

Author(s)

Carlo G Camarda
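
A brief sketch of the step-function behaviour described above (MortSmooth_tpower is an internal helper; the exact handling of the boundary point depends on the implementation):

x <- 0:10
MortSmooth_tpower(x, tt = 5, p = 0)   # 0/1 step function with the jump at the truncation point
MortSmooth_tpower(x, tt = 5, p = 1)   # linear ramp (x - tt) beyond the truncation point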


Construct B-spline basis

Description

This is an internal function which constructs the B-spline basis to be used in pclm estimation.

Usage

build_B_spline_basis(X, Y, kr, deg, diff, type)

Arguments

X

vector with ages

Y

vector with years

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.

See Also

MortSmooth_bbase


Build Composition Matrices

Description

Build Composition Matrices

Usage

build_C_matrix(x, y, nlast, offset, out.step, type)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins [0,5), [5,10) and [10, 15), x will be defined by the vector: c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same dimension as x.

nlast

Length of the last interval. In the example above nlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.


Construct Penalty Matrix

Description

Construct Penalty Matrix

Usage

build_P_matrix(BA, BY, lambda, type)

Arguments

BA

B-spline basis object for age axis

BY

B-spline basis object for year axis

lambda

Smoothing parameter to be used in pclm estimation. If lambda = NA an algorithm will find the optimal values.

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.


Compute Standard Errors and Confidence Intervals

Description

Compute Standard Errors and Confidence Intervals

Usage

compute_standard_errors(B, QmQ, QmQP)

Arguments

B

B-spline basis matrix

QmQ

Object generated by pclm.fit

QmQP

Object generated by pclm.fit


Auxiliary for Controlling pclm Fitting

Description

Auxiliary for Controlling pclm Fitting

Usage

control.pclm(lambda     = NA,
             kr         = 2,
             deg        = 3,
             int.lambda = c(0.1, 1e+5),
             diff       = 2,
             opt.method = c("BIC", "AIC"),
             max.iter   = 1e+3,
             tol        = 1e-3)

Arguments

lambda

Smoothing parameter to be used in pclm estimation. If lambda = NA an algorithm will find the optimal values.

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

int.lambda

If lambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

opt.method

Selection criterion of the model. Possible values are "AIC" and "BIC". Default: "BIC".

max.iter

Maximal number of iterations used in fitting procedure.

tol

Relative tolerance in PCLM fitting procedure. Default: 0.1%, i.e. the estimated aggregate bins should be within a 0.1% error margin.

Value

A list with exactly eight control parameters.

See Also

pclm

Examples

control.pclm()
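
# A sketch of overriding only selected control parameters; the values below are
# illustrative, and any parameter not supplied keeps its default listed above.
control.pclm(lambda = 10, opt.method = "AIC")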

Auxiliary for Controlling pclm2D Fitting

Description

Auxiliary for Controlling pclm2D Fitting

Usage

control.pclm2D(lambda     = c(1, 1),
               kr         = 7,
               deg        = 3,
               int.lambda = c(0.1, 1e+3),
               diff       = 2,
               opt.method = c("BIC", "AIC"),
               max.iter   = 1e+3,
               tol        = 1e-3)

Arguments

lambda

Smoothing parameter to be used in pclm estimation. If lambda = NA an algorithm will find the optimal values.

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

int.lambda

If lambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

opt.method

Selection criterion of the model. Possible values are "AIC" and "BIC". Default: "BIC".

max.iter

Maximal number of iterations used in fitting procedure.

tol

Relative tolerance in PCLM fitting procedure. Default: 0.1%, i.e. the estimated aggregate bins should be within a 0.1% error margin.

Value

A list with exactly eight control parameters.

See Also

pclm2D

Examples

control.pclm2D()
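
# A sketch of requesting optimization of both smoothing parameters of the
# two-dimensional model (illustrative; see also the note in the pclm2D examples).
control.pclm2D(lambda = c(NA, NA))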

Create an additional bin with a small value at the end. Improves convergence.

Description

Create an additional bin with a small value at the end. Improves convergence.

Usage

create.artificial.bin(i, vy = 1, vo = 1.01)

Arguments

i

A list of input values corresponding to pclm or pclm2D;

vy

Numerical value of the bin created for y input;

vo

Numerical value of the bin created for the offset input (if applicable).


Delete from results the last group added artificially in pclm and pclm2D

Description

Delete from results the last group added artificially in pclm and pclm2D

Usage

delete.artificial.bin(M)

Arguments

M

A pclm.fit object


Extract Fractional Part of a Number

Description

Extract Fractional Part of a Number

Usage

frac(x)

Arguments

x

A numeric value, vector or matrix


Map groups and borders

Description

We assume no missing values between the bins

Usage

map.bins(x, nlast, out.step)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins [0,5), [5,10) and [10, 15), x will be defined by the vector: c(0, 5, 10).

nlast

Length of the last interval. In the example above nlast would be 5.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.


Objective function

Description

Objective function

Usage

ofun(L, I, type)

Arguments

L

lambda.hat

I

Input object from pclm function

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.


Optimize Smoothing Parameters

Description

This function optimizes the search for lambda, kr and deg. See control.pclm for their meaning. The optimization process works in steps; simultaneous optimization was tested and found inefficient.

Usage

optimize_par(I, type)

Arguments

I

Input object from pclm function

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.


Univariate Penalized Composite Link Model (PCLM)

Description

Fit a univariate penalized composite link model (PCLM) to ungroup binned count data, e.g. age-at-death distributions grouped in age classes.

Usage

pclm(
  x,
  y,
  nlast,
  offset = NULL,
  out.step = 1,
  ci.level = 95,
  verbose = FALSE,
  control = list()
)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins [0,5), [5,10) and [10, 15), x will be defined by the vector: c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same dimension as x.

nlast

Length of the last interval. In the example above nlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

ci.level

Confidence level (in %) used for computing confidence intervals. Default: 95.

verbose

Logical value. Indicates whether a progress bar should be shown or not. Default: FALSE.

control

List with additional parameters:

  • lambda – Smoothing parameter to be used in pclm estimation. If lambda = NA an algorithm will find the optimal values.

  • kr – Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

  • deg – Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

  • int.lambda – If lambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

  • diff – An integer indicating the order of differences of the components of PCLM coefficients.

  • opt.method – Selection criterion of the model. Possible values are "AIC" and "BIC".

  • max.iter – Maximal number of iterations used in fitting procedure.

  • tol – Relative tolerance in PCLM fitting procedure.

Details

The PCLM method is based on the composite link model, which extends standard generalized linear models. It implements the idea that the observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the distribution of expected means on a fine resolution and has to be estimated from the aggregated data. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing Bayesian or Akaike's Information Criterion.
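
As a toy illustration of the composite link structure (a hand-rolled sketch, not the package's internal code): a composition matrix C maps the latent fine-grid expected counts gamma to the expected bin counts mu = C gamma, and the roughness penalty acts on differences of the latent coefficients.

# Illustrative only: two 5-year bins composed from ten latent single-year values
gamma <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 14)   # latent fine-grid expected counts
C <- rbind(c(rep(1, 5), rep(0, 5)),             # bin 1 aggregates the first five cells
           c(rep(0, 5), rep(1, 5)))             # bin 2 aggregates the last five cells
mu <- C %*% gamma                               # expected bin counts: 35 and 60
D <- diff(diag(10), differences = 2)            # second-order difference matrix (diff = 2)
penalty <- sum((D %*% log(gamma))^2)            # the smoothing parameter lambda scales this term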

Value

The output is a list with the following components:

input

A list with arguments provided in input. Saved for convenience.

fitted

The fitted values of the PCLM model.

ci

Confidence intervals around fitted values.

goodness.of.fit

A list containing goodness of fit measures: standard errors, AIC and BIC.

smoothPar

Estimated smoothing parameters: lambda, kr and deg.

bins.definition

Additional values that identify the bin limits and locations in the input and output objects.

deep

A list of objects created in the fitting process. Useful in diagnosis of possible issues.

call

An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments.

References

Rizzi S, Gampe J, Eilers PHC (2015). “Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.” American Journal of Epidemiology, 182(2), 138-147. doi:10.1093/aje/kwv020.

See Also

control.pclm plot.pclm

Examples

# Data  
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 
       1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
offset <- c(114, 440, 509, 492, 628, 618, 576, 580, 634, 657, 
            631, 584, 573, 619, 530, 384, 303, 245, 249) * 1000
nlast <- 26 # the size of the last interval

# Example 1 ----------------------
M1 <- pclm(x, y, nlast)
ls(M1)
summary(M1)
fitted(M1)
plot(M1)

# Example 2 ----------------------
# ungroup into even smaller intervals
M2 <- pclm(x, y, nlast, out.step = 0.5)
head(fitted(M2))
plot(M2, type = "s")
# Note, in example 1 we are estimating intervals of length 1. In example 2 
# we are estimating intervals of length 0.5 using the same aggregate data.

# Example 3 ----------------------
# Do not optimise smoothing parameters; choose your own. Faster.
M3 <- pclm(x, y, nlast, out.step = 0.5, 
           control = list(lambda = 100, kr = 10, deg = 10))
plot(M3)

summary(M2)
summary(M3) # not the smallest BIC here, but sometimes that is not important.

# Example 4 -----------------------
# Grouped x & grouped offset (estimate death rates)
M4 <- pclm(x, y, nlast, offset)
plot(M4, type = "s")

# Example 5 -----------------------
# Grouped x & ungrouped offset (estimate death rates)

ungrouped_Ex <- pclm(x, y = offset, nlast, offset = NULL)$fitted # ungrouped offset data

M5 <- pclm(x, y, nlast, offset = ungrouped_Ex)

Compute Confidence Intervals for PCLM output

Description

Compute Confidence Intervals for PCLM output

Usage

pclm.confidence(fit, out.step, y, SE, ci.level, type, offset)

Arguments

fit

Fitted values from pclm.fit

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

y

Vector with counts to be ungrouped. It must have the same dimension as x.

SE

Standard Errors

ci.level

Confidence level (in %) used for computing confidence intervals. Default: 95.

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.


Compute Confidence Intervals for estimated distribution

Description

Compute Confidence Intervals for estimated distribution

Usage

pclm.confidence.dx(X)

Arguments

X

List of inputs from pclm.confidence function


Compute Confidence Intervals for estimated hazard

Description

Compute Confidence Intervals for estimated hazard

Usage

pclm.confidence.mx(X)

Arguments

X

List of inputs from pclm.confidence function


Fit PCLM Models

Description

This is an internal function used to estimate the PCLM model. It is used by the pclm and pclm2D functions.

Usage

pclm.fit(
  x,
  y,
  nlast,
  offset,
  out.step,
  verbose,
  lambda,
  kr,
  deg,
  diff,
  max.iter,
  tol,
  type
)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins [0,5), [5,10) and [10, 15), x will be defined by the vector: c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same dimension as x.

nlast

Length of the last interval. In the example above nlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

verbose

Logical value. Indicates whether a progress bar should be shown or not. Default: FALSE.

lambda

Smoothing parameter to be used in pclm estimation. If lambda = NA an algorithm will find the optimal values.

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

max.iter

Maximal number of iterations used in fitting procedure.

tol

Relative tolerance in PCLM fitting procedure. Default: 0.1%, i.e. the estimated aggregate bins should be within a 0.1% error margin.

type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.


Validate input values

Description

Validate input values

Usage

pclm.input.check(X, pclm.type)

Arguments

X

A list with the input arguments provided to the pclm function

pclm.type

Type of PCLM model. Options: "1D", "2D" for univariate and two-dimensional model respectively.


Two-Dimensional Penalized Composite Link Model (PCLM-2D)

Description

Fit a two-dimensional penalized composite link model (PCLM-2D), e.g. for the simultaneous ungrouping of age-at-death distributions grouped in age classes for adjacent years. The PCLM can be extended to a two-dimensional regression problem. This is particularly suitable for mortality analysis when mortality surfaces are to be estimated to capture both age-specific trajectories of coarsely grouped distributions and time trends (Rizzi et al. 2019).

Usage

pclm2D(
  x,
  y,
  nlast,
  offset = NULL,
  out.step = 1,
  ci.level = 95,
  verbose = TRUE,
  control = list()
)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins [0,5), [5,10) and [10, 15), x will be defined by the vector: c(0, 5, 10).

y

data.frame with counts to be ungrouped. The number of rows should be equal to the length of x.

nlast

Length of the last interval. In the example above nlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

ci.level

Confidence level (in %) used for computing confidence intervals. Default: 95.

verbose

Logical value. Indicates whether a progress bar should be shown or not. Default: TRUE.

control

List with additional parameters:

  • lambda – Smoothing parameter to be used in pclm estimation. If lambda = NA an algorithm will find the optimal values.

  • kr – Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See MortSmooth_bbase.

  • deg – Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

  • int.lambda – If lambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

  • diff – An integer indicating the order of differences of the components of PCLM coefficients.

  • opt.method – Selection criterion of the model. Possible values are "AIC" and "BIC".

  • max.iter – Maximal number of iterations used in fitting procedure.

  • tol – Relative tolerance in PCLM fitting procedure.

Value

The output is a list with the following components:

input

A list with arguments provided in input. Saved for convenience.

fitted

The fitted values of the PCLM model.

ci

Confidence intervals around fitted values.

goodness.of.fit

A list containing goodness of fit measures: standard errors, AIC and BIC.

smoothPar

Estimated smoothing parameters: lambda, kr and deg.

bins.definition

Additional values that identify the bin limits and locations in the input and output objects.

deep

A list of objects created in the fitting process. Useful in diagnosis of possible issues.

call

An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments.

References

Rizzi S, Gampe J, Eilers PHC (2015). “Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.” American Journal of Epidemiology, 182(2), 138-147. doi:10.1093/aje/kwv020.

Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019). “How to estimate mortality trends from grouped vital statistics.” International Journal of Epidemiology, 48(2), 571–582. doi:10.1093/ije/dyy183.

See Also

control.pclm2D plot.pclm2D

Examples

# Input data
Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex

# Aggregate data to be ungrouped in the examples below
# Select a 10y data frame
x      <- c(0, 1, seq(5, 85, by = 5))
nlast  <- 26
n      <- c(diff(x), nlast)
group  <- rep(x, n)
y      <- aggregate(Dx, by = list(group), FUN = "sum")[, 2:10]
offset <- aggregate(Ex, by = list(group), FUN = "sum")[, 2:10]

# Example 1 ---------------------- 
# Fit model and ungroup data using PCLM-2D
P1 <- pclm2D(x, y, nlast)
summary(P1)

# Plot fitted values
plot(P1)

# Plot input data
plot(P1, "observed")

# NOTE: pclm2D does not search for optimal smoothing parameters by default
# (like pclm does) because it is more time consuming. If optimization is 
# required set lambda = c(NA, NA):

P1 <- pclm2D(x, y, nlast, control = list(lambda = c(NA, NA)))

# Example 2 ---------------------- 
# Ungroup and build a mortality surface
P2 <- pclm2D(x, y, nlast, offset)
summary(P2)

plot(P2, type = "observed")
plot(P2, type = "fitted")
plot(P2, type = "fitted", colors = c("blue", "red"))

Generic Plot for pclm Class

Description

Generic Plot for pclm Class

Usage

## S3 method for class 'pclm'
plot(x, xlab, ylab, ylim, type, lwd, col, legend, legend.position, ...)

Arguments

x

An object of class pclm

xlab

a label for the x axis, defaults to a description of x.

ylab

a label for the y axis, defaults to a description of y.

ylim

the y limits of the plot.

type

1-character string giving the type of plot desired. The following values are possible, for details, see plot: "p" for points, "l" for lines, "b" for both points and lines, "c" for empty points joined by lines, "o" for overplotted points and lines, "s" and "S" for stair steps and "h" for histogram-like vertical lines. Finally, "n" does not produce any points or lines.

lwd

Line width, a positive number, defaulting to 2.

col

Three colours to be used in the plot for observed values, fitted values and confidence intervals.

legend

a character or expression vector of length ≥ 1 to appear in the legend. Other objects will be coerced by as.graphicsAnnot.

legend.position

Legend position, or the x and y co-ordinates to be used to position the legend.

...

other graphical parameters (see par for more details).

See Also

pclm

Examples

# See complete examples in pclm help page
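
# A minimal sketch reusing the data from the pclm examples; the colours and the
# legend position below are illustrative choices, not package defaults.
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,
       1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
M1 <- pclm(x, y, nlast = 26)
plot(M1,
     type = "s", lwd = 1,
     col  = c("black", "red", "pink"),  # observed values, fitted values, confidence intervals
     legend.position = "topleft")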

Generic Plot for pclm2D Class

Description

The generic plot for a pclm2D object is constructed using the persp method.

Usage

## S3 method for class 'pclm2D'
plot(
  x,
  type = c("fitted", "observed"),
  colors = c("#b6e3db", "#e5d9c2", "#b5ba61", "#725428"),
  nbcol = 25,
  xlab = "x",
  ylab = "y",
  zlab = "values",
  phi = 30,
  theta = 210,
  border = "grey50",
  ticktype = "simple",
  ...
)

Arguments

x

an object of class pclm2D.

type

chart type. Defines which data are plotted, "fitted" values or "observed" input data. Default: "fitted".

colors

colors to interpolate; must be a valid argument to col2rgb().

nbcol

dimension of the color palette. Number of colors. Default: 25.

xlab, ylab, zlab

titles for the axes. N.B. These must be character strings; expressions are not accepted. Numbers will be coerced to character strings.

theta, phi

angles defining the viewing direction. theta gives the azimuthal direction and phi the colatitude.

border

the color of the line drawn around the surface facets. The default, NULL, corresponds to par("fg"). A value of NA will disable the drawing of borders: this is sometimes useful when the surface is shaded.

ticktype

character: "simple" draws just an arrow parallel to the axis to indicate direction of increase; "detailed" draws normal ticks as per 2D plots.

...

any other argument to be passed to persp.

See Also

pclm2D

Examples

# See complete examples in pclm2D help page

Print for pclm method

Description

Print for pclm method

Usage

## S3 method for class 'pclm'
print(x, ...)

Arguments

x

An object of class "pclm"

...

further arguments passed to or from other methods.


Print method for pclm2D

Description

Print method for pclm2D

Usage

## S3 method for class 'pclm2D'
print(x, ...)

Arguments

x

An object of class "pclm2D"

...

further arguments passed to or from other methods.


Print for summary.pclm method

Description

Print for summary.pclm method

Usage

## S3 method for class 'summary.pclm'
print(x, ...)

Arguments

x

An object of class "summary.pclm"

...

further arguments passed to or from other methods.


Print method for summary.pclm2D

Description

Print method for summary.pclm2D

Usage

## S3 method for class 'summary.pclm2D'
print(x, ...)

Arguments

x

An object of class "summary.pclm2D"

...

further arguments passed to or from other methods.


Print function for ungroup.data

Description

Print function for ungroup.data

Usage

## S3 method for class 'ungroup.data'
print(x, ...)

Arguments

x

An ungroup.data object

...

further arguments passed to or from other methods.


Extract PCLM Deviance Residuals

Description

Extract PCLM Deviance Residuals

Usage

## S3 method for class 'pclm'
residuals(object, ...)

Arguments

object

an object for which the extraction of model residuals is meaningful.

...

other arguments.

Value

Residuals extracted from the object object.

Examples

x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 
       1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
M1 <- pclm(x, y, nlast = 26)

residuals(M1)

Extract PCLM-2D Deviance Residuals

Description

Extract PCLM-2D Deviance Residuals

Usage

## S3 method for class 'pclm2D'
residuals(object, ...)

Arguments

object

an object for which the extraction of model residuals is meaningful.

...

other arguments.

Value

Residuals extracted from the object object.

Examples


Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex

# Aggregate data to ungroup it in the example below
x      <- c(0, 1, seq(5, 85, by = 5))
nlast  <- 26
n      <- c(diff(x), nlast)
group  <- rep(x, n)
y      <- aggregate(Dx, by = list(group), FUN = "sum")[, -1]

# Example
P1 <- pclm2D(x, y, nlast)

residuals(P1)

Sequence function with last value

Description

Sequence function with last value

Usage

seqlast(from, to, by)

Arguments

from, to

the starting and (maximal) end values of the sequence. Of length 1 unless just from is supplied as an unnamed argument.

by

number: increment of the sequence.


Suggest values of out.step that do not require an adjustment of nlast

Description

Suggest values of out.step that do not require an adjustment of nlast

Usage

suggest.valid.out.step(len, increment = 0.01)

Arguments

len

Interval length

increment

Increment


Summary for pclm method

Description

Summary for pclm method

Usage

## S3 method for class 'pclm'
summary(object, ...)

Arguments

object

an object for which a summary is desired.

...

additional arguments affecting the summary produced.


Summary method for pclm2D

Description

Generic function used to produce summaries of the results produced by pclm2D.

Usage

## S3 method for class 'pclm2D'
summary(object, ...)

Arguments

object

an object for which a summary is desired.

...

additional arguments affecting the summary produced.


Test Dataset in the Package

Description

Dataset containing death counts (Dx) and exposures (Ex) by age for a certain population between 1980 and 2014. The dataset is provided for testing purposes only and may be altered or outdated. Actual demographic data can be downloaded free of charge from the Human Mortality Database (2018). Once a username and a password have been created on the website, the MortalityLaws R package can be used to extract the data in R format.

Usage

ungroup.data

Format

An object of class ungroup.data of length 2.

Source

Human Mortality Database

References

Human Mortality Database (2018). University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018. https://www.mortality.org.

Pascariu MD (2018). MortalityLaws: Parametric Mortality Models, Life Tables and HMD. R package version 1.6.0, https://github.com/mpascariu/MortalityLaws.

See Also

ReadHMD
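
A quick way to inspect the dataset (a sketch; the component names follow from the description above):

names(ungroup.data)      # "Dx" "Ex"
dim(ungroup.data$Dx)     # age rows by calendar-year columns (1980-2014)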


Check if nlast needs to be adjusted in order to accommodate out.step

Description

Check if nlast needs to be adjusted in order to accommodate out.step

Usage

validate.nlast(x, nlast, out.step)

Arguments

x

Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins [0,5), [5,10) and [10, 15), x will be defined by the vector: c(0, 5, 10).

nlast

Length of the last interval. In the example above nlast would be 5.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.