Help for package DLSSM

Type:

Package

Title:

Dynamic Logistic State Space Prediction Model

Version:

1.1.1

Maintainer:

Jiakun Jiang <jiakunj@bnu.edu.cn>

Description:

Implements the dynamic logistic state space model for binary outcome data proposed by Jiang et al. (2021) <doi:10.1111/biom.13593>. It provides a computationally efficient way to update the prediction whenever new data becomes available. It allows for both time-varying and time-invariant coefficients, and use cubic smoothing splines to model varying coefficients. The smoothing parameters are objectively chosen by maximum likelihood. The model is updated using batch data accumulated at pre-specified time intervals.

License:

GPL-3

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.0

Imports:

Matrix

Depends:

R (≥ 3.10)

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0), withr

VignetteBuilder:

knitr

Config/testthat/edition:

SystemRequirements:

Intel MKL (optional for enhanced performance)

NeedsCompilation:

Packaged:

2025-05-18 15:48:47 UTC; jiakun

Author:

Jiakun Jiang [aut, cre], Wei Yang [aut], Wensheng Guo [aut]

Repository:

CRAN

Date/Publication:

2025-05-22 05:10:16 UTC

Combine data into Batched data

Description

The time domain of observation will first be standardized into [0,1]. Then [0,1] will be divided into S equally spaced intervals as described in Jiang et al.(2021, Biometrics). Then those intervals slice the dataset to S batches of data.

Usage

Batched(formula, data, time, S)

Arguments

formula

An object of class "formula" (or one that can be coerced to that class): a symbolic description of response and covariates in the model.

data

Dataset matrix containing the observations (one rows is a sample).

time

The time variable in the dataset. The varying coefficient functions are assumed to be smooth functions of this variable.

S

Number of batches

Value

`batched`	List of batched data, the element of list is matrix with each row representing a sample

`gap.len`	interval length 1/S

Author(s)

Jiakun Jiang, Wei Yang, Wensheng Guo

Combine model training and validation in a integrated function

Description

This combine model training and validation in a integrated automatic function DLSSM().

Usage

DLSSM(data.batched, S0, vary.effects, autotune = TRUE, Lambda = NULL, K)

Arguments

data.batched

A object generated by function Data.batched()

S0

Number of batches of data to be used as training dataset

vary.effects

The names of variables in the dataset assumed to have a time-varying regression effect on the outcome.

autotune

T/F indicates whether or not the automatic tuning procedure desribed in Jiakun et al. (2021) should be applied. Default is true.

Lambda

Specify smoothing parameters if autotune=F

K

Number of steps for ahead prediction

Value

`Lambda:`	smoothing parameters

`Smooth:`	smoothed state vector

`Smooth.var:`	covariance of smoothed state vector in Smooth.

Author(s)

Jiakun Jiang, Wei Yang and Wensheng Guo

Examples

 
set.seed(321)
n=8000
beta0=function(t)   0.1*t-1
beta1=function(t)  cos(2*pi*t)
beta2=function(t)  sin(2*pi*t)
alph1=alph2=1
x=matrix(runif(n*4,min=-4,max=4),nrow=n,ncol=4)
t=sort(runif(n))
coef=cbind(beta0(t),beta1(t),beta2(t),rep(alph1,n),rep(alph2,n))
covar=cbind(rep(1,n),x)
linear=apply(coef*covar,1,sum)
prob=exp(linear)/(1+exp(linear))
y=as.numeric(runif(n)<prob)
sim.data=cbind(y,x,t)
colnames(sim.data)=c("y","x1","x2","x3","x4","t")
formula = y~x1+x2+x3+x4
# Divide the time domain [0,1] into S=100 equally spaced intervals
S=100
S0=75
data.batched=Batched(formula, data=sim.data, time="t", S)

# Take first S0=75 batches as training data, remaining S-S0=25 batches of data as validation data.
 fit1=DLSSM(data.batched, S0, vary.effects=c("x1","x2"), autotune=TRUE, Lambda=NULL, K=1)
 DLSSM.plot(fit1)
 fit2=DLSSM(data.batched, S0, vary.effects=c("x1","x2"), autotune=TRUE, Lambda=NULL, K=2)
 DLSSM.plot(fit2)

Initial model fitting

Description

This function is for tuning smoothing parameters using training data. The likelihood was calculated by Kalman Filter and maximized to estimate the smoothing parameters. For the given smoothing parameters, the model coefficients can be efficiently estimated using a Kalman filtering algorithm.

Usage

DLSSM.init(data.batched, S0, vary.effects, autotune = TRUE, Lambda = NULL)

Arguments

data.batched

A object generated by function Data.batched()

S0

Number of batches of data to be used as training dataset

vary.effects

The names of variables in the dataset assumed to have a time-varying regression effect on the outcome.

autotune

T/F indicates whether or not the automatic tuning procedure described in Jiang et al. (2021) should be applied. Default is true.

Lambda

Specify smoothing parameters if autotune=F

Value

`Lambda:`	smoothing parameters

`Smooth:`	smoothed state vector

`Smooth.var:`	covariance of smoothed state vector in Smooth.

Author(s)

Jiakun Jiang, Wei Yang and Wensheng Guo

Examples


set.seed(321)
n=8000
beta0=function(t)   0.1*t-1
beta1=function(t)  cos(2*pi*t)
beta2=function(t)  sin(2*pi*t)
alph1=alph2=1
x=matrix(runif(n*4,min=-4,max=4),nrow=n,ncol=4)
t=sort(runif(n))
coef=cbind(beta0(t),beta1(t),beta2(t),rep(alph1,n),rep(alph2,n))
covar=cbind(rep(1,n),x)
linear=apply(coef*covar,1,sum)
prob=exp(linear)/(1+exp(linear))
y=as.numeric(runif(n)<prob)
sim.data=cbind(y,x,t)
colnames(sim.data)=c("y","x1","x2","x3","x4","t")
formula = y~x1+x2+x3+x4
# Divide the time domain [0,1] into S=100 equally spaced intervals
S=100
S0=75
data.batched=Batched(formula, data=sim.data, time="t", S)

# using first 75 batches as training dataset to tune smoothing parameters
fit0=DLSSM.init(data.batched, S0, vary.effects=c("x1","x2"))
fit0$Lambda
DLSSM.plot(fit0)

Plot coefficients

Description

Plot smoothed coefficients in the training part and predicted coefficients in validation part, the two parts are divided by vertical dash line.

Usage

DLSSM.plot(fit)

Arguments

fit

fitted object

Details

If argument "fit" is an initial fitted model then only smoothed coefficients part are plotted.

Value

Figures

Author(s)

Jiakun Jiang, Wei Yang and Wensheng Guo

Dynamical prediction on validation dataset

Description

After we have fitted initial model, we can do validation. It is iteratively doing K-steps ahead prediction and model updating (filtering) when a new batch of data becomes available. The validation include K-steps ahead prediction of state vector and probabilities on validation interval.

Usage

DLSSM.valid(fit0, data.batched, K)

Arguments

fit0

Initial fitted model

data.batched

Batched dataset generated by function Batched()

K

Number of steps for ahead prediction

Details

The argument fit could be object of DLSSM or DLSSM.init.

Value

`pred.K:`	K-steps ahead predicted coefficients

`pred.var.K:`	covariance of K-steps ahead predicted coefficients

`pred.prob.K:`	K-steps ahead predicted probabilities

Author(s)

Jiakun Jiang, Wei Yang and Wensheng Guo

Examples


set.seed(321)
n=8000
beta0=function(t)   0.1*t-1
beta1=function(t)  cos(2*pi*t)
beta2=function(t)  sin(2*pi*t)
alph1=alph2=1
x=matrix(runif(n*4,min=-4,max=4),nrow=n,ncol=4)
t=sort(runif(n))
coef=cbind(beta0(t),beta1(t),beta2(t),rep(alph1,n),rep(alph2,n))
covar=cbind(rep(1,n),x)
linear=apply(coef*covar,1,sum)
prob=exp(linear)/(1+exp(linear))
y=as.numeric(runif(n)<prob)
sim.data=cbind(y,x,t)
colnames(sim.data)=c("y","x1","x2","x3","x4","t")
formula = y~x1+x2+x3+x4
# Divide the time domain [0,1] into S=100 equally spaced intervals
S=100
S0=75
data.batched=Batched(formula, data=sim.data, time="t", S)

# using first 75 batches as training dataset to tune smoothing parameters
fit0=DLSSM.init(data.batched, S0, vary.effects=c("x1","x2"))
fit0$Lambda

#After initial model fitting on training data, we move to dynamic prediction
 fit=DLSSM.valid(fit0, data.batched, K=1)
 DLSSM.plot(fit)

Dataset contains information of full comprehensive Australian automobile insurance policies between years 2004 and 2005 A dataset containing the claim and three attributes of 67,856 policies

Description

Dataset contains information of full comprehensive Australian automobile insurance policies between years 2004 and 2005 A dataset containing the claim and three attributes of 67,856 policies

Usage

car.insur

Format

A data frame with 67856 rows and 4 columns

y: Binary vaiable with 0 denote a policy with no claim, and 1 denote a claim policy.
gender: gender of deriver
age: age of deriver
exposure: period from the date of insured to the investigation, with a maximum of one year

References

De Jong P et al. (2008). “Generalized linear models for insurance data.” Cambridge Books.

Examples

data(car.insur)