Type: Package
Title: Accurate Generalized Linear Model
Version: 0.4.1
Description: Provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and predict for new data. AGLM is defined as a regularized GLM that applies a set of feature transformations, combining discretization of numerical features with specific coding methodologies for dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020) https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1.
URL: https://github.com/kkondo1981/aglm
BugReports: https://github.com/kkondo1981/aglm/issues
License: GPL-2
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.3.2
Depends: R (≥ 4.0.0)
Imports: glmnet (≥ 4.0.2), assertthat, methods, mathjaxr
Suggests: testthat, knitr, rmarkdown, MASS, faraway
RdMacros: mathjaxr
NeedsCompilation: no
Packaged: 2025-05-11 15:57:15 UTC; kkondo
Author: Kenji Kondo [aut, cre, cph], Kazuhisa Takahashi [ctb], Hikari Banno [ctb]
Maintainer: Kenji Kondo <kkondo.odnokk@gmail.com>
Repository: CRAN
Date/Publication: 2025-05-12 07:30:02 UTC
aglm: Accurate Generalized Linear Model
Description
Provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and predict for new data. AGLM is defined as a regularized GLM that applies a set of feature transformations, combining discretization of numerical features with specific coding methodologies for dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
Details
The collection of functions provided by the aglm package has almost the same structure as the well-known glmnet package, so users familiar with glmnet will find it easy to handle. In fact, this structure is natural from an implementation point of view, because what the aglm package does is apply appropriate transformations to the given data and pass the result to the glmnet package as a backend.
Fitting functions
The aglm package provides three different fitting functions, depending on how users want to handle hyper-parameters of AGLM models.
Because AGLM is based on regularized GLM, the regularization term of the loss function can be expressed as follows:
\[
R(\lbrace \beta_{jk} \rbrace; \lambda, \alpha)
= \lambda \left\lbrace
(1 - \alpha)\sum_{j=1}^{p} \sum_{k=1}^{m_j}|\beta_{jk}|^2 + \alpha \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}|
\right\rbrace,
\]
where \beta_{jk} is the k-th coefficient of the auxiliary variables for the j-th column in the data, \alpha is a weight which controls how the L1 and L2 regularization terms are mixed, and \lambda determines the strength of the regularization.
Searching over the hyper-parameters \alpha and \lambda is often useful for getting better results, but is usually time-consuming. That is why the aglm package provides three fitting functions with different strategies for specifying the hyper-parameters:
- aglm: A basic fitting function with given \alpha and \lambda (s).
- cv.aglm: A fitting function with given \alpha and cross-validation for \lambda.
- cva.aglm: A fitting function with cross-validation for both \alpha and \lambda.
Generally speaking, setting an appropriate \lambda is often important for getting meaningful results, and using cv.aglm() with the default \alpha=1 (LASSO) is usually enough. Since cva.aglm() is much more time-consuming than cv.aglm(), it is better to use it only when particularly better results are needed.
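As a rough sketch of the three strategies (x is assumed to be a data.frame of explanatory variables and y a response variable; both are hypothetical):
library(aglm)
model <- aglm(x, y)                            # fixed alpha (1 by default) and the default lambda path
model.cv <- cv.aglm(x, y)                      # cross-validation for lambda, alpha fixed at the default 1
lambda.min <- model.cv@lambda.min              # lambda selected by cross-validation
model.cva <- cva.aglm(x, y)                    # cross-validation for both alpha and lambda (slower)
c(model.cva@alpha.min, model.cva@lambda.min)   # the selected hyper-parameters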
The following S4 classes are defined to store the results of the fitting functions.
- AccurateGLM-class: A class for results of aglm() and cv.aglm().
- CVA_AccurateGLM-class: A class for results of cva.aglm().
Using the fitted model
Users can use the models obtained from the fitting functions in various ways, by passing them to the following functions:
- predict: Make predictions for new data.
- plot: Plot the contribution of each variable and residuals.
- print: Display textual information about the model.
- coef: Get coefficients.
- deviance: Get deviance.
- residuals: Get residuals of various types.
We emphasize that plot() is particularly useful to understand the fitted model, because it presents a visual representation of how variables in the original data are used by the model.
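For example, with a fitted model from cv.aglm() (a minimal sketch; x, y, and newx are hypothetical data objects):
model <- cv.aglm(x, y)
s <- model@lambda.min                            # lambda selected by cross-validation
y_pred <- predict(model, newx=newx, s=s)         # predictions for new data
plot(model, s=s, verbose=FALSE)                  # contribution of each variable
print(model)                                     # textual information of the model
coefs <- coef(model, s=s)                        # entire coefficients
dev <- deviance(model)                           # deviance
res <- residuals(model, type="deviance", s=s)    # residuals of a chosen type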
Other functions
The following functions are basically for internal use, but are exported as utility functions for convenience.
- Functions for creating feature vectors: getUDummyMatForOneVec, getODummyMatForOneVec, getLVarMatForOneVec
- Functions for binning: createEqualWidthBins, createEqualFreqBins, executeBinning
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques.
Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
See Also
Useful links:
- https://github.com/kkondo1981/aglm
- Report bugs at https://github.com/kkondo1981/aglm/issues
S4 class for input
Description
S4 class for input
Slots
vars_info
A list, each element of which holds information about one variable.
data
The original data.
Class for results of aglm() and cv.aglm()
Description
Class for results of aglm() and cv.aglm()
Slots
backend_models
The fitted backend glmnet model is stored.
vars_info
A list, each element of which holds information about one variable.
lambda
Same as in the result of cv.glmnet.
cvm
Same as in the result of cv.glmnet.
cvsd
Same as in the result of cv.glmnet.
cvup
Same as in the result of cv.glmnet.
cvlo
Same as in the result of cv.glmnet.
nzero
Same as in the result of cv.glmnet.
name
Same as in the result of cv.glmnet.
lambda.min
Same as in the result of cv.glmnet.
lambda.1se
Same as in the result of cv.glmnet.
fit.preval
Same as in the result of cv.glmnet.
foldid
Same as in the result of cv.glmnet.
call
An object of class call, corresponding to the function call when this AccurateGLM object is created.
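Slots are accessed with the @ operator. A minimal sketch, assuming model is a result of cv.aglm() on hypothetical training data x and y:
model <- cv.aglm(x, y)
model@lambda.min   # lambda giving the minimum cross-validation error
model@lambda.1se   # largest lambda within one standard error of the minimum
model@call         # the call that created this object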
Author(s)
Kenji Kondo
Class for results of cva.aglm()
Description
Class for results of cva.aglm()
Slots
models_list
A list consisting of the results of cv.aglm() for all \alpha values.
alpha
Same as in cv.aglm.
nfolds
Same as in cv.aglm.
alpha.min.index
The index of alpha.min in the vector alpha.
alpha.min
The \alpha value achieving the minimum loss among all the values of alpha.
lambda.min
The \lambda value achieving the minimum loss when \alpha is equal to alpha.min.
call
An object of class call, corresponding to the function call when this CVA_AccurateGLM object is created.
Author(s)
Kenji Kondo
Fit an AGLM model with no cross-validation
Description
A basic fitting function with given \alpha and \lambda (s). See aglm-package for more details on \alpha and \lambda.
Usage
aglm(
x,
y,
qualitative_vars_UD_only = NULL,
qualitative_vars_both = NULL,
qualitative_vars_OD_only = NULL,
quantitative_vars = NULL,
use_LVar = FALSE,
extrapolation = "default",
add_linear_columns = TRUE,
add_OD_columns_of_qualitatives = TRUE,
add_interaction_columns = FALSE,
OD_type_of_quantitatives = "C",
nbin.max = NULL,
bins_list = NULL,
bins_names = NULL,
family = c("gaussian", "binomial", "poisson"),
...
)
Arguments
x
A design matrix. Usually a data.frame object is expected. aglm automatically determines how to handle each column based on its class, and the resulting dummy variables are added to the design matrix used internally. If you need to change the default behavior, use the following options: qualitative_vars_UD_only, qualitative_vars_both, qualitative_vars_OD_only, and quantitative_vars.
y
A response variable.
qualitative_vars_UD_only
Used to change the default behavior of aglm: the specified variables are treated as qualitative variables and only U-dummies are created for them.
qualitative_vars_both
Same as qualitative_vars_UD_only, except that both O-dummies and U-dummies are created for the specified variables.
qualitative_vars_OD_only
Same as qualitative_vars_UD_only, except that only O-dummies are created for the specified variables.
quantitative_vars
Same as qualitative_vars_UD_only, except that the specified variables are treated as quantitative variables.
use_LVar
Set to TRUE to use L-variables instead of O-dummies for quantitative variables. By default, FALSE (O-dummies are used).
extrapolation
Used to control the values of the linear combination for quantitative variables outside the region where the data exists. By default, values of the linear combination outside the data are extended based on the slope at the edges of the region where the data exists. You can set extrapolation="flat" to use flat extrapolation instead.
add_linear_columns
By default, for quantitative variables, aglm adds linear columns in addition to the dummy variables. Set to FALSE to exclude the linear columns.
add_OD_columns_of_qualitatives
Set to FALSE to exclude O-dummies for ordered qualitative variables (the default is TRUE).
add_interaction_columns
If this parameter is set to TRUE, aglm also creates interaction columns (the default is FALSE).
OD_type_of_quantitatives
Used to control the shape of linear combinations obtained by O-dummies for quantitative variables (deprecated).
nbin.max
An integer representing the maximum number of bins used when discretizing quantitative variables.
bins_list
Used to set custom bins for variables with O-dummies.
bins_names
Used to set custom bins for variables with O-dummies.
family
A character string representing the family of the error distribution: "gaussian", "binomial", or "poisson".
...
Other arguments are passed directly when calling glmnet().
Value
A model object fitted to the data. Functions such as predict and plot can be applied to the returned object. See AccurateGLM-class for more details.
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques.
Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
Examples
#################### Gaussian case ####################
library(MASS) # For Boston
library(aglm)
## Read data
xy <- Boston # xy is a data.frame to be processed.
colnames(xy)[ncol(xy)] <- "y" # Let medv be the objective variable, y.
## Split data into train and test
n <- nrow(xy) # Sample size.
set.seed(2018) # For reproducibility.
test.id <- sample(n, round(n/4)) # ID numbers for test data.
test <- xy[test.id,] # test is the data.frame for testing.
train <- xy[-test.id,] # train is the data.frame for training.
x <- train[-ncol(xy)]
y <- train$y
newx <- test[-ncol(xy)]
y_true <- test$y
## Fit the model
model <- aglm(x, y) # alpha=1 (the default value)
## Predict for various alpha and lambda
lambda <- 0.1
y_pred <- predict(model, newx=newx, s=lambda)
rmse <- sqrt(mean((y_true - y_pred)^2))
cat(sprintf("RMSE for lambda=%.2f: %.5f \n\n", lambda, rmse))
lambda <- 1.0
y_pred <- predict(model, newx=newx, s=lambda)
rmse <- sqrt(mean((y_true - y_pred)^2))
cat(sprintf("RMSE for lambda=%.2f: %.5f \n\n", lambda, rmse))
alpha <- 0
model <- aglm(x, y, alpha=alpha)
lambda <- 0.1
y_pred <- predict(model, newx=newx, s=lambda)
rmse <- sqrt(mean((y_true - y_pred)^2))
cat(sprintf("RMSE for alpha=%.2f and lambda=%.2f: %.5f \n\n", alpha, lambda, rmse))
#################### Binomial case ####################
library(aglm)
library(faraway)
## Read data
xy <- nes96
## Split data into train and test
n <- nrow(xy) # Sample size.
set.seed(2018) # For reproducibility.
test.id <- sample(n, round(n/5)) # ID numbers for test data.
test <- xy[test.id,] # test is the data.frame for testing.
train <- xy[-test.id,] # train is the data.frame for training.
x <- train[, c("popul", "TVnews", "selfLR", "ClinLR", "DoleLR", "PID", "age", "educ", "income")]
y <- train$vote
newx <- test[, c("popul", "TVnews", "selfLR", "ClinLR", "DoleLR", "PID", "age", "educ", "income")]
## Fit the model
model <- aglm(x, y, family="binomial")
## Make the confusion matrix
lambda <- 0.1
y_true <- test$vote
y_pred <- levels(y_true)[as.integer(predict(model, newx, s=lambda, type="class"))]
print(table(y_true, y_pred))
#################### use_LVar and extrapolation ####################
library(MASS) # For Boston
library(aglm)
## Randomly created train and test data
set.seed(2021)
sd <- 0.2
x <- 2 * runif(1000) + 1
f <- function(x){x^3 - 6 * x^2 + 13 * x}
y <- f(x) + rnorm(1000, sd = sd)
xy <- data.frame(x=x, y=y)
x_test <- seq(0.75, 3.25, length.out=101)
y_test <- f(x_test) + rnorm(101, sd=sd)
xy_test <- data.frame(x=x_test, y=y_test)
## Plot
nbin.max <- 10
models <- c(cv.aglm(x, y, use_LVar=FALSE, extrapolation="default", nbin.max=nbin.max),
cv.aglm(x, y, use_LVar=FALSE, extrapolation="flat", nbin.max=nbin.max),
cv.aglm(x, y, use_LVar=TRUE, extrapolation="default", nbin.max=nbin.max),
cv.aglm(x, y, use_LVar=TRUE, extrapolation="flat", nbin.max=nbin.max))
titles <- c("O-Dummies with extrapolation=\"default\"",
"O-Dummies with extrapolation=\"flat\"",
"L-Variables with extrapolation=\"default\"",
"L-Variables with extrapolation=\"flat\"")
par.old <- par(mfrow=c(2, 2))
for (i in 1:4) {
model <- models[[i]]
title <- titles[[i]]
pred <- predict(model, newx=x_test, s=model@lambda.min, type="response")
plot(x_test, y_test, pch=20, col="grey", main=title)
lines(x_test, f(x_test), lty="dashed", lwd=2) # the theoretical line
lines(x_test, pred, col="blue", lwd=3) # the smoothed line by the model
}
par(par.old)
Get coefficients
Description
Get coefficients
Usage
## S3 method for class 'AccurateGLM'
coef(object, index = NULL, name = NULL, s = NULL, exact = FALSE, ...)
Arguments
object
A model object obtained from aglm() or cv.aglm().
index
An integer value representing the index of the variable whose coefficients are required.
name
A string representing the name of the variable whose coefficients are required. Note that index and name should not both be given at the same time.
s
Same as in coef.glmnet.
exact
Same as in coef.glmnet.
...
Other arguments are passed directly to coef.glmnet.
Value
If index or name is given, the function returns a list with one or a combination of the following fields, consisting of the coefficients related to the specified variable.
- coef.linear: A coefficient of the linear term (if any).
- coef.OD: Coefficients of O-dummies (if any).
- coef.UD: Coefficients of U-dummies (if any).
- coef.LV: Coefficients of L-variables (if any).
If neither index nor name is given, the function returns the entire coefficients corresponding to the internal design matrix.
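A brief sketch (model is assumed to be a result of aglm() or cv.aglm() fitted on data containing a column named "age"; the variable name is purely illustrative):
coef_all <- coef(model, s=0.1)               # entire coefficients at lambda=0.1
coef_age <- coef(model, name="age", s=0.1)   # coefficients related to one variable
coef_age$coef.linear                         # coefficient of the linear term, if any
coef_age$coef.OD                             # coefficients of O-dummies, if any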
Author(s)
Kenji Kondo
Create bins (equal frequency binning)
Description
Create bins (equal frequency binning)
Usage
createEqualFreqBins(x_vec, nbin.max)
Arguments
x_vec
A numeric vector, whose quantiles are used as breaks.
nbin.max
The maximum number of bins.
Value
A numeric vector representing breaks obtained by binning.
Note that the number of bins is equal to min(nbin.max, length(x_vec)).
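A small usage sketch with randomly generated data:
x_vec <- rnorm(1000)
breaks <- createEqualFreqBins(x_vec, nbin.max=10)  # breaks taken from quantiles of x_vec
breaks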
Author(s)
Kenji Kondo
Create bins (equal width binning)
Description
Create bins (equal width binning)
Usage
createEqualWidthBins(left, right, nbin)
Arguments
left
The leftmost value of the interval to be binned.
right
The rightmost value of the interval to be binned.
nbin
The number of bins.
Value
A numeric vector representing breaks obtained by binning.
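A small usage sketch:
breaks <- createEqualWidthBins(left=0, right=10, nbin=5)  # breaks of 5 equal-width bins on [0, 10]
breaks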
Author(s)
Kenji Kondo
Fit an AGLM model with cross-validation for \lambda
Description
A fitting function with given \alpha and cross-validation for \lambda. See aglm-package for more details on \alpha and \lambda.
Usage
cv.aglm(
x,
y,
qualitative_vars_UD_only = NULL,
qualitative_vars_both = NULL,
qualitative_vars_OD_only = NULL,
quantitative_vars = NULL,
use_LVar = FALSE,
extrapolation = "default",
add_linear_columns = TRUE,
add_OD_columns_of_qualitatives = TRUE,
add_interaction_columns = FALSE,
OD_type_of_quantitatives = "C",
nbin.max = NULL,
bins_list = NULL,
bins_names = NULL,
family = c("gaussian", "binomial", "poisson"),
keep = FALSE,
...
)
Arguments
x
A design matrix. See aglm for more details.
y
A response variable.
qualitative_vars_UD_only
Same as in aglm.
qualitative_vars_both
Same as in aglm.
qualitative_vars_OD_only
Same as in aglm.
quantitative_vars
Same as in aglm.
use_LVar
Same as in aglm.
extrapolation
Same as in aglm.
add_linear_columns
Same as in aglm.
add_OD_columns_of_qualitatives
Same as in aglm.
add_interaction_columns
Same as in aglm.
OD_type_of_quantitatives
Same as in aglm.
nbin.max
Same as in aglm.
bins_list
Same as in aglm.
bins_names
Same as in aglm.
family
Same as in aglm.
keep
Set to TRUE to keep the prevalidated fits, as in cv.glmnet.
...
Other arguments are passed directly when calling cv.glmnet().
Value
A model object fitted to the data with cross-validation results. Functions such as predict and plot can be applied to the returned object, same as the result of aglm(). See AccurateGLM-class for more details.
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques.
Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
Examples
#################### Cross-validation for lambda ####################
library(aglm)
library(faraway)
## Read data
xy <- nes96
## Split data into train and test
n <- nrow(xy) # Sample size.
set.seed(2018) # For reproducibility.
test.id <- sample(n, round(n/5)) # ID numbers for test data.
test <- xy[test.id,] # test is the data.frame for testing.
train <- xy[-test.id,] # train is the data.frame for training.
x <- train[, c("popul", "TVnews", "selfLR", "ClinLR", "DoleLR", "PID", "age", "educ", "income")]
y <- train$vote
newx <- test[, c("popul", "TVnews", "selfLR", "ClinLR", "DoleLR", "PID", "age", "educ", "income")]
# NOTE: The code below will take considerable time, so run it when you have time.
## Fit the model
model <- cv.aglm(x, y, family="binomial")
## Make the confusion matrix
lambda <- model@lambda.min
y_true <- test$vote
y_pred <- levels(y_true)[as.integer(predict(model, newx, s=lambda, type="class"))]
cat(sprintf("Confusion matrix for lambda=%.5f:\n", lambda))
print(table(y_true, y_pred))
Fit an AGLM model with cross-validation for both \alpha and \lambda
Description
A fitting function with cross-validation for both \alpha and \lambda. See aglm-package for more details on \alpha and \lambda.
Usage
cva.aglm(
x,
y,
alpha = seq(0, 1, len = 11)^3,
nfolds = 10,
foldid = NULL,
parallel.alpha = FALSE,
...
)
Arguments
x
A design matrix. See aglm for more details.
y
A response variable.
alpha
A numeric vector representing the \alpha values to be examined in the cross-validation.
nfolds
An integer value representing the number of folds.
foldid
An integer vector with the same length as the observations. Each element should take a value from 1 to nfolds, identifying the fold to which each observation belongs.
parallel.alpha
(not used yet)
...
Other arguments are passed directly to cv.aglm().
Value
An object storing the fitted models and cross-validation information. See CVA_AccurateGLM-class for more details.
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques.
Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
Examples
#################### Cross-validation for alpha and lambda ####################
library(aglm)
library(faraway)
## Read data
xy <- nes96
## Split data into train and test
n <- nrow(xy) # Sample size.
set.seed(2018) # For reproducibility.
test.id <- sample(n, round(n/5)) # ID numbers for test data.
test <- xy[test.id,] # test is the data.frame for testing.
train <- xy[-test.id,] # train is the data.frame for training.
x <- train[, c("popul", "TVnews", "selfLR", "ClinLR", "DoleLR", "PID", "age", "educ", "income")]
y <- train$vote
newx <- test[, c("popul", "TVnews", "selfLR", "ClinLR", "DoleLR", "PID", "age", "educ", "income")]
# NOTE: The code below will take considerable time, so run it when you have time.
## Fit the model
cva_result <- cva.aglm(x, y, family="binomial")
alpha <- cva_result@alpha.min
lambda <- cva_result@lambda.min
mod_idx <- cva_result@alpha.min.index
model <- cva_result@models_list[[mod_idx]]
## Make the confusion matrix
y_true <- test$vote
y_pred <- levels(y_true)[as.integer(predict(model, newx, s=lambda, type="class"))]
cat(sprintf("Confusion matrix for alpha=%.5f and lambda=%.5f:\n", alpha, lambda))
print(table(y_true, y_pred))
Get deviance
Description
Get deviance
Usage
## S3 method for class 'AccurateGLM'
deviance(object, ...)
Arguments
object
A model object obtained from aglm() or cv.aglm().
...
Other arguments are passed directly to the backend glmnet's deviance method.
Value
The value of deviance extracted from the given object.
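A minimal sketch, assuming hypothetical training data x and y:
model <- aglm(x, y)
deviance(model)   # deviance extracted from the fitted model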
Author(s)
Kenji Kondo
Binning the data into given bins.
Description
Binning the data into given bins.
Usage
executeBinning(x_vec, breaks = NULL, nbin.max = 100, method = "freq")
Arguments
x_vec
The data to be binned.
breaks
A numeric vector representing the breaks of the bins. If NULL, breaks are computed automatically from x_vec.
nbin.max
The maximum number of bins (used only if breaks is NULL).
method
"freq" for equal frequency binning, or "width" for equal width binning.
Value
A list with the following fields:
- labels: An integer vector with the same length as x_vec, where labels[i]==k means the i-th element of x_vec is in the k-th bin.
- breaks: Breaks of the bins used for binning.
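A small usage sketch with randomly generated data:
x_vec <- rnorm(100)
binned <- executeBinning(x_vec, nbin.max=10, method="freq")  # equal frequency binning
head(binned$labels)   # bin index for each element of x_vec
binned$breaks         # breaks actually used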
Author(s)
Kenji Kondo
Create L-variable matrix for one variable
Description
Create L-variable matrix for one variable
Usage
getLVarMatForOneVec(x_vec, breaks = NULL, nbin.max = 100, only_info = FALSE)
Arguments
x_vec
A numeric vector representing the original variable.
breaks
A numeric vector representing the breaks of the bins. If NULL, breaks are computed automatically from x_vec.
nbin.max
The maximum number of bins (used only if breaks is NULL).
only_info
If TRUE, only the information fields are returned and the dummy matrix itself is not created.
Value
A list with the following fields:
- breaks: Same as input.
- dummy_mat: The created L-variable matrix (only if only_info=FALSE).
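A small usage sketch with randomly generated data:
x_vec <- runif(20)
res <- getLVarMatForOneVec(x_vec, nbin.max=5)
res$breaks      # breaks used for binning
res$dummy_mat   # the created L-variable matrix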
Author(s)
Kenji Kondo
Create an O-dummy matrix for one variable
Description
Create an O-dummy matrix for one variable
Usage
getODummyMatForOneVec(
x_vec,
breaks = NULL,
nbin.max = 100,
only_info = FALSE,
dummy_type = NULL
)
Arguments
x_vec
A numeric vector representing the original variable.
breaks
A numeric vector representing the breaks of the bins. If NULL, breaks are computed automatically from x_vec.
nbin.max
The maximum number of bins (used only if breaks is NULL).
only_info
If TRUE, only the information fields are returned and the dummy matrix itself is not created.
dummy_type
Used to control the shape of linear combinations obtained by O-dummies for quantitative variables (deprecated).
Value
A list with the following fields:
- breaks: Same as input.
- dummy_mat: The created O-dummy matrix (only if only_info=FALSE).
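A small usage sketch with randomly generated data:
x_vec <- runif(20)
res <- getODummyMatForOneVec(x_vec, nbin.max=5)
res$breaks      # breaks used for binning
res$dummy_mat   # the created O-dummy matrix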
Author(s)
Kenji Kondo
Create a U-dummy matrix for one variable
Description
Create a U-dummy matrix for one variable
Usage
getUDummyMatForOneVec(
x_vec,
levels = NULL,
drop_last = TRUE,
only_info = FALSE
)
Arguments
x_vec
A vector representing the original variable. The class of x_vec should be one suitable for a qualitative variable, such as factor or character.
levels
A character vector representing the values of x_vec used to create U-dummies. If NULL, all the unique values of x_vec are used.
drop_last
If TRUE, the dummy column corresponding to the last level is dropped to avoid collinearity.
only_info
If TRUE, only the information fields are returned and the dummy matrix itself is not created.
Value
A list with the following fields:
- levels: Same as input.
- drop_last: Same as input.
- dummy_mat: The created U-dummy matrix (only if only_info=FALSE).
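A small usage sketch:
x_vec <- factor(c("a", "b", "c", "a", "b"))
res <- getUDummyMatForOneVec(x_vec)
res$levels      # levels used to create U-dummies
res$dummy_mat   # the created U-dummy matrix (last level dropped by default)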
Author(s)
Kenji Kondo
Plot contribution of each variable and residuals
Description
Plot contribution of each variable and residuals
Usage
## S3 method for class 'AccurateGLM'
plot(
x,
vars = NULL,
verbose = TRUE,
s = NULL,
resid = FALSE,
smooth_resid = TRUE,
smooth_resid_fun = NULL,
ask = TRUE,
layout = c(2, 2),
only_plot = FALSE,
main = "",
add_rug = FALSE,
...
)
Arguments
x
A model object obtained from aglm() or cv.aglm().
vars
Used to specify the variables to be plotted. If NULL (the default), all variables are plotted.
verbose
Set to FALSE to suppress messages during plotting (the default is TRUE).
s
A numeric value specifying the \lambda value at which plotting is done.
resid
Used to display residuals in the plots. The default is FALSE, meaning no residuals are displayed.
smooth_resid
Used to display smoothing lines of residuals for quantitative variables. The default is TRUE.
smooth_resid_fun
Set if users need custom smoothing functions.
ask
By default, TRUE, meaning the user is prompted before each new page of plots is drawn. Set to FALSE to draw all plots without prompting.
layout
Plotting multiple variables on each page is allowed. To achieve this, set it to a pair of integers indicating the number of rows and columns, respectively.
only_plot
Set to TRUE to draw only the plots themselves, without additional settings such as layout changes.
main
Used to specify the title of the plots.
add_rug
Set to TRUE to add rug plots (the default is FALSE).
...
Other arguments are currently not used and just discarded.
Value
No return value, called for side effects.
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques.
Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
Examples
#################### using plot() and predict() ####################
library(MASS) # For Boston
library(aglm)
## Read data
xy <- Boston # xy is a data.frame to be processed.
colnames(xy)[ncol(xy)] <- "y" # Let medv be the objective variable, y.
## Split data into train and test
n <- nrow(xy) # Sample size.
set.seed(2018) # For reproducibility.
test.id <- sample(n, round(n/4)) # ID numbers for test data.
test <- xy[test.id,] # test is the data.frame for testing.
train <- xy[-test.id,] # train is the data.frame for training.
x <- train[-ncol(xy)]
y <- train$y
newx <- test[-ncol(xy)]
y_true <- test$y
## With the result of aglm()
model <- aglm(x, y)
lambda <- 0.1
plot(model, s=lambda, resid=TRUE, add_rug=TRUE,
verbose=FALSE, layout=c(3, 3))
y_pred <- predict(model, newx=newx, s=lambda)
plot(y_true, y_pred)
## With the result of cv.aglm()
model <- cv.aglm(x, y)
lambda <- model@lambda.min
plot(model, s=lambda, resid=TRUE, add_rug=TRUE,
verbose=FALSE, layout=c(3, 3))
y_pred <- predict(model, newx=newx, s=lambda)
plot(y_true, y_pred)
Make predictions for new data
Description
Make predictions for new data
Usage
## S3 method for class 'AccurateGLM'
predict(
object,
newx = NULL,
s = NULL,
type = c("link", "response", "coefficients", "nonzero", "class"),
exact = FALSE,
newoffset,
...
)
Arguments
object
A model object obtained from aglm() or cv.aglm().
newx
A design matrix for new data. See the description of x in aglm for more details.
s
Same as in predict.glmnet.
type
Same as in predict.glmnet.
exact
Same as in predict.glmnet.
newoffset
Same as in predict.glmnet.
...
Other arguments are passed directly when calling predict.glmnet().
Value
The returned object depends on type. See predict.glmnet for more details.
Author(s)
Kenji Kondo,
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
References
Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).
AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques.
Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1
Examples
#################### using plot() and predict() ####################
library(MASS) # For Boston
library(aglm)
## Read data
xy <- Boston # xy is a data.frame to be processed.
colnames(xy)[ncol(xy)] <- "y" # Let medv be the objective variable, y.
## Split data into train and test
n <- nrow(xy) # Sample size.
set.seed(2018) # For reproducibility.
test.id <- sample(n, round(n/4)) # ID numbers for test data.
test <- xy[test.id,] # test is the data.frame for testing.
train <- xy[-test.id,] # train is the data.frame for training.
x <- train[-ncol(xy)]
y <- train$y
newx <- test[-ncol(xy)]
y_true <- test$y
## With the result of aglm()
model <- aglm(x, y)
lambda <- 0.1
plot(model, s=lambda, resid=TRUE, add_rug=TRUE,
verbose=FALSE, layout=c(3, 3))
y_pred <- predict(model, newx=newx, s=lambda)
plot(y_true, y_pred)
## With the result of cv.aglm()
model <- cv.aglm(x, y)
lambda <- model@lambda.min
plot(model, s=lambda, resid=TRUE, add_rug=TRUE,
verbose=FALSE, layout=c(3, 3))
y_pred <- predict(model, newx=newx, s=lambda)
plot(y_true, y_pred)
Display textual information of the model
Description
Display textual information of the model
Usage
## S3 method for class 'AccurateGLM'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
x
A model object obtained from aglm() or cv.aglm().
digits
Used to control the number of significant digits in the printout.
...
Other arguments are passed directly to the backend glmnet's print method.
Value
No return value, called for side effects.
Author(s)
Kenji Kondo
Get residuals of various types
Description
Get residuals of various types
Usage
## S3 method for class 'AccurateGLM'
residuals(
object,
x = NULL,
y = NULL,
offset = NULL,
weights = NULL,
type = c("working", "pearson", "deviance"),
s = NULL,
...
)
Arguments
object
A model object obtained from aglm() or cv.aglm().
x
A design matrix. If not given, the design matrix used for fitting is used.
y
A response variable. If not given, the response used for fitting is used.
offset
Offset values. If not given, the offset used for fitting is used.
weights
Sample weights. If not given, the weights used for fitting are used.
type
A string representing the type of residuals: "working", "pearson", or "deviance".
s
A numeric value specifying the \lambda value at which residuals are calculated.
...
Other arguments are currently not used and just discarded.
Value
A numeric vector representing calculated residuals.
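A minimal sketch, assuming model is a result of cv.aglm() on hypothetical training data x and y:
model <- cv.aglm(x, y)
res_dev <- residuals(model, type="deviance", s=model@lambda.min)
res_pea <- residuals(model, type="pearson", s=model@lambda.min)
summary(res_dev)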
Author(s)
Kenji Kondo