Type: | Package |
Title: | Functions and Data Sets for the Book "A Progressive Introduction to Linear Models" |
Version: | 0.2 |
Author: | Joshua P. French |
Maintainer: | Joshua P. French <joshua.french@ucdenver.edu> |
Description: | Simplifies aspects of linear regression analysis, particularly simultaneous inference. Additionally, supports "A Progressive Introduction to Linear Models" by Joshua French (https://jfrench.github.io/LinearRegression/). |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
Suggests: | testthat (≥ 3.0.0), ggplot2, knitr, rmarkdown, vdiffr |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.5.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2023-06-08 21:57:22 UTC; frencjos |
Repository: | CRAN |
Date/Publication: | 2023-06-08 22:32:54 UTC |
Check arguments of cooks_test
Description
Check arguments of cooks_test
Usage
arg_check_cooks_test(model, n)
Arguments
model |
A fitted model object from the
|
n |
The number of outliers to return. The default is all influential observations. |
Check arguments of dfbeta_plot
Description
Check arguments of dfbeta_plot
Usage
arg_check_dfbeta_plot(
model,
id_n,
regressors,
add_reference,
text_arglist,
abline_arglist,
extendrange_f
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Check arguments of dfbetas_plot
Description
Check arguments of dfbetas_plot
Usage
arg_check_dfbetas_plot(
model,
id_n,
regressors,
add_reference,
text_arglist,
abline_arglist,
extendrange_f
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Check arguments of dffits_test
Description
Check arguments of dffits_test
Usage
arg_check_dffits_test(model, n)
Arguments
model |
A fitted model object from the
|
n |
The number of outliers to return. The default is all influential observations. |
Check arguments of residual_plot.lm
Description
Check arguments of residual_plot.lm
Usage
arg_check_index_plot_lm(
model,
stat,
id_n,
add_reference,
text_arglist,
abline_arglist,
extendrange_f
)
Arguments
model |
A fitted model object from the
|
stat |
A function that can be applied to an |
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Check arguments of dfbetas_plot
Description
Check arguments of dfbetas_plot
Usage
arg_check_influence_plot(
model,
id_n,
add_reference,
alpha,
size,
text_arglist,
abline_arglist,
extendrange_f
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels
with respect to largest absolute criterion. The default
is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
alpha |
The default quantile used for the horizontal reference lines. The default is 0.05. See Details. |
size |
A numeric vector of length 2 that provides guidelines for the size of the points. |
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Check arguments of leverage_test
Description
Check arguments of leverage_test
Usage
arg_check_leverage_test(model, n, ttype, threshold)
Arguments
model |
A fitted model object from the
|
n |
The number of leverage points to return. The default is all leverage points. |
ttype |
Threshold type. The default is
|
threshold |
A number between 0 and 1. Any
observation with a leverage value above this number is
declared a leverage point. This is automatically
determined unless |
Value
A vector of statistics
Check arguments of outlier_test
Description
Check arguments of outlier_test
Usage
arg_check_outlier_test(model, n, alpha)
Arguments
model |
A fitted model object from the
|
n |
The number of outliers to return. The default is all outliers. |
alpha |
The Bonferroni-adjusted threshold at which
an outlier is identified. The default is |
Check arguments of residual_plot.lm
Description
Check arguments of residual_plot.lm
Usage
arg_check_residual_plot_lm(
rtype,
xaxis,
id_n,
smooth,
add_reference,
add_smooth,
text_arglist,
abline_arglist,
smooth_arglist,
lines_arglist,
extendrange_f
)
Arguments
rtype |
The residual type to plot. The default is
|
xaxis |
The variable to use on the x-axis of the
plot(s). The default is |
id_n |
The number of points to identify with labels.
The default is |
smooth |
A function with a
|
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
add_smooth |
A logical value indicating whether a
smooth should be added to each plot produced. The
default is |
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
smooth_arglist |
A named list specifying additional
arguments passed to the function provided in the
|
lines_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Automatically determine 'mfrow'
Description
Automatically determine a reasonable choice for
for 'mfrow' in the par
function based on the number of plots 'n'.
Usage
auto_mfrow(n)
Arguments
n |
The number of plots. |
Value
A 'numeric' vector of length 2.
Plot confint_adjust
object
Description
Plot a confint_adjust
object produced by the
confint_adjust
function. The plotting
function internally calls the
autoplot
function. Note: the
ggplot2
package must be loaded (i.e.,
library(ggplot2)
or ggplot2::autoplot
must be specifically called for this function to work.
See Examples.
Usage
autoplot.confint_adjust(object, parm, ...)
Arguments
object |
An |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
... |
Not used |
Value
None.
Author(s)
Joshua French
Examples
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
# standard intervals
cia <- confint_adjust(fit)
# if ggplot2 package is available
if (require(ggplot2)) {
autoplot(cia)
# select subset of plots
autoplot(cia, parm = c("hp", "disp"))
}
Compare coefficients of 2 models
Description
Compare the coefficients to two fitted models. The models must have the same coefficients.
Usage
coef_compare(model1, model2, digits = 3, verbose = TRUE)
Arguments
model1 |
A fitted model object from the
|
model2 |
A fitted model object from the
|
digits |
A positive integer indicating how many
significant digits are to be used for numeric and
complex numbers. This argument is passed to
|
verbose |
A logical value indicating whether the
matrix should be printed. The default is |
Value
A matrix.
Examples
# fit model
lmod1 <- lm(murder ~ hs_grad + urban + poverty + single,
data = crime2009)
#fit without DC
lmod2 <- lm(murder ~ hs_grad + urban + poverty + single,
data = crime2009[-9, ])
#compare coefficients of models
coef_compare(lmod1, lmod2)
Return coefficient matrix
Description
coef_matrix
returns the coefficients
element of the summary
function, which is a
matrix with columns for the estimated coefficients, their
standard error, t-statistic and corresponding (two-sided)
p-values.
Usage
coef_matrix(object)
Arguments
object |
an object of class |
Value
A p \times 4
matrix with columns for the
estimated coefficient, its standard error, t-statistic
and corresponding (two-sided) p-value. Aliased
coefficients are omitted. The additional class
coef_matrix
is added for custom printing.
Author(s)
Joshua P. French
Examples
## a fitted model
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
coef_matrix(fit)
print(coef_matrix(fit), digits = 3)
Combine coefficients
Description
Combine coefficients and standard errors from two fitted models
Usage
combine_coefs(i, a, b, digits = NULL)
Arguments
i |
Index of coefficients |
a |
Coefficients and standard errors from Model 1 |
b |
Coefficients and standard errors from Model 2 |
digits |
A positive integer indicating how many
significant digits are to be used for numeric and
complex numbers. This argument is passed to
|
Value
A character matrix
Adjust confidence intervals for multiple comparisons
Description
A function to produce adjusted confidence intervals with a family-wise
confidence level of at least level
for
lm
objects (not applicable if no adjustment is used).
Internally, the function is a slight revision of the code
used in the confint.lm
function.
Usage
confint_adjust(object, parm, level = 0.95, method = "none")
Arguments
object |
a fitted model object. |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
method |
A character string indicating the type of
adjustment to make. The default choice is
|
Details
Let a = 1 - level
. Let p
be the number of
estimated coefficients in the fitted model. All intervals are computed
using the formula estimate +/- m * ese
, where
m
is a multiplier and ese
is the estimated
standard error of the estimate
.
method = "none"
(no correction) produces the
standard t-based confidence intervals with multiplier
qt(1 - a/2, df = object$df.residual)
.
method = "bonferroni"
produces Bonferroni-adjusted
intervals that use the multiplier m = qt(1 - a/(2 *
k), df = object$df.residual)
, where k
is the
number of intervals being produced.
method = "wh"
produces Working-Hotelling-adjusted
intervals that are valid for all linear combinations of
the regression coefficients, which uses the multiplier
m = sqrt(p * qf(level, df1 = p, df2 =
object$df.residual))
.
Value
A confint_adjust
object, which is simply a
a data.frame
with columns term
,
lwr
(the lower confidence limit), and upr
(the upper confidence limit).
References
Bonferroni, C. (1936). Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8, 3-62.
Working, H., & Hotelling, H. (1929). Applications of the theory of error to the interpretation of trends. Journal of the American Statistical Association, 24(165A), 73-85. doi:10.1080/01621459.1929.10506274
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2004). Applied Linear Statistical Models, 5th edition. New York: McGraw-Hill/Irwin. (p. 230)
See Also
Examples
## an extension of the documentation for confint.lm
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
# standard intervals
confint_adjust(fit)
# bonferroni-adjusted intervals
(cib <- confint_adjust(fit, method = "b"))
# plot results
plot(cib)
plot(cib, parm = c("hp", "disp"))
if (require(ggplot2)) {
autoplot(cib)
autoplot(cib, parm = c("hp", "disp"))
}
#' working-hotelling-adjusted intervals
(ciwh <- confint_adjust(fit, method = "wh"))
Index plot of Cook's distances for lm
object
Description
cooks_plot
plots the Cook's distances from
the cooks.distance
function of a fitted
lm
object.
Usage
cooks_plot(
model,
id_n = 3,
add_reference = TRUE,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a reference line should be added. See details. |
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
By default, a reference line is plotted at the 0.5
quantile of a F_{p, n-p}
distribution
where p =
length(stats::coef(model))
and
n - p =
stats::df.residual(model)
.
The vertical position of the reference line can be customized
by setting the h
argument of abline_arglist
.
Author(s)
Joshua French
See Also
plot
,
text
,
abline
,
cooks.distance
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
cooks_plot(lmod, id_n = 1)
Cook's statistics
Description
cooks_stats
returns the ordered Cook's statistics
(distances) decreasing in value of model
to identify the
most influential observations.
Usage
cooks_stats(model, n = 6L)
Arguments
model |
A fitted model object from the
|
n |
an integer vector of length up to |
Value
A vector of statistics
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
cooks_stats(lmod, n = 5)
Identify influential observations
Description
cooks_test
returns the observations identified
as influential based on the cooks statistics being
larger than a threshold.
The threshold for this
test is the 0.5
quantile of a F_{p, n-p}
distribution
where p =
length(stats::coef(model))
and
n - p =
stats::df.residual(model)
.
Usage
cooks_test(model, n = stats::nobs(model))
Arguments
model |
A fitted model object from the
|
n |
The number of outliers to return. The default is all influential observations. |
Value
A vector of influential observations.
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
cooks_test(lmod)
2009 Crime Data
Description
Data related to crime for the 50 U.S. states
plus the District of Columbia. The data are taken from
the crime_data
data set available in the
statsmodels
package in Python. As stated in its
documentation, "All data is for 2009 and was obtained
from the American Statistical Abstracts except as
indicated ...." The original documentation is available
at
https://www.statsmodels.org/dev/datasets/generated/statecrime.html.
The violent
variable includes murder, forcible
rape, robbery, and aggravated assault. Numbers for
Illinois and Minnesota do not include forcible rapes.
Footnote included with the American Statistical
Abstract table reads: "The data collection methodology
for the offense of forcible rape used by the Illinois
and the Minnesota state Uniform Crime Reporting (UCR)
Programs (with the exception of Rockford, Illinois, and
Minneapolis and St. Paul, Minnesota) does not comply
with national UCR guidelines. Consequently, their state
figures for forcible rape and violent crime (of which
forcible rape is a part) are not published in this
table."
The single
variable is calculated from 2009
1-year American Community Survey obtained obtained from
Census. Variable is Male householder, no wife present,
family household combined with Female householder, no
husband present, family household, divided by the total
number of Family households.
Usage
data(crime2009)
Format
A data frame with 51 observations and 7 variables:
violent
Rate of violent crimes per 100,000 persons in the population.
murder
Rate of murders per 100,000 persons in the population.
hs_grad
Percentage of the population having graduated from high school or higher.
poverty
Percentage of individuals with income below the poverty line.
white
Percentage of the population that is only considered "white" for race based on the 2009 American Community Survey.
single
Percentage of families made up of single individuals.
urban
Percentage of the population living in Urbanized Areas as of 2010 Census, where Urbanized Areas are areas of 50,000 or more people.
Source
A public domain data set available in the
statsmodels
python package. All data is for 2009
and was obtained from the American Statistical
Abstracts except as indicated.
https://www.statsmodels.org/dev/datasets/generated/statecrime.html.
dfbeta index plots
Description
dfbeta_plot
creates an index plot of the
dfbeta
statistics for each regressor.
Usage
dfbeta_plot(
model,
id_n = 3,
regressors = ~.,
add_reference = TRUE,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
regressors |
A formula describing the regressors for
which to plot the |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
A horizontal reference line is added at +/- the
estimated standard error of each coefficient by
default if add_reference
is TRUE
.
Author(s)
Joshua French
See Also
Examples
lmod <- lm(murder ~ hs_grad + urban + poverty + single,
data = crime2009)
dfbeta_plot(lmod)
dfbeta_plot(lmod, regressors = ~ hs_grad + poverty,
id_n = 1)
dfbetas index plots
Description
dfbetas_plot
creates index plot of the
dfbetas
statistics for each regressor.
Usage
dfbetas_plot(
model,
id_n = 3,
regressors = ~.,
add_reference = TRUE,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
regressors |
A formula describing the regressors for
which to plot the |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
A horizontal reference line is added at -1 and +1 by
default if add_reference
is TRUE
.
Author(s)
Joshua French
See Also
Examples
lmod <- lm(murder ~ hs_grad + urban + poverty + single,
data = crime2009)
dfbetas_plot(lmod)
dfbetas_plot(lmod, regressors = ~ hs_grad, id_n = 4)
Index plot of DFFITS values for lm
object
Description
dffits_plot
plots the DFFITS values from
the dffits
function of a fitted
lm
object.
Usage
dffits_plot(
model,
id_n = 3,
add_reference = TRUE,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
By default, a reference line is plotted at \pm 2\sqrt{p/n}
,
where p =
length(stats::coef(model))
and
n =
stats::nobs(model)
. This can be customized
by setting the h
argument of abline_arglist
.
Author(s)
Joshua French
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
dffits_plot(lmod, id_n = 6)
# customized plot
dffits_plot(lmod, id_n = 1,
text_arglist = list(col = "blue", cex = 2),
abline_arglist = list(col = "red", lwd = 2))
DFFITS statistics
Description
dffits_stats
returns the ordered DFFITS values
(decreasing in magnitude) of model
to identify the
observations with the highest DFFITS values.
Usage
dffits_stats(model, n = 6L)
Arguments
model |
A fitted model object from the
|
n |
an integer vector of length up to |
Value
A vector of statistics
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
dffits_stats(lmod, n = 5)
Identify influential observations
Description
dffits_test
returns the observations identified
as influential based on the absolute value of the DFFITS statistics being
larger than a threshold.
The threshold used is 2\sqrt{p/n}
,
where p =
length(stats::coef(model))
and
n =
stats::nobs(model)
.
Usage
dffits_test(model, n = stats::nobs(model))
Arguments
model |
A fitted model object from the
|
n |
The number of outliers to return. The default is all influential observations. |
Value
A vector of influential observations.
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
dffits_test(lmod)
Dwaine Studios data
Description
Data from the Dwaine Studios data in Applied Linear Statistical Models, 5th edition, p. 237. From the book:
Dwaine Studios, Inc., operates portrait studios in 21
cities of medium size. These studios specialize in
portraits of children. The company is considering an
expansion into other cities of medium size and wishes
to investigates where sales (sales
) in a
community can be predicted from the number of persons
aged 16 or younger in the community (targetpop
)
and the per capita disposable personal income in the
community (dpi
). Data on these variables for the
most recent year for the 21 cities in which Dwaine
Studios is now operating are included in the data set.
Usage
data(dwaine)
Format
A data frame with 21 observations and 3 variables:
targetpop
The number of persons aged 16 or younger in thousands of persons.
dpi
Per capita disposable personal income in thousands of dollars.
sales
Sales in thousands of dollars
Author(s)
Joshua P. French
References
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2004). Applied Linear Statistical Models, 5th edition. New York: McGraw-Hill/Irwin.
Format percentages
Description
A recreation of the stats:::format.prec
function that is only available internally to the stats
package.
Usage
format_perc_api2lm(probs, digits)
Arguments
probs |
A vector of probabilities. |
digits |
a positive integer indicating how many significant digits
are to be used for
numeric and complex |
Value
A vector of percentages
See Also
Examples
format_perc_api2lm(c(0.523423, 0.9098192, 0.951289), digits = 1)
format_perc_api2lm(c(0.523423, 0.9098192, 0.951289), digits = 3)
Extract residuals from a model
Description
Extracts different types of residuals from a fitted model. The types of residuals are discussed in Details.
Usage
get_residuals(
x,
rtype = c("ordinary", "standardized", "studentized", "jackknife", "loo", "deleted",
"internally studentized", "externally studentized")
)
Arguments
x |
An |
rtype |
The desired residual type. The options are
|
Details
For observations 1, 2, \ldots, n
, let:
-
Y_i
denote the response value for thei
th observation. -
\hat{Y}_i
denote the fitted value for thei
th observation. -
h_i
denote the leverage value for thei
th observation.
We assume that \mathrm{sd}(Y_i) = \sigma
for
i \in \{1, 2, \ldots, n\}
and that \hat{\sigma}
is the estimate produced by sigma(x)
, where x
is the fitted model object.
The ordinary residual for the i
th
observation is computed as
\hat{\epsilon}_i = Y_i - \hat{Y}_i.
The variance of the i
th ordinary residual under standard
assumptions is \sigma^2(1-h_i)
.
The standardized residual for the i
th observation
is computed as
r_i = \frac{\hat{\epsilon}_i}{\hat{\sigma}\sqrt{1-h_i}}.
The standardized residual is also known as the internally studentized residual.
Let \hat{Y}_{i(i)}
denote the predicted value of
Y_i
for the model fit with all n
observations
except observation i
. The leave-one-out (LOO) residual for observation i
is
computed as
l_i = Y_i - \hat{Y}_{i(i)} = \frac{\hat{\epsilon}_i}{1-h_i}.
The LOO residual is also known as the deleted or jackknife residual.
The studentized residual for the i
th observation
is computed as
t_i = \frac{l_i}{\hat{\sigma}_{(i)}\sqrt{1-h_i}},
where \hat{\sigma}_{(i)}
is the leave-one-out estimate
of \sigma
.
The studentized residual is also known as the externally studentized residual.
Value
A vector of residals.
Examples
lmod <- lm(Girth ~ Height, data = trees)
# ordinary residuals
rord <- get_residuals(lmod)
all.equal(rord, residuals(lmod))
# standardized residuals
rstand <- get_residuals(lmod, "standardized")
all.equal(rstand, rstandard(lmod))
# studentized residuals
rstud <- get_residuals(lmod, "studentized")
all.equal(rstud, rstudent(lmod))
# loo residuals
rl <- get_residuals(lmod, "loo")
all.equal(rl, rloo(lmod))
Home sale prices in King County, WA
Description
The home_sales
data set is a data frame consisting of
216 rows and 8 columns. The data are a subset of home
sales in King County, WA made between 2014-05-02 to
2015-05-27. The variables in the data set are:
-
price
: sale price (in log10 US dollars). -
bedrooms
: number of bedrooms. -
bathrooms
: number of bathrooms. -
sqft_living
: size of living space in square feet. -
sqft_lot
: lot size in square feet. -
floors
: number of floors in home. -
waterfront
: afactor
variable with levelsno
and 'yes' that indicate whether the home has a waterfront view. -
condition
: afactor
variable indicating the condition of the house with levels ranging frompoor
tovery good
.
Value
A data.frame
.
Source
The Center for Spatial Data Science, University of Chicago. https://geodacenter.github.io/data-and-lab//KingCounty-HouseSales2015/
These data were created by selectively choosing a subset
of observations from the home_prices
data set in the
**KingCountyHomes** package.
Examples
data(home_sales)
summary(home_sales)
Get elements for dfbetas index plot
Description
Get elements for dfbetas index plot
Usage
index_plot_dfbetas_elements(model, returned_stats)
Arguments
model |
A fitted model object from the
|
returned_stats |
A vector of dfbetas statistics |
Value
A list with elements x
, y
,
and z
.
Index plot of statistics from of an lm
object
Description
index_plot_lm
creates an index plot of statistcs
from an lm
object.
Usage
index_plot_lm(
model,
stat,
id_n = 3,
add_reference = FALSE,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
stat |
A function that can be applied to an |
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Author(s)
Joshua French
See Also
plot
,
text
,
lm
,
rstudent
,
hatvalues
,
cooks.distance
Examples
lmod <- lm(Petal.Length ~ Sepal.Length + Species,
data = iris)
# outlier plot
# number of observations
n <- stats::nobs(lmod)
# loo residual degrees of freedom
rdf <- stats::df.residual(lmod) - 1
h <- c(-1, 1) * stats::qt(0.05/(2 * n), df = rdf)
index_plot_lm(lmod, stat = stats::rstudent,
abline_arglist = list(h = h))
# leverage plot
index_plot_lm(lmod, stat = stats::hatvalues, id_n = 1)
# Cook's distance
index_plot_lm(lmod, stat = stats::cooks.distance,
id_n = 3)
Create plot elements for index_plot_lm
Description
Create plot elements for index_plot_lm
Usage
index_plot_lm_elements(model, stat)
Arguments
model |
A fitted model object from the
|
stat |
A function that can be applied to an |
Value
A list with elements x
, y
,
and z
.
Index plot helper function
Description
Index plot helper function
Usage
index_plot_raw(
x,
y,
idd,
labels,
add_reference,
arglist,
text_arglist,
abline_arglist,
extendrange_f
)
Arguments
x |
x-values to plot |
y |
y-values to plot |
idd |
Identified observations |
labels |
The labels to use for the identified points. |
add_reference |
Logical value |
arglist |
Named list for plot |
text_arglist |
Named list for text |
abline_arglist |
Named list for abline |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Influence plots
Description
influence_plot
creates an influence plot for a
fitted lm
object. The y-axis is either the
studentized (the default) or standardized residuals
versus the leverage values for each observation. The size
of the point associated with each observation is
proportional to the value of the Cook's distance (the
default) or the DFFITS statistic for the observation.
Details about the different types of residuals are
discussed in the get_residuals
function.
Usage
influence_plot(
model,
rtype = c("studentized", "standardized"),
criterion = c("cooks", "dffits"),
id_n = 3,
add_reference = TRUE,
alpha = 0.05,
size = c(1, 4.8),
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
rtype |
The residual type to plot on the y-axis. The
default is |
criterion |
The criterion that decides the size of
the points. The default is |
id_n |
The number of points to identify with labels
with respect to largest absolute criterion. The default
is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
alpha |
The default quantile used for the horizontal reference lines. The default is 0.05. See Details. |
size |
A numeric vector of length 2 that provides guidelines for the size of the points. |
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
The range of the criterion
statistic is mapped to
cex_pt = size[2]^2 - size[1]^2
and then the size
of the points is sqrt(cex_pt)
.
If add_reference
is TRUE
, then horizontal
reference lines are added at the \alpha/2
and
1-\alpha/2
quantiles of a t distribution with
degrees of freedom given by
stats::df.residual(model)
.
If add_reference
is TRUE
, then vertical
reference lines are added at 2p/n
and 0.5
where p=
length(stats::coef(model))
and
n=
stats::nobs(model)
.
The vertical position of the reference lines can be
customized by setting the h
argument of
abline_arglist
. The horizontal position of the
reference lines can be customized by setting the v
argument of abline_arglist
.
Author(s)
Joshua French
See Also
plot
,
text
,
abline
,
rstandard
,
rstudent
,
hatvalues
cooks.distance
,
dffits
Examples
lmod <- lm(murder ~ hs_grad + urban + poverty + single,
data = crime2009)
# studentized residuals vs leverage
influence_plot(lmod, id_n = 3)
# standardized residuals vs leverage
influence_plot(lmod, rtype = "stan")
# similar plot from plot.lm
plot(lmod, which = 5)
Influence statistics
Description
influence_stats
returns a data frame with
influence-related statistics ordered from the largest
to smallest magnitude of the criterion
.
Usage
influence_stats(
model,
n = 6L,
rtype = c("studentized", "standardized"),
criterion = c("cooks", "dffits")
)
Arguments
model |
A fitted model object from the
|
n |
an integer vector of length up to |
rtype |
The residual type to plot on the y-axis. The
default is |
criterion |
The criterion that decides the size of
the points. The default is |
Value
A data frame of influence-related statistics.
Author(s)
Joshua French
See Also
rstandard
,
rstudent
,
hatvalues
cooks.distance
,
dffits
Examples
lmod <- lm(murder ~ hs_grad + urban + poverty + single,
data = crime2009)
influence_stats(lmod, n = 3)
influence_stats(lmod, rtype = "stan", crit = "df")
Index plot of leverage values for lm
object
Description
leverage_plot
plots the leverage (hat) values from
the hatvalues
function of a fitted
lm
object.
Usage
leverage_plot(
model,
id_n = 3,
add_reference = TRUE,
ttype = "half",
threshold = NULL,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
ttype |
Threshold type. The default is
|
threshold |
A number between 0 and 1. Any
observation with a leverage value above this number is
declared a leverage point. This is automatically
determined unless |
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
If ttype = "half"
, the threshold is 0.5
.
If ttype = "2mean"
, the threshold is 2p/n
,
where p =
length(stats::coef(model))
and
n =
stats::nobs(model)
, which is double the
mean leverage value.
If ttype = "custom"
then the user must manually
specify threshold
.
Author(s)
Joshua French
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
# reference line not visible on plot because all
# leverage values are less than 0.5
leverage_plot(lmod, id_n = 2)
# different reference line
leverage_plot(lmod, id_n = 6, ttype = "2mean")
# custom reference line
leverage_plot(lmod, id_n = 2, ttype = "custom",
threshold = 0.15)
Leverage statistics
Description
leverage_stats
returns the ordered
leverage values (decreasing) of
model
to
identify the highest leverage observations.
Usage
leverage_stats(model, n = 6L)
Arguments
model |
A fitted model object from the
|
n |
an integer vector of length up to |
Value
A vector of statistics
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
leverage_stats(lmod, n = 4)
Identify leverage points
Description
leverage_test
returns the observations identified
as a leverage point based on a threshold.
Usage
leverage_test(model, n = stats::nobs(model), ttype = "half", threshold = NULL)
Arguments
model |
A fitted model object from the
|
n |
The number of leverage points to return. The default is all leverage points. |
ttype |
Threshold type. The default is
|
threshold |
A number between 0 and 1. Any
observation with a leverage value above this number is
declared a leverage point. This is automatically
determined unless |
Details
If ttype = "half"
, the threshold is 0.5
.
If ttype = "2mean"
, the threshold is 2p/n
,
where p =
length(stats::coef(model))
and
n =
stats::nobs(model)
, which is double the
mean leverage value.
If ttype = "custom"
then the user must manually
specify threshold
.
Value
A vector of statistics
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
# comparison of results using different threshold types
leverage_test(lmod)
leverage_test(lmod, ttype = "2mean", n = 7)
leverage_test(lmod, ttype = "custom", threshold = 0.1)
Index plot of studentized residuals for lm
object
Description
outlier_plot
plots the studentized residuals (from
the rstudent
function) of a fitted
lm
object.
Usage
outlier_plot(
model,
id_n = 3,
add_reference = TRUE,
alpha = 0.05,
...,
text_arglist = list(),
abline_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
id_n |
The number of points to identify with labels.
The default is |
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
alpha |
The default lower quantile used for the reference line prior to the Bonferroni adjustment. The default is 0.05. |
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Details
If add_reference = TRUE
, then reference lines are
provided for the \alpha/(2n
and 1-\alpha/(2n)
quantiles of a Student's t
distribution with
(df.residual(lmod) - 1
) degrees of freedom, which
are the standard quantiles used to identify outliers for
a fitted model.
The vertical position of the reference line can be
customized by setting the h
argument of
abline_arglist
.
Author(s)
Joshua French
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
outlier_plot(lmod, id_n = 1)
Outlier statistics
Description
outlier_stats
returns the ordered
studentized residuals (decreasing based on magnitude) of
model
to
Identify the most unusual observations.
Usage
outlier_stats(model, n = 6L)
Arguments
model |
A fitted model object from the
|
n |
an integer vector of length up to |
Value
A vector of statistics.
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
outlier_stats(lmod, n = 3)
Identify outliers
Description
outlier_test
returns the observations identified
as an outlier based on the Bonferroni correction for a
studentized residuals.
Usage
outlier_test(model, n = stats::nobs(model), alpha = 0.05)
Arguments
model |
A fitted model object from the
|
n |
The number of outliers to return. The default is all outliers. |
alpha |
The Bonferroni-adjusted threshold at which
an outlier is identified. The default is |
Value
A data frame with the outliers.
See Also
Examples
lmod <- lm(price ~ sqft_living, data = home_sales)
outlier_test(lmod)
outlier_test(lmod, alpha = 1, n = 7)
lmod2 <- lm(Petal.Length ~ Sepal.Length + Species, iris)
outlier_test(lmod2)
Plot confint_adjust
x
Description
Plot a confint_adjust
x produced by the
confint_adjust
function. See
Examples.
Usage
## S3 method for class 'confint_adjust'
plot(x, parm, mar = c(5.1, 7.1, 4.1, 2.1), line = mar[2] - 1, ...)
Arguments
x |
An |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
mar |
A numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot. The default is c(5, 7, 4, 2) + 0.1. |
line |
The MARgin line, starting at 0 counting
outwards, to draw the y-axis label. The default is 1
unit less than |
... |
Additional arguments passed to |
Details
The plot
function doesn't automatically adjust the
margins to account for the label names. If you need more
space for your labels, then increase the second element
of mar
from 7.1 upward and line upward.
Alternatively, if you need less space, then you can
decrease both of these values. Or you could use the
autoplot
function that automatically controls the
spacing.
Value
None.
Author(s)
Joshua P. French
Examples
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
# standard intervals
cia <- confint_adjust(fit)
plot(cia)
# plot subset of intervals
plot(cia, parm = c("hp", "disp"))
# adjust margin and line for better formatting
plot(cia, parm = 2:3, mar = c(5.1, 4.1, 4.1, 2.1))
Adjust prediction intervals for multiple comparisons
Description
A function to produce adjusted confidence/prediction
intervals for predicted mean/new responses with a
family-wise confidence level of at least level
for
lm
objects (not applicable if no adjustment is
used). Internally, the function is a slight revision of
the code used in the predict.lm
function.
Usage
predict_adjust(
object,
newdata,
se.fit = FALSE,
scale = NULL,
df = Inf,
interval = c("none", "confidence", "prediction"),
level = 0.95,
type = c("response", "terms"),
method = "none",
terms = NULL,
na.action = stats::na.pass,
pred.var = res.var/weights,
weights = 1,
...
)
Arguments
object |
Object of class inheriting from |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
se.fit |
A switch indicating if standard errors are required. |
scale |
Scale parameter for std.err. calculation. |
df |
Degrees of freedom for scale. |
interval |
Type of interval calculation. Can be abbreviated. |
level |
Tolerance/confidence level. |
type |
Type of prediction (response or model term). Can be abbreviated. |
method |
A character string indicating the type of
adjustment to make. The default choice is
|
terms |
If |
na.action |
function determining what should be done with missing
values in |
pred.var |
the variance(s) for future observations to be assumed for prediction intervals. See ‘Details’. |
weights |
variance weights for prediction. This can be a numeric
vector or a one-sided model formula. In the latter case, it is
interpreted as an expression evaluated in |
... |
further arguments passed to or from other methods. |
Details
Let a = 1 - level
. All intervals are computed
using the formula prediction +/- m * epesd
, where
m
is a multiplier and epesd
is the
estimated standard deviation of the prediction error of
the estimate
.
method = "none"
(no correction) produces the
standard t-based confidence intervals with multiplier
stats::qt(1 - a/2, df = object$df.residual)
.
method = "bonferroni"
produces Bonferroni-adjusted
intervals that use the multiplier m = stats::qt(1 -
a/(2 * k), df = object$df.residual)
, where k
is
the number of intervals being produced.
The Working-Hotelling and Scheffe adjustments are distinct; the Working-Hotelling typically is related to a multiple comparisons adjustment for confidence intervals of the response mean while the Scheffe adjustment is typically related to a multiple comparisons adjustment for prediction intervals for a new response. However, references often uses these names interchangeably, so we use them equivalently in this function.
method = "wh"
(Working-Hotelling) or
method = "scheffe"
and interval =
"confidence"
produces Working-Hotelling-adjusted intervals that
use the multiplier m = sqrt(p * stats::qf(level,
df1 = p, df2 = object$df.residual))
, where p
is
the number of estimated coefficients in the model.
method = "wh"
(Working-Hotelling) or
method = "scheffe"
and interval =
"prediction"
produces Scheffe-adjusted intervals that
use the multiplier m = sqrt(k * stats::qf(level,
df1 = k, df2 = object$df.residual))
, where k
is
the number of intervals being produced.
Value
predict_adjust
produces:
A vector of predictions if interval = "none"
.
A matrix of predictions and bounds with
column names fit
, lwr
, and upr
if
interval
is set. For type = "terms"
this is
a matrix with a column per term and may have an attribute
"constant"
.
If se.fit
is TRUE
, a
list with the following components is returned:
fit
: vector or matrix as abovese.fit
: standard error of predicted meansresidual.scale
: residual standard deviationsdf
: degrees of freedom for residual
References
Bonferroni, C. (1936). Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8, 3-62.
Working, H., & Hotelling, H. (1929). Applications of the theory of error to the interpretation of trends. Journal of the American Statistical Association, 24(165A), 73-85. doi:10.1080/01621459.1929.10506274
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2004). Applied Linear Statistical Models, 5th edition. New York: McGraw-Hill/Irwin.
See Also
Examples
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
newdata <- as.data.frame(rbind(
apply(mtcars, 2, mean),
apply(mtcars, 2, median)))
predict_adjust(fit, newdata = newdata,
interval = "confidence",
method = "none")
predict_adjust(fit, newdata = newdata,
interval = "confidence",
method = "bonferroni")
predict_adjust(fit, newdata = newdata,
interval = "confidence",
method = "wh")
predict_adjust(fit, newdata = newdata,
interval = "prediction",
method = "scheffe")
Print an object of class coef_matrix
produced
by the coef_matrix
function.
Description
Print an object of class coef_matrix
produced
by the coef_matrix
function.
Usage
## S3 method for class 'coef_matrix'
print(x, digits = 2, ...)
Arguments
x |
An |
digits |
the minimum number of significant digits to be used: see
|
... |
Additional arguments to the
|
Value
A p \times 4
matrix with columns for the
estimated coefficient, its standard error, t-statistic
and corresponding (two-sided) p-value.
Author(s)
Joshua French
Examples
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
(coefm <- coef_matrix(fit))
# print more digits
print(coefm, digits = 8)
Print confint_adjust
object
Description
Print an object of class confint_adjust
produced
by the confint_adjust
function.
Usage
## S3 method for class 'confint_adjust'
print(x, ...)
Arguments
x |
An |
... |
Additional arguments to the
|
Value
A data.frame
with columns term
,
lwr
, and upr
, which are the coefficients
for which inference is being made, and the lower and
upper bounds of the confidence intervals for each
coefficient, respectively.
Author(s)
Joshua French
Examples
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
(cia <- confint_adjust(fit))
print(cia, digits = 3)
Print predict_adjust
object
Description
Print an object of class predict_adjust
produced
by the predict_adjust
function.
Usage
## S3 method for class 'predict_adjust'
print(x, ...)
Arguments
x |
An |
... |
Additional arguments to the
|
Value
Depending on the interval
argument of
predict_adjust
:
A vector of predictions if interval = "none"
.
A matrix of predictions and bounds with column names
fit
, lwr
, and upr
if
interval
is set. For type = "terms"
this
is a matrix with a column per term and may have an
attribute "constant"
.
If se.fit
is TRUE
, a
list with the following components is returned:
fit
: vector or matrix as abovese.fit
: standard error of predicted meansresidual.scale
: residual standard deviationsdf
: degrees of freedom for residual
Author(s)
Joshua French
Examples
fit <- lm(100/mpg ~ disp + hp + wt + am, data = mtcars)
(cia <- predict_adjust(fit))
print(cia, digits = 3)
Plot residuals of a fitted model
Description
residual_plot
plots the residuals of a fitted model.
Usage
residual_plot(model, ...)
Arguments
model |
A fitted model. |
... |
Currently unimplemented. |
Author(s)
Joshua French
Examples
lmod <- lm(Girth ~ Height, data = trees)
residual_plot(lmod)
Plot residuals of a fitted lm
object
Description
residual_plot.lm
plots the residuals of a fitted
lm
object. In general, it is intended to provide
similar functionality to plot.lm
when which = 1
, but can be used for different
types of residuals and can also plot first-order
predictor variables along the x-axis instead of only the
fitted values.
Details about the different types of
residuals are discussed in the
get_residuals
function.
Usage
## S3 method for class 'lm'
residual_plot(
model,
rtype = c("ordinary", "standardized", "studentized", "loo", "jackknife", "deleted",
"internally studentized", "externally studentized"),
xaxis = "fitted",
id_n = 3,
predictors = ~.,
smooth = stats::loess,
add_reference = TRUE,
add_smooth = TRUE,
...,
text_arglist = list(),
abline_arglist = list(),
smooth_arglist = list(),
lines_arglist = list(),
extendrange_f = 0.08
)
Arguments
model |
A fitted model object from the
|
rtype |
The residual type to plot. The default is
|
xaxis |
The variable to use on the x-axis of the
plot(s). The default is |
id_n |
The number of points to identify with labels.
The default is |
predictors |
A formula describing the first-order predictors to plot the residuals against. The default is all available first-order predictors. |
smooth |
A function with a
|
add_reference |
A logical value indicating whether a
reference line should be added. The default is
|
add_smooth |
A logical value indicating whether a
smooth should be added to each plot produced. The
default is |
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
abline_arglist |
A named list specifying additional
arguments passed to the |
smooth_arglist |
A named list specifying additional
arguments passed to the function provided in the
|
lines_arglist |
A named list specifying additional
arguments passed to the |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Author(s)
Joshua French
See Also
plot
,
text
,
abline
,
lines
loess
.
Examples
lmod <- lm(Petal.Length ~ Sepal.Length + Species,
data = iris)
# similarity with built-in plot.lm functionality
residual_plot(lmod)
plot(lmod, which = 1)
# residual plot for other residual types
residual_plot(lmod, rtype = "standardized", id_n = 0)
# another residual plot with several customizations
residual_plot(lmod,
text_arglist = list(col = "blue", cex = 2),
abline_arglist = list(lwd = 2, lty = 2,
col = "brown"),
lines_arglist = list(col = "purple"),
)
# residual plot for predictors
residual_plot(lmod, xaxis = "pred", id_n = 2)
# residual plot for individual predictors
residual_plot(lmod, xaxis = "pred",
predictors = ~ Sepal.Length, id_n = 2)
residual_plot(lmod, xaxis = "pred",
predictors = ~ Species,)
Compute leave-one-out residuals
Description
rloo
computes the leave-one-out residuals of
model
.
rjacknife
and rdeleted
are
aliases for rloo
.
Usage
rloo(model, ...)
rdeleted(model, ...)
rjackknife(model, ...)
Arguments
model |
a fitted model object from the |
... |
Currently unimplemented |
Author(s)
Joshua French
See Also
Examples
lmod <- lm(Girth ~ Height, data = trees)
rloo(lmod)
Compute leave-one-out residuals for 'lm' objects.
Description
rloo.lm
computes the leave-one-out residuals of
the lm
object stored in model
.
rjackknife.lm
and rdeleted.lm
are aliases
for rloo.lm
.
Usage
## S3 method for class 'lm'
rloo(
model,
infl = stats::lm.influence(model, do.coef = FALSE),
res = infl$wt.res,
...
)
## S3 method for class 'lm'
rdeleted(
model,
infl = stats::lm.influence(model, do.coef = FALSE),
res = infl$wt.res,
...
)
## S3 method for class 'lm'
rjackknife(
model,
infl = stats::lm.influence(model, do.coef = FALSE),
res = infl$wt.res,
...
)
Arguments
model |
a fitted model object from the |
infl |
influence structure as returned by |
res |
(possibly weighted) residuals, with proper default. |
... |
Currently unimplemented |
Details
Let \hat{\epsilon}_i
denote the residual of the
i
th observation and h_i
denote the leverage
value of the i
th observation The leave-one-out residual for observation i
is
computed as
l_i = \frac{\hat{\epsilon}_i}{1-h_i}.
Author(s)
Joshua French
Examples
lmod <- lm(Girth ~ Height, data = trees)
rloo(lmod)
Helper function for residuals_plot.lm
Description
Helper function for residuals_plot.lm
Usage
rplot_raw(
x,
y,
idd,
labels,
xlab,
smooth,
add_reference,
add_smooth,
arglist,
text_arglist,
abline_arglist,
smooth_arglist,
lines_arglist,
extendrange_f
)
Arguments
x |
x-values to plot |
y |
y-values to plot |
idd |
Identified observations |
labels |
The labels to use for the identified points. |
xlab |
The x-axis label |
smooth |
The function for smoothing. Needs a 'formula' argument. |
add_reference |
Logical value |
add_smooth |
Logical value |
arglist |
Named list for plot |
text_arglist |
Named list for text |
abline_arglist |
Named list for abline |
smooth_arglist |
Named list for smooth |
lines_arglist |
Named list for lines of smooth |
extendrange_f |
Positive number(s) specifying the
fraction by which the range of the residuals should be
extended using the |
Set default values of abline_arglist
Description
Set the default values of abline_arglist if not specified by the user.
Usage
set_abline_arglist(abline_arglist)
Arguments
abline_arglist |
A named list. |
Value
A named list.
Set default values of lines_arglist
Description
Set the default values of lines_arglist if not specified by the user.
Usage
set_lines_arglist(lines_arglist)
Arguments
lines_arglist |
A named list. |
Value
A named list.
Set the default values of text_arglist if not specified by the user.
Description
Set the default values of text_arglist if not specified by the user.
Usage
set_text_arglist(text_arglist, x, y, labels, idd)
Arguments
text_arglist |
A named list. |
x |
The x-values of the labels. |
y |
The y-values of the labels. |
labels |
The labels to plot. |
Value
A named list.
Spread-level plot of a fitted model
Description
sl_plot
creates a spread-level plot
for a fitted model.
Usage
sl_plot(model, ...)
Arguments
model |
A fitted model. |
... |
Currently unimplemented. |
Author(s)
Joshua French
Examples
lmod <- lm(Girth ~ Height, data = trees)
sl_plot(lmod)
Spread-level plot for lm
object
Description
sl_plot.lm
plots a spread-level plot of a fitted
lm
object. In general, it is intended to provide
similar functionality to plot.lm
when which = 3
, but can be used for different
types of residuals and can also plot first-order
predictor variables along the x-axis instead of only the
fitted values.
Details about the different types of
residuals are discussed in the
get_residuals
function.
Usage
## S3 method for class 'lm'
sl_plot(
model,
rtype = c("standardized", "studentized", "internally studentized",
"externally studentized"),
xaxis = "fitted",
id_n = 3,
predictors = ~.,
smooth = stats::loess,
add_smooth = TRUE,
...,
text_arglist = list(),
smooth_arglist = list(),
lines_arglist = list()
)
Arguments
model |
A fitted model object from the
|
rtype |
The residual type to plot. The default is
|
xaxis |
The variable to use on the x-axis of the
plot(s). The default is |
id_n |
The number of points to identify with labels.
The default is |
predictors |
A formula describing the first-order predictors to plot the residuals against. The default is all available first-order predictors. |
smooth |
A function with a
|
add_smooth |
A logical value indicating whether a
smooth should be added to each plot produced. The
default is |
... |
Additional arguments passed to the
|
text_arglist |
Additional arguments passed to the
|
smooth_arglist |
A named list specifying additional
arguments passed to the function provided in the
|
lines_arglist |
A named list specifying additional
arguments passed to the |
Author(s)
Joshua French
See Also
Examples
lmod <- lm(Petal.Length ~ Sepal.Length + Species,
data = iris)
# similarity with built-in plot.lm functionality
sl_plot(lmod)
plot(lmod, which = 3)
# spread-level plot for other residual types
sl_plot(lmod, rtype = "studentized", id_n = 0)
# spread-level plot for predictors
sl_plot(lmod, xaxis = "pred", id_n = 2)
# spread-level plot for individual predictors
sl_plot(lmod, xaxis = "pred",
predictors = ~ Sepal.Length,
id_n = 2)
Toluca Company data
Description
Toluca Company data in Applied Linear Statistical Models, 5th edition, p. 19. From the book:
The Toluca Company manufactures refrigeration equipment
as well as many replacement parts. In the past, one of
the replacement parts has been produced periodically in
lots of varying sizes. When a cost improvement program
was undertaken, company officials wished to determine
the optimum lot size for producing this part. The
production of this part involves setting up the
production process (which must be done no matter what
is the lot size) and machining and assembly operations.
One key input for the model to ascertain the optimum
lot size was the relationship between lot size and
labor hours required to produce the lot. To determine
this relationship, data on lot size (lot_size
)
and work hours (work_hours
) for 25 recent
production runs were utilized.
Usage
data(toluca)
Format
A data frame with 25 observations and 2 variables:
lot_size
Number of replacement parts produced in the lot.
work_hours
Number of hours of work required to produce the lot.
Author(s)
Joshua P. French
References
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2004). Applied Linear Statistical Models, 5th edition. New York: McGraw-Hill/Irwin.