Title: Interpreting CoDa Regression Models
Version: 0.1.0
Description: Provides methods for interpreting CoDa (Compositional Data) regression models along the lines of "Pairwise share ratio interpretations of compositional regression models" (Dargel and Thomas-Agnan 2024) <doi:10.1016/j.csda.2024.107945>. The new methods include variation scenarios, elasticities, elasticity differences and share ratio elasticities. These tools are independent of log-ratio transformations and allow an interpretation in the original space of shares. 'CoDaImpact' is designed to be used with the 'compositions' package and its ecosystem.
License: GPL (≥ 3)
Encoding: UTF-8
RoxygenNote: 7.3.1
URL: https://github.com/LukeCe/CoDaImpact, https://lukece.github.io/CoDaImpact/
BugReports: https://github.com/LukeCe/CoDaImpact/issues
Imports: methods, compositions
Depends: R (≥ 2.10)
LazyData: true
Suggests: rmarkdown, knitr, sf, tinytest
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2024-03-22 11:22:01 UTC; lukece
Author: Lukas Dargel [aut, cre] (ORCID), Christine Thomas-Agnan [aut] (ORCID), Rodrigue Nasr [ctb], Sijia Pan [ctb], Iban Rendo Barreiro [ctb], Shuyao Li [ctb]
Maintainer: Lukas Dargel <lukas.dargel@mailbox.org>
Repository: CRAN
Date/Publication: 2024-03-23 10:30:02 UTC

CoDaImpact: Interpreting CoDa Regression Models

Description

Provides methods for interpreting CoDa (Compositional Data) regression models along the lines of "Pairwise share ratio interpretations of compositional regression models" (Dargel and Thomas-Agnan 2024) doi:10.1016/j.csda.2024.107945. The new methods include variation scenarios, elasticities, elasticity differences and share ratio elasticities. These tools are independent of log-ratio transformations and allow an interpretation in the original space of shares. 'CoDaImpact' is designed to be used with the 'compositions' package and its ecosystem.

Author(s)

Maintainer: Lukas Dargel lukas.dargel@mailbox.org (ORCID)

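Authors:

  • Christine Thomas-Agnan (ORCID)

Other contributors:

  • Rodrigue Nasr [contributor]

  • Sijia Pan [contributor]

  • Iban Rendo Barreiro [contributor]

  • Shuyao Li [contributor]

See Also

Useful links:

  • https://github.com/LukeCe/CoDaImpact

  • https://lukece.github.io/CoDaImpact/

  • Report bugs at https://github.com/LukeCe/CoDaImpact/issues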


Create a linear path in the simplex by defining a direction and a step size

Description

Create a linear path in the simplex by defining a direction and a step size

Usage

CoDa_path(
  comp_direc,
  comp_from,
  step_size = 0.01,
  n_steps = 100,
  add_opposite = FALSE,
  dir_from_start = FALSE
)

Arguments

comp_direc

A numeric vector, defining a direction in the simplex

comp_from

A numeric vector, an initial point in the simplex - defaults to a balanced composition, which represents the origin in the simplex

step_size

A numeric, indicating the step size

n_steps

A numeric, indicating the number of steps to be taken from comp_from

add_opposite

A logical, if TRUE steps in the opposite direction are also computed

dir_from_start

A logical, if TRUE the direction is calculated from the difference between comp_from and comp_direc

Details

The function is very similar to CoDa_seq(). However, instead of drawing a line between a starting point and an end point, it uses only a starting point and a direction.
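The idea of a path defined by a starting point and a direction can be sketched in base R. This is only an illustration of the simplex geometry (perturbation and powering), not the code used by CoDa_path(); the helpers closure() and simplex_path() are hypothetical, and the way step_size enters here is an assumption.

```r
# closure: rescale positive numbers so that they sum to one
closure <- function(x) x / sum(x)

# hypothetical helper: point k of the path is closure(from * direc^(k * step_size))
simplex_path <- function(direc, from, step_size = 0.01, n_steps = 100) {
  steps <- seq(0, n_steps) * step_size
  path <- t(sapply(steps, function(s) closure(from * direc^s)))
  colnames(path) <- names(direc)
  as.data.frame(path)
}

comp_direc <- c(A = .4, B = .35, C = .25)
comp_from  <- c(A = .7, B = .2, C = .1)
path <- simplex_path(comp_direc, comp_from, step_size = .5, n_steps = 4)

rowSums(path)                # every row is a composition
diff(log(path$A / path$B))   # the log-ratio changes by the same amount at each step
```

The constant increments of the log-ratio are what makes the path "linear" with respect to the geometry of the simplex.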

Value

A data.frame where each row corresponds to one compositional vector

Author(s)

Lukas Dargel

See Also

CoDa_seq

Examples


# three steps that go from the origin towards the defined direction
comp_direc <- c(A =.4,B = .35, C= .25)
CoDa_path(comp_direc, n_steps = 3)


# we can draw the path that is defined by this direction
comp_direc <- c(A =.4,B = .35, C= .25)
compositions::plot.acomp(CoDa_path(comp_direc,n_steps = 10))
compositions::plot.acomp(CoDa_path(comp_direc,n_steps = 100))
compositions::plot.acomp(CoDa_path(comp_direc,add_opposite = TRUE))


# using the same direction we can draw a new path that does not go through the origin
comp_direc <- c(A =.4,B = .35, C= .25)
comp_from <- c(.7,.2,.1)
compositions::plot.acomp(CoDa_path(comp_direc, comp_from,n_steps = 10))
compositions::plot.acomp(CoDa_path(comp_direc, comp_from,n_steps = 100))
compositions::plot.acomp(CoDa_path(comp_direc, comp_from,add_opposite = TRUE))


# the balanced composition does not define a direction by itself
comp_origin <- c(A = 1/3, B = 1/3, C= 1/3) # corresponds to a zero vector in real space
try(CoDa_path(comp_origin, comp_from,add_opposite = TRUE))

# with the dir_from_start option the direction is derived
# from the simplex line connecting two compositions
path_origin <- CoDa_path(
  comp_direc = comp_origin,
  comp_from = comp_from,
  add_opposite = TRUE,
  dir_from_start = TRUE,
  step_size = .1)
compositions::plot.acomp(path_origin)
compositions::plot.acomp(comp_origin, add = TRUE, col = "blue", pch = 19)
compositions::plot.acomp(comp_from, add = TRUE, col = "red", pch = 19)


A sequence connecting two points in a simplex

Description

A sequence connecting two points in a simplex

Usage

CoDa_seq(comp_from, comp_to, n_steps = 100, add_opposite = FALSE)

Arguments

comp_from

A numeric vector, representing the initial composition

comp_to

A numeric vector, representing the final composition

n_steps

An integer, indicating the number of steps used to go from comp_from to comp_to

add_opposite

A logical, if TRUE the path in the opposite direction is added

Details

The sequence is evenly spaced and corresponds to a straight line in the simplex geometry. If no end point is provided, the line connects the initial point with the first summit (vertex) of the simplex. Since exact zeros are not handled by the ilr transformation, they are replaced by a small constant.
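What "evenly spaced" means in the simplex geometry can be checked with a few lines of base R. The sketch below is not the implementation of CoDa_seq(); the helpers simplex_seq(), clr_vec() and a_dist() are hypothetical names, and the example avoids exact zeros so that no replacement constant is needed.

```r
closure <- function(x) x / sum(x)

# hypothetical helper: geometric interpolation between two compositions
simplex_seq <- function(from, to, n_steps = 100) {
  w <- seq(0, 1, length.out = n_steps + 1)
  t(sapply(w, function(wi) closure(from^(1 - wi) * to^wi)))
}

# Aitchison distance: Euclidean distance between clr coordinates
clr_vec <- function(x) log(x) - mean(log(x))
a_dist  <- function(x, y) sqrt(sum((clr_vec(x) - clr_vec(y))^2))

start_comp <- c(A = .4, B = .35, C = .25)
end_comp   <- c(A = .1, B = .7, C = .2)
s <- simplex_seq(start_comp, end_comp, n_steps = 4)

# consecutive points are equally far apart in the Aitchison geometry
step_dist <- sapply(1:4, function(i) a_dist(s[i, ], s[i + 1, ]))
step_dist
```

Each step covers one quarter of the Aitchison distance between the two end points, even though the shares themselves do not change by equal amounts.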

Value

A data.frame where each row corresponds to one compositional vector

Author(s)

Lukas Dargel

See Also

simplex_increment

Examples


# path to the first summit of the simplex
start_comp <- c(A =.4,B = .35, C= .25)
compositions::plot.acomp(CoDa_seq(start_comp))
compositions::plot.acomp(CoDa_seq(start_comp, add_opposite = TRUE))

# path to an edge of the simplex
end_comp <- c(0,.8,.2)
compositions::plot.acomp(CoDa_seq(start_comp, end_comp))
compositions::plot.acomp(CoDa_seq(start_comp, end_comp,add_opposite = TRUE))

Computation of elasticities in CoDa regression models

Description

This function computes elasticities and semi-elasticities for CoDa regression models, where we have to distinguish four cases:

Usage

Impacts(object, Xvar = NULL, obs = 1)

Arguments

object

an object of class "lmCoDa"

Xvar

a character indicating the name of one explanatory variable

obs

a numeric indicating the index of one observation

Details

The mathematical foundation for elasticity computations in CoDa models comes from Morais and Thomas-Agnan (2021). Dargel and Thomas-Agnan (2024) present further results and illustrations.
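For intuition, the case where both Y and X are compositional can be sketched in base R. The sketch assumes a model of the form clr(Y) = clr(X) B without intercept, where B is a made-up clr parameter matrix whose rows and columns sum to zero; differentiating the closure then gives d log(Y_j) / d log(X_l) = B[l, j] - sum_k Y_k B[l, k]. This is an illustration under these assumptions, not the code of Impacts().

```r
closure <- function(x) x / sum(x)

# hypothetical clr parameter matrix (rows: components of X, columns: components of Y)
B <- rbind(c( 0.6, -0.2, -0.4),
           c(-0.1,  0.3, -0.2),
           c(-0.5, -0.1,  0.6))

# prediction in the simplex, ignoring the intercept
predict_y <- function(x) closure(exp(as.vector(log(x) %*% B)))

x <- c(.5, .3, .2)
y <- predict_y(x)

# analytic elasticities: E[l, j] = B[l, j] - sum_k y[k] * B[l, k]
E <- B - as.vector(B %*% y)

# numerical check: derivative of log(y) with respect to log(x[l])
h <- 1e-6
E_num <- t(sapply(1:3, function(l) {
  x_new <- x
  x_new[l] <- x[l] * exp(h)
  (log(predict_y(x_new)) - log(y)) / h
}))
max(abs(E - E_num))   # small, of the order of h
```

Because the shares of Y always sum to one, the elasticities in each row of E, weighted by the shares y, sum to zero.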

Value

a matrix

Author(s)

References

Examples

res <- lmCoDa(YIELD ~ PRECIPITATION + ilr(TEMPERATURES), data = head(rice_yields,20))
Impacts(res, Xvar = "TEMPERATURES")


Compute share ratio elasticities for CoDa models

Description

In CoDa models with a compositional dependent variable (Y), share ratio elasticities (SRE) make it possible to interpret the influence of compositional explanatory variables (X). The interpretation is analogous to usual elasticities:

Usage

ShareRatioElasticities(object, Xvar, Xdir = NULL)

Arguments

object

an object of class "lmCoDa"

Xvar

a character indicating the name of the explanatory variable that changes

Xdir

a numeric vector, a single character, or NULL:

  • if numeric Xdir is taken as a fixed direction in the simplex

  • if character Xdir is interpreted as one summit of the X composition and converted to the fixed direction towards this summit

  • if NULL the share ratio elasticities are computed for variable directions corresponding to the example in Dargel and Thomas-Agnan (2024), "The link between multiplicative competitive interaction models and compositional data regression with a total", Journal of Applied Statistics, DOI: 10.1080/02664763.2024.2329923

Details

More details on this interpretation can be found in Dargel and Thomas-Agnan (2024) and in the accompanying vignette.
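The reading of an SRE as "relative change of a Y share ratio per relative change of an X share ratio" can be illustrated numerically in base R. The sketch uses a made-up clr parameter matrix B, ignores the intercept, and approximates the elasticity with a small finite step; it is not the computation performed by ShareRatioElasticities(), and all helper names are hypothetical.

```r
closure <- function(x) x / sum(x)

# hypothetical clr parameter matrix (rows: X components, columns: Y components)
B <- rbind(c( 0.6, -0.2, -0.4),
           c(-0.1,  0.3, -0.2),
           c(-0.5, -0.1,  0.6))
predict_y <- function(x) closure(exp(as.vector(log(x) %*% B)))

x0   <- c(.5, .3, .2)
xdir <- exp(c(0, 0, 1))        # fixed direction towards the third vertex
h    <- 1e-6
x1   <- closure(x0 * xdir^h)   # small step along the direction
y0   <- predict_y(x0)
y1   <- predict_y(x1)

# relative change of the share ratio of component i over component j
rel_change <- function(new, old, i, j) (new[i] / new[j]) / (old[i] / old[j]) - 1

# Y ratio (2 / 1) against X ratio (3 / 1): a finite share ratio elasticity
sre <- rel_change(y1, y0, 2, 1) / rel_change(x1, x0, 3, 1)
sre

# the X ratio (2 / 1) does not move in this direction (numerically zero),
# which is why the corresponding SRE cannot be interpreted
rel_change(x1, x0, 2, 1)
```

In this toy setting the finite SRE is close to B[3, 2] - B[3, 1], the difference of two clr parameters.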

Value

a data.frame

Author(s)

Lukas Dargel

References

Examples


### XY-compositional model
res <- lmCoDa(
  ilr(cbind(left, right, extreme_right)) ~
  ilr(cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)),
  data =  head(election, 20))

## Focus on changes in the education composition
educ_comp <- "cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)"

## case 1
## changes towards the summit "Educ_Higher" as (fixed) direction
SRE1 <- ShareRatioElasticities(res, Xvar = educ_comp, Xdir = "Educ_Higher")

SRE1[1,]
# Result: SRE=Inf
# cannot be interpreted because, for this direction,
# the relative change in the share ratio of X (Highschool / BeforeHighschool) is zero
SRE1[7,]
# Result: SRE=0.9
# when the ratio of X (Higher / BeforeHighschool) increases by 1%
# the ratio of Y (right / left) increases by about 0.9%

## case 2
## numeric vector as (fixed) direction
SRE2 <- ShareRatioElasticities(res, Xvar = educ_comp, Xdir = exp(c(0,0,1)))
identical(SRE1,SRE2) # exp(c(0,0,1)) is the direction that points to the third summit

## case 3
## variable directions with Xdir = NULL
## In this case the direction depends on the components used for the share ratio of X
## In particular the component of X in the numerator grows
## at the same rate as the component in the denominator decreases
SRE3 <- ShareRatioElasticities(res, Xvar = educ_comp, Xdir = NULL)
SRE3[1,]
# Result: SRE=-2.8
# when the ratio of X (Highschool / BeforeHighschool) increases by 1%
# the ratio of Y (right / left) decreases by about 2.8%

Converting Linear Models to CoDa models

Description

The function converts the output of lm() to the "lmCoDa" class, which offers additional tools for the interpretation of CoDa regression models. Most of the work is done by the transformationSummary() function, which has its own documentation page, but should be reserved for internal use.

Usage

ToSimplex(object)

Arguments

object

an object of class "lm"

Value

an object of class "lm" and "lmCoDa" if the formula includes at least one log-transformation

Author(s)

See Also

lm(), lmCoDa()

Examples


# XY-compositional model
res <- lm(
  ilr(cbind(left, right, extreme_right)) ~
  ilr(cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)),
  data =  head(election, 20))
res <- ToSimplex(res)

# X-compositional model
res <- lm(YIELD ~ PRECIPITATION + ilr(TEMPERATURES), data = head(rice_yields, 20))
res <- ToSimplex(res)

Scenarios for variation in CoDa regressions models

Description

Scenarios of this type are illustrated in Dargel and Thomas-Agnan (2024). They make it possible to evaluate how the response variable (Y) in a CoDa model would evolve under a hypothetical scenario of linear changes in one explanatory variable (X). When the changing explanatory variable is compositional, the term "linear" is understood with respect to the geometry of the simplex.

Usage

VariationScenario(
  object,
  Xvar,
  Xdir,
  obs = 1,
  inc_size = 0.1,
  n_steps = 100,
  add_opposite = TRUE,
  normalize_Xdir = TRUE
)

Arguments

object

an object of class "lmCoDa"

Xvar

a character indicating the name of the explanatory variable that changes

Xdir

either character or numeric, to indicate the direction in which Xvar should change

  • when character this should be one of the components of X, in which case the direction is the corresponding vertex of the simplex

  • when numeric this argument is coerced to a unit vector in the simplex

  • (when Xvar refers to a scalar variable this argument is ignored)

obs

a numeric indicating the observation used for the scenario

inc_size

a numeric indicating the distance between each point in the scenario of X

n_steps

a numeric indicating the number of points in the scenario

add_opposite

a logical, if TRUE the scenario also includes changes in the opposite direction

normalize_Xdir

a logical, if TRUE the direction Xdir is scaled to have an Aitchison norm of 1, so that inc_size can be interpreted as an Aitchison distance

Details

The linear scenario for X is computed with seq() in the scalar case and with CoDa_seq() in the compositional case. The corresponding changes in Y are computed with the prediction formula, where we exploit the fact that only one variable is changing.
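The scalar case can be sketched in base R as follows. The clr intercept a and clr slope b are made-up numbers, and the prediction formula closure(exp(a + b * x)) assumes a Y-compositional model with a single scalar X; the sketch does not reproduce the output format of VariationScenario().

```r
closure <- function(x) x / sum(x)

# hypothetical clr intercept and clr slope (both sum to zero)
a <- c(left = 0.2, right = 0.1, extreme_right = -0.3)
b <- c(left = 0.5, right = -0.1, extreme_right = -0.4)

x_obs    <- 1.5   # assumed observed value of X
inc_size <- 0.1
n_steps  <- 5

# linear scenario for X in both directions around the observed value
x_scen <- x_obs + seq(-n_steps, n_steps) * inc_size

# predicted shares of Y for every point of the scenario
y_scen <- t(sapply(x_scen, function(x) closure(exp(a + b * x))))
scenario <- data.frame(X = x_scen, y_scen)
scenario
```

Only the term b * x varies along the scenario, so the remaining terms of a larger model could be computed once and reused for every row.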

Value

a data.frame containing the scenario of X and the corresponding predicted values of Y

Author(s)

Lukas Dargel

References

Examples


# ---- model with scalar response ----
res <- lmCoDa(YIELD ~ PRECIPITATION + ilr(TEMPERATURES), data = head(rice_yields,20))
VariationScenario(res, Xvar = "TEMPERATURES", Xdir = "MEDIUM", n_steps = 5)
VariationScenario(res, Xvar = "PRECIPITATION", n_steps = 5)


# ---- model with compositional response ----
res <- lmCoDa(ilr(cbind(left, right, extreme_right)) ~
                ilr(cbind(Age_1839, Age_4064)) +
                ilr(cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)) +
                log(unemp_rate),
              data = head(election))

VariationScenario(res, Xvar ="cbind(Age_1839,Age_4064)",Xdir = "Age_1839", n_steps = 5)
VariationScenario(res, "log(unemp_rate)", n_steps = 5)


Effects of infinitesimal changes in CoDa models

Description

This function makes it possible to evaluate how a change in an explanatory variable impacts the response variable in a CoDa regression model. The changes are calculated from the approximate formula presented in Dargel and Thomas-Agnan (2024). Changes in the response variable are provided as a data.frame and the underlying changes in the explanatory variable are given as attributes.

Usage

VariationTable(
  object,
  Xvar,
  Xdir,
  obs = 1,
  inc_size = 0.1,
  inc_rate = NULL,
  Ytotal = 1,
  normalize_Xdir = TRUE
)

Arguments

object

an object of class "lmCoDa"

Xvar

a character indicating the name of the explanatory variable that changes

Xdir

either character or numeric, to indicate the direction in which Xvar should change

  • when character this should be one of the components of X, in which case the direction is the corresponding vertex of the simplex

  • when numeric this argument is coerced to a unit vector in the simplex

  • (when Xvar refers to a scalar variable this argument is ignored)

obs

a numeric indicating the observation used for the scenario

inc_size

a numeric indicating the distance between each point in the scenario of X

inc_rate

a numeric that can be used as a parameterization of the step size

Ytotal

a numeric indicating the total of Y

normalize_Xdir

a logical, if TRUE the direction Xdir is scaled to have an Aitchison norm of 1, so that inc_size can be interpreted as an Aitchison distance
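The effect of such a normalization can be illustrated in base R. The helpers below are hypothetical and only follow the definition of the Aitchison norm as the Euclidean norm of the clr coordinates; they are not taken from the package.

```r
closure <- function(x) x / sum(x)
clr_vec <- function(x) log(x) - mean(log(x))

# Aitchison norm: Euclidean norm of the clr coordinates
a_norm <- function(x) sqrt(sum(clr_vec(x)^2))

# hypothetical helper: rescale a direction to an Aitchison norm of 1
normalize_dir <- function(xdir) closure(xdir^(1 / a_norm(xdir)))

xdir <- c(.5, .25, .25)
u <- normalize_dir(xdir)
a_norm(u)   # equal to 1 up to rounding

# a step of size inc_size along u moves x0 by an Aitchison distance of inc_size
x0 <- c(.2, .3, .5)
inc_size <- 0.1
x1 <- closure(x0 * u^inc_size)
step_dist <- sqrt(sum((clr_vec(x1) - clr_vec(x0))^2))
step_dist
```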

Value

data.frame

Author(s)

References

Examples


# XY-compositional model
res <- lmCoDa(
  ilr(cbind(left, right, extreme_right)) ~
  ilr(cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)),
  data =  head(election, 20))

# Focus on changes in the education composition
educ_comp <- "cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)"

# ... changes towards a summit (higher share of people with lower education)
VariationTable(res, educ_comp, Xdir = "Educ_BeforeHighschool")

# ... same changes using a compositional vector as direction
VariationTable(res, educ_comp, Xdir = c(.5,.25,.25))

# ... changes in a more general direction and for a different observation
VariationTable(res, educ_comp, Xdir = c(.35,.45,.10), obs = 2)


French car market data

Description

This data set shows monthly data of the French car market between 2003 and 2015. The market is divided into 5 main segments (A to E), according to the size of the vehicle chassis. Morais et al. (2018) first used this data to compare compositional and Dirichlet models for market shares.

Usage

car_market

Format

An object of class data.frame with 152 rows and 10 columns.

Details

Author(s)

Lukas Dargel, Christine Thomas-Agnan

Source

References

Joanna Morais, Christine Thomas-Agnan & Michel Simioni (2018) Using compositional and Dirichlet models for market share regression, Journal of Applied Statistics, 45:9, 1670-1689, DOI: 10.1080/02664763.2017.1389864


Internal: check for valid computational direction arguments

Description

Internal: check for valid computational direction arguments

Usage

check_Xdir(Xdir, Xopts, normalize = FALSE)

Arguments

Xdir

a character or numeric indicating the direction

Xopts

a character indicating the names of the vertices

normalize

a logical, if TRUE Xdir is normalized to have an Aitchison norm of 1

Value

a numeric vector

Author(s)

Lukas Dargel


Internal: check for valid name of Xvar

Description

Users should always specify Xvar as "NAME_SIMPLEX", which means before log-ratio transformations.

Usage

check_Xvar(
  Xvar,
  trSry,
  return_type = c("NAME_SIMPLEX", "NAME_COORD", "pos")[1]
)

Arguments

Xvar

a character or numeric indicating the direction

trSry

a character indicating the names of the vertices

Value

a single integer or character

Author(s)

Lukas Dargel


Predictions, fitted values, residuals, and coefficients in CoDa models

Description

These functions work as for the usual lm object. They additionally offer the space argument, which transforms the results directly into clr space or into the simplex.
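The relation between the three spaces can be sketched in base R for a single coefficient vector. The contrast matrix V is one possible orthonormal ilr basis for three components and may differ from the basis actually used in a fitted model; the coefficients are made up, and the sketch is not the code of the coef() method.

```r
closure <- function(x) x / sum(x)

# one possible orthonormal ilr basis for three components (contrast matrix)
V <- cbind(c(1, -1, 0) / sqrt(2),
           c(1, 1, -2) / sqrt(6))

# hypothetical ilr coefficients of one scalar explanatory variable
b_ilr <- c(0.3, -0.5)

b_clr     <- as.vector(V %*% b_ilr)   # clr space: sums to zero
b_simplex <- closure(exp(b_clr))      # simplex: positive, sums to one

sum(b_clr)
b_simplex
as.vector(t(V) %*% b_clr)             # back to the ilr coefficients
```

The clr and simplex versions do not depend on which orthonormal basis was chosen, which is why they are convenient for interpretation.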

Usage

## S3 method for class 'lmCoDa'
coef(object, space = NULL, split = FALSE, ...)

Arguments

object

class "lmCoDa"

space

a character indicating in which space the coefficients should be returned. Supported are the options c("clr", "simplex").

split

logical, if TRUE the coefficients are reported as a list instead of a matrix, where list structure reflects the explanatory variables of the model

...

not used

Value

a matrix

Author(s)

Lukas Dargel


Confidence Intervals for CoDa Models

Description

Dargel and Thomas-Agnan (2024) show how to compute variances and confidence intervals for parameters of CoDa models in log-ratio spaces.

Of particular interest are the clr parameters since they can be directly interpreted as differences from an average elasticity.

Another option is to interpret the differences in clr parameters, as these coincide with the differences in elasticities.
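Both interpretations can be checked in a toy example with one scalar explanatory variable. The sketch assumes a model of the form clr(Y) = a + b * x with made-up clr parameters a and b, for which the semi-elasticity of share j is b_j - sum_k Y_k b_k; it illustrates the two statements above and is not part of confint().

```r
closure <- function(x) x / sum(x)

# hypothetical clr intercept and clr slope for a scalar X (both sum to zero)
a <- c(left = 0.2, right = 0.1, extreme_right = -0.3)
b <- c(left = 0.5, right = -0.1, extreme_right = -0.4)

x0 <- 1.5
y0 <- closure(exp(a + b * x0))

# semi-elasticities: d log(y_j) / d x = b_j - sum_k y_k * b_k
se <- b - sum(y0 * b)

se - mean(se)              # reproduces the clr parameters b
se["right"] - se["left"]   # equals b["right"] - b["left"]
```

The semi-elasticities depend on the observation through y0, whereas their differences and their deviations from the average do not.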

Usage

## S3 method for class 'lmCoDa'
confint(object, parm, level = 0.95, y_ref = NULL, obs = NULL, ...)

Arguments

object

class "lmCoDa"

parm

a character, indicating the name of one explanatory variable

level

a numeric, indicating the confidence level required

y_ref

an optional argument that indicates the reference component of the response variable using its name or its position.
This argument is only used in the Y-compositional model. If it is supplied, confidence intervals of differences are reported instead of the direct intervals of the parameters.

obs

an optional integer that indicates one observation. When this argument is supplied, the function returns the observation-dependent elasticity

...

passed on to confint()

Details

Since CoDa models are often multivariate, this function only allows one explanatory variable to be specified at a time. The output is also more complex than the usual one for "lm" classes, because we have to indicate the component of Y and X. With confint.lm() it is still possible to compute the usual confidence intervals.

Value

data.frame

Author(s)

Lukas Dargel

References

Examples


## ==== Y-compositional model ====
res <- lmCoDa(
  ilr(cbind(left, right, extreme_right)) ~
  ilr(cbind(Age_1839, Age_4064)) +
  ilr(cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)) +
  unemp_rate,
  data = head(election, 20))

## ---- CI for scalar X
# CI for clr parameters
confint(res, "unemp_rate")
# CI for difference in clr parameters (coincides with difference in the semi elasticity)
confint(res, "unemp_rate", y_ref = 1)

## ---- CI for compositional X
# CI for clr parameters
confint(res, "cbind(Age_1839, Age_4064)")

# CI for difference in clr parameters (coincides with difference in the elasticity)
confint(res, "cbind(Age_1839, Age_4064)", y_ref = 1)



Results of French departmental elections in 2015

Description

The data is used by Nguyen et al. (2020) and was originally disseminated by the French ministry (Ministère de l'Intérieur et des Outre-Mer). Information about the population characteristics comes from the French national statistics institute (INSEE).

Usage

election

Format

An object of class data.frame with 95 rows and 13 columns.

Details

Author(s)

Lukas Dargel, Christine Thomas-Agnan

Source

References

Nguyen THA, Laurent T, Thomas-Agnan C, Ruiz-Gazen A. Analyzing the impacts of socio-economic factors on French departmental elections with CoDa methods. J Appl Stat. 2020 Dec 9;49(5):1235-1251. doi: 10.1080/02664763.2020.1858274. PMID: 35707505; PMCID: PMC9041641.


Predictions, fitted values, residuals, and coefficients in CoDa models

Description

These functions work as for the usual lm object. They additionally offer the space argument, which transforms the results directly into clr space or into the simplex.

Usage

## S3 method for class 'lmCoDa'
fitted(object, space = NULL, ...)

Arguments

object

class "lmCoDa"

space

a character indicating in which space the prediction should be returned. Supported are the options c("clr", "simplex").

...

passed on to predict.lm()

Value

matrix or vector

Author(s)

Lukas Dargel


Estimating CoDa regression models

Description

This is a thin wrapper around lm() followed by ToSimplex(), which makes it possible to create an lmCoDa object in one step.

Usage

lmCoDa(formula, data, ...)

Arguments

formula

as in lm()

data

as in lm()

...

arguments passed on to lm()

Value

an object of class "lm" and "lmCoDa" if the formula includes at least one log-transformation

Author(s)

Lukas Dargel

See Also

lm(), ToSimplex(), compositions::ilr(), compositions::alr()

Examples


# XY-compositional model
res <- lmCoDa(
  ilr(cbind(left, right, extreme_right)) ~
  ilr(cbind(Educ_BeforeHighschool, Educ_Highschool, Educ_Higher)),
  data =  head(election, 20))

# X-compositional model
res <- lmCoDa(YIELD ~ PRECIPITATION + ilr(TEMPERATURES), data = head(rice_yields, 20))


Format numbers to percentages

Description

Format numbers to percentages. This code is copied from stats:::format.perc() to avoid notes about the ::: operator.

Usage

pct(probs, digits = 10)
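A helper of this kind can be written in a few lines of base R. The function below, with the hypothetical name pct_sketch(), is modelled on the percent labels used by confint() and is an assumption about what pct() does, not a copy of its source.

```r
# hypothetical re-implementation of a percent formatter
pct_sketch <- function(probs, digits = 10) {
  paste(format(100 * probs, trim = TRUE, scientific = FALSE, digits = digits), "%")
}

# labels for the bounds of a 95% confidence interval
labels <- pct_sketch(c(0.025, 0.975))
labels
```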

Predictions, fitted values, residuals, and coefficients in CoDa models

Description

These functions work as for the usual lm object. They additionally offer the space argument, which transforms the results directly into clr space or into the simplex.

Usage

## S3 method for class 'lmCoDa'
predict(object, space = NULL, ...)

Arguments

object

class "lmCoDa"

space

a character indicating in which space the prediction should be returned. Supported are the options c("clr", "simplex").

...

passed on to predict.lm()

Value

matrix or vector

Author(s)

Lukas Dargel


Objects exported from other packages

Description

These objects are imported from other packages. Follow the links below to see their documentation.

compositions

alr, alrInv, clr, clrInv, ilr, ilrInv


Predictions, fitted values, residuals, and coefficients in CoDa models

Description

These functions work as for the usual lm object. They additionally offer the space argument, which transforms the results directly into clr space or into the simplex.

Usage

## S3 method for class 'lmCoDa'
residuals(object, space = NULL, ...)

Arguments

object

class "lmCoDa"

space

a character indicating in which space the prediction should be returned. Supported are the options c("clr", "simplex").

...

passed on to predict.lm()

Value

matrix or vector

Author(s)

Lukas Dargel


Data on the rice yields in the Vietnamese provinces

Description

The data is presented in Trinh et al. (2023) for studying the impact of climate change on rice production in Vietnam.
It contains the following information:

Usage

rice_yields

Format

An object of class data.frame with 1890 rows and 6 columns.

Author(s)

Lukas Dargel, Christine Thomas-Agnan

References

Thi-Huong Trinh, Michel Simioni, and Christine Thomas-Agnan, “Discrete and Smooth Scalar-on-Density Compositional Regression for Assessing the Impact of Climate Change on Rice Yield in Vietnam”, TSE Working Paper, n. 23-1410, February 2023.


Simulated retail data for nine shopping malls in the city of Toulouse

Description

This data set provides an example for the use of CoDa models in geomarketing applications. The data is simulated, but realistic in the sense that the parameters used for the simulation were estimated on a real, but confidential data set (Dargel and Thomas-Agnan 2024).

Usage

toulouse_retail

Format

An object of class sf (inherits from data.table, data.frame) with 428 rows and 6 columns.

Details

Author(s)

Lukas Dargel, Christine Thomas-Agnan

Source

References

Lukas Dargel & Christine Thomas-Agnan (2024) “The link between multiplicative competitive interaction models and compositional data regression with a total”, Journal of Applied Statistics, DOI: 10.1080/02664763.2024.2329923

Summarize the transformations in a CoDa model (internal)

Description

Extract from a CoDa model estimated by lm() all information related to the log-ratio transformations of the variables and the parameters.

Usage

transformationSummary(lm_res)

Arguments

lm_res

class "lm"

Details

The structure of the return value resembles a data.frame where most columns are lists instead of vectors. The rows in this data.frame correspond to the variables used for fitting the model. The columns store information on the log-ratio transformations and their associated bases (K and F). Additionally the clr parameters and the covariance matrices are retained.
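A data.frame with list columns can be built in base R as shown below. The column names and values are purely illustrative and are not those returned by transformationSummary(); the sketch only shows how one row per variable can carry objects of different shapes.

```r
# hypothetical miniature of a summary table: one row per model variable
trafo <- data.frame(
  name  = c("PRECIPITATION", "TEMPERATURES"),
  trans = c("", "ilr"),
  stringsAsFactors = FALSE
)

# list columns can hold a different object in every row,
# for example a 1 x 1 matrix for a scalar and a contrast matrix for a composition
trafo$basis <- list(matrix(1),
                    cbind(c(1, -1, 0) / sqrt(2), c(1, 1, -2) / sqrt(6)))
trafo$coef  <- list(1.2, c(0.3, -0.5))

sapply(trafo, class)    # "basis" and "coef" are list columns
dim(trafo$basis[[2]])   # the basis stored for the compositional variable
```

Elements of a list column are extracted with double brackets, for example trafo$basis[[2]].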

Value

data.frame, with list columns

Author(s)

Lukas Dargel