Help for package RprobitB

Type:

Package

Title:

Bayesian Probit Choice Modeling

Version:

1.1.4

Description:

Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo m ethods, tools for convergence diagnostic, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) https://trid.trb.org/view/1759753.

URL:

https://loelschlaeger.de/RprobitB/, https://github.com/loelschlaeger/RprobitB

BugReports:

https://github.com/loelschlaeger/RprobitB/issues

License:

GPL-3

Encoding:

UTF-8

Imports:

checkmate, cli, crayon, doSNOW, foreach, ggplot2, graphics, gridExtra, MASS, mixtools, mvtnorm, oeli (≥ 0.4.1), parallel, plotROC, progress, Rcpp, Rdpack, rlang, stats, utils, viridis

LinkingTo:

Rcpp, RcppArmadillo

Suggests:

knitr, mlogit, testthat (≥ 3.0.0)

Depends:

R (≥ 3.5.0)

RoxygenNote:

7.3.1

RdMacros:

Rdpack

VignetteBuilder:

knitr

LazyData:

true

LazyDataCompression:

Config/testthat/edition:

NeedsCompilation:

yes

Packaged:

2024-02-26 10:52:22 UTC; loelschlaeger@ad.uni-bielefeld.de

Author:

Lennart Oelschläger

[aut, cre], Dietmar Bauer

[aut], Sebastian Büscher [ctb], Manuel Batram [ctb]

Maintainer:

Lennart Oelschläger <oelschlaeger.lennart@gmail.com>

Repository:

CRAN

Date/Publication:

2024-02-26 14:10:06 UTC

RprobitB: Bayesian Probit Choice Modeling

Description

Author(s)

Maintainer: Lennart Oelschläger oelschlaeger.lennart@gmail.com (ORCID)

Authors:

Dietmar Bauer dietmar.bauer@uni-bielefeld.de (ORCID)

Other contributors:

Sebastian Büscher sebastian.buescher@uni-bielefeld.de [contributor]
Manuel Batram manuel.batram@uni-bielefeld.de [contributor]

Compute Gelman-Rubin statistic

Description

This function computes the Gelman-Rubin statistic R_hat.

Usage

R_hat(samples, parts = 2)

Arguments

samples

A vector or a matrix of samples from a Markov chain, e.g. Gibbs samples. If samples is a matrix, each column gives the samples for a separate run.

parts

The number of parts to divide each chain into sub-chains.

Value

A numeric value, the Gelman-Rubin statistic.

References

https://bookdown.org/rdpeng/advstatcomp/monitoring-convergence.html

Examples

no_chains <- 2
length_chains <- 1e3
samples <- matrix(NA_real_, length_chains, no_chains)
samples[1, ] <- 1
Gamma <- matrix(c(0.8, 0.1, 0.2, 0.9), 2, 2)
for (c in 1:no_chains) {
  for (t in 2:length_chains) {
    samples[t, c] <- sample(1:2, 1, prob = Gamma[samples[t - 1, c], ])
  }
}
R_hat(samples)

Create object of class `RprobitB_data`

Description

This function constructs an object of class RprobitB_data.

Usage

RprobitB_data(
  data,
  choice_data,
  N,
  T,
  J,
  P_f,
  P_r,
  alternatives,
  ordered,
  ranked,
  base,
  form,
  re,
  ASC,
  effects,
  standardize,
  simulated,
  choice_available,
  true_parameter,
  res_var_names
)

Arguments

data

A list with the choice data. The list has N elements. Each element is a list with two elements, X and y, which are the covariates and decisions for a decision maker. More precisely, X is a list of T elements, where each element is a matrix of dimension Jx(P_f+P_r) and contains the characteristics for one choice occasion. y is a vector of length T and contains the labels for the chosen alternatives.

choice_data

A data.frame of choice data in wide format, i.e. each row represents one choice occasion.

N

The number (greater or equal 1) of decision makers.

T

The number (greater or equal 1) of choice occasions or a vector of choice occasions of length N (i.e. a decision maker specific number). Per default, T = 1.

J

The number (greater or equal 2) of choice alternatives.

P_f

The number of covariates connected to a fixed coefficient (can be 0).

P_r

The number of covariates connected to a random coefficient (can be 0).

alternatives

A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices. If ordered = TRUE, alternatives is assumed to be specified with the alternatives ordered from worst to best.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

ranked

TBA

base

A character, the name of the base alternative for covariates that are not alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to NULL if the model has no alternative specific covariates (e.g. in the ordered probit model). Per default, base is the last element of alternatives.

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

ASC

A boolean, determining whether the model has ASCs.

effects

A data frame with the effect names and booleans indicating whether they are connected to random effects.

standardize

A character vector of names of covariates that get standardized. Covariates of type 1 or 3 have to be addressed by <covariate>_<alternative>. If standardize = "all", all covariates get standardized.

simulated

A boolean, if TRUE then data is simulated, otherwise data is empirical.

choice_available

A boolean, if TRUE then data contains observed choices.

true_parameter

An object of class RprobitB_parameters.

res_var_names

A names list of reserved variable names in choice_data.

Value

An object of class RprobitB_data with the arguments of this function as elements.

Create object of class `RprobitB_fit`

Description

This function creates an object of class RprobitB_fit.

Usage

RprobitB_fit(
  data,
  scale,
  level,
  normalization,
  R,
  B,
  Q,
  latent_classes,
  prior,
  gibbs_samples,
  class_sequence,
  comp_time
)

## S3 method for class 'RprobitB_fit'
summary(object, FUN = c(mean = mean, sd = stats::sd, `R^` = R_hat), ...)

Arguments

data

An object of class RprobitB_data.

scale

A character which determines the utility scale. It is of the form ⁠<parameter> := <value>⁠, where ⁠<parameter>⁠ is either the name of a fixed effect or ⁠Sigma_<j>,<j>⁠ for the ⁠<j>⁠th diagonal element of Sigma, and ⁠<value>⁠ is the value of the fixed parameter.

normalization

An object of class RprobitB_normalization.

R

The number of iterations of the Gibbs sampler.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

Q

The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept.

latent_classes

Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme:

C: The fixed number (greater or equal 1) of latent classes, which is set to 1 per default. If either weight_update = TRUE or dp_update = TRUE (i.e. if classes are updated), C equals the initial number of latent classes.
weight_update: A boolean, set to TRUE to weight-based update the latent classes. See ... for details.
dp_update: A boolean, set to TRUE to update the latent classes based on a Dirichlet process. See ... for details.
Cmax: The maximum number of latent classes.
buffer: The number of iterations to wait before a next weight-based update of the latent classes.
epsmin: The threshold weight (between 0 and 1) for removing a latent class in the weight-based updating scheme.
epsmax: The threshold weight (between 0 and 1) for splitting a latent class in the weight-based updating scheme.
distmin: The (non-negative) threshold in class mean difference for joining two latent classes in the weight-based updating scheme.

prior

A named list of parameters for the prior distributions. See the documentation of check_prior for details about which parameters can be specified.

gibbs_samples

An object of class RprobitB_gibbs_samples.

class_sequence

The sequence of class numbers during Gibbs sampling of length R.

comp_time

The time spent for Gibbs sampling.

...

Currently not used.

Value

An object of class RprobitB_fit.

Create object of class `RprobitB_gibbs_samples_statistics`

Description

This function creates an object of class RprobitB_gibbs_samples_statistics.

Usage

RprobitB_gibbs_samples_statistics(gibbs_samples, FUN = list(mean = mean))

Arguments

gibbs_samples

An object of class RprobitB_gibbs_samples, which generally is located as object gibbs_samples in an RprobitB_model object.

FUN

A (preferably named) list of functions that compute parameter statistics from the Gibbs samples, for example

mean for the mean,
sd for the standard deviation,
min for the minimum,
max for the maximum,
median for the median,
function(x) quantile(x, p) for the pth quantile,
R_hat for the Gelman-Rubin statistic.

Value

An object of class RprobitB_gibbs_samples_statistics, which is a list of statistics from gibbs_samples obtained by applying the elements of FUN.

Create object of class `RprobitB_latent_classes`

Description

This function creates an object of class RprobitB_latent_classes which defines the number of latent classes and their updating scheme. The RprobitB_latent_classes object generated by this function is only of relevance if the model possesses at least one random coefficient, i.e. if P_r>0.

Usage

RprobitB_latent_classes(latent_classes = NULL)

## S3 method for class 'RprobitB_latent_classes'
print(x, ...)

Arguments

latent_classes

Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme:

C: The fixed number (greater or equal 1) of latent classes, which is set to 1 per default. If either weight_update = TRUE or dp_update = TRUE (i.e. if classes are updated), C equals the initial number of latent classes.
weight_update: A boolean, set to TRUE to weight-based update the latent classes. See ... for details.
dp_update: A boolean, set to TRUE to update the latent classes based on a Dirichlet process. See ... for details.
Cmax: The maximum number of latent classes.
buffer: The number of iterations to wait before a next weight-based update of the latent classes.
epsmin: The threshold weight (between 0 and 1) for removing a latent class in the weight-based updating scheme.
epsmax: The threshold weight (between 0 and 1) for splitting a latent class in the weight-based updating scheme.
distmin: The (non-negative) threshold in class mean difference for joining two latent classes in the weight-based updating scheme.

Details

Why update latent classes?

In order not to have to specify the number of latent classes before estimation.

What options to update latent classes exist?

Currently two updating schemes are implemented, weight-based and via a Dirichlet process, see the vignette on modeling heterogeneity.

What is the default behavior?

One latent class without updates is specified per default. Print an RprobitB_latent_classes-object to see a summary of all relevant (default) parameter settings.

Why is `Cmax` required?

The implementation requires an upper bound on the number of latent classes for saving the Gibbs samples. However, this is not a restriction since the number of latent classes is bounded by the number of deciders in any case. A plot method for visualizing the sequence of class numbers after estimation and can be used to check if Cmax was reached, see plot.RprobitB_fit.

Value

An object of class RprobitB_latent_classes.

Examples

### default setting
RprobitB:::RprobitB_latent_classes()

### setting for a fixed number of two latent classes
RprobitB:::RprobitB_latent_classes(list(C = 2))

### setting for weight-based on Dirichlet process-based updates
RprobitB:::RprobitB_latent_classes(
  list("weight_update" = TRUE, "dp_update" = TRUE)
)

Create object of class `RprobitB_normalization`

Description

This function creates an object of class RprobitB_normalization, which determines the utility scale and level.

Usage

RprobitB_normalization(
  level,
  scale = "Sigma_1,1 := 1",
  form,
  re = NULL,
  alternatives,
  base,
  ordered = FALSE
)

## S3 method for class 'RprobitB_normalization'
print(x, ...)

Arguments

level

The alternative name with respect to which utility differences are computed. Currently, only differences with respect to the last alternative can be computed.

scale

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

alternatives

base

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

x

An object of class RprobitB_normalization.

...

Currently not used.

Details

Any choice model has to be normalized with respect to the utility level and scale.

For level normalization, {RprobitB} takes utility differences with respect to one alternative. For the ordered model where only one utility is modeled, {RprobitB} fixes the first utility threshold to 0.
For scale normalization, {RprobitB} fixes one model parameter. Per default, the first error-term variance is fixed to 1. This is specified via scale = "Sigma_1,1 := 1". Alternatively, any error-term variance or any non-random coefficient can be fixed.

Value

An object of class RprobitB_normalization, which is a list of

level, a list with the elements level (the number of the alternative specified by the input level) and name (the name of the alternative, i.e. the input level), or alternatively NA in the ordered probit case,
and scale, a list with the elements parameter (either "s" for an element of Sigma or "a"for an element of alpha), the parameter index, and the fixed value. If parameter = "a", also the name of the fixed effect.

Examples

RprobitB:::RprobitB_normalization(
  level = "B",
  scale = "price := -1",
  form = choice ~ price + time + comfort + change | 1,
  re = "time",
  alternatives = c("A", "B"),
  base = "A"
)

Define probit model parameter

Description

This function creates an object of class RprobitB_parameter, which contains the parameters of a probit model. If sample = TRUE, missing parameters are sampled. All parameters are checked against the values of P_f, P_r, J, and N.

Usage

RprobitB_parameter(
  P_f,
  P_r,
  J,
  N,
  ordered = FALSE,
  alpha = NULL,
  C = NULL,
  s = NULL,
  b = NULL,
  Omega = NULL,
  Sigma = NULL,
  Sigma_full = NULL,
  beta = NULL,
  z = NULL,
  d = NULL,
  seed = NULL,
  sample = TRUE
)

Arguments

P_f

The number of covariates connected to a fixed coefficient (can be 0).

P_r

The number of covariates connected to a random coefficient (can be 0).

J

The number (greater or equal 2) of choice alternatives.

N

The number (greater or equal 1) of decision makers.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

alpha

The fixed coefficient vector of length P_f. Set to NA if P_f = 0.

C

The number (greater or equal 1) of latent classes of decision makers. Set to NA if P_r = 0. Otherwise, C = 1 per default.

s

The vector of class weights of length C. Set to NA if P_r = 0. For identifiability, the vector must be non-ascending.

b

The matrix of class means as columns of dimension P_r x C. Set to NA if P_r = 0.

Omega

The matrix of class covariance matrices as columns of dimension P_r*P_r x C. Set to NA if P_r = 0.

Sigma

The differenced error term covariance matrix of dimension J-1 x J-1 with respect to alternative J. In case of ordered = TRUE, a numeric, the single error term variance.

Sigma_full

The error term covariance matrix of dimension J x J. Internally, Sigma_full gets differenced with respect to alternative J, so it becomes an identified covariance matrix of dimension J-1 x J-1. Sigma_full is ignored if Sigma is specified or ordered = TRUE.

beta

The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. Set to NA if P_r = 0.

z

The vector of the allocation variables of length N. Set to NA if P_r = 0.

d

The numeric vector of the logarithmic increases of the utility thresholds in the ordered probit case (ordered = TRUE) of length J-2.

seed

Set a seed for the sampling of missing parameters.

sample

A boolean, if TRUE (default) missing parameters get sampled.

Value

An object of class RprobitB_parameter, i.e. a named list with the model parameters alpha, C, s, b, Omega, Sigma, Sigma_full, beta, and z.

Examples

RprobitB_parameter(P_f = 1, P_r = 2, J = 3, N = 10)

Compute WAIC value

Description

This function computes the WAIC value of an RprobitB_fit object.

Usage

WAIC(x)

Arguments

x

An object of class RprobitB_fit.

Details

WAIC is short for Widely Applicable (or Watanabe-Akaike) Information Criterion. As for AIC and BIC, the smaller the WAIC value the better the model. Its definition is

WAIC = -2 \cdot lppd + 2 \cdot p_{WAIC},

where lppd stands for log pointwise predictive density and p_{WAIC} is a penalty term proportional to the variance in the posterior distribution that is sometimes called effective number of parameters. The lppd is approximated as follows. Let

p_{is} = \Pr(y_i\mid \theta_s)

be the probability of observation y_i given the sth set \theta_s of parameter samples from the posterior. Then

lppd = \sum_i \log S^{-1} \sum_s p_{si}.

The penalty term is computed as the sum over the variances in log-probability for each observation:

p_{WAIC} = \sum_i V_{\theta} \left[ \log p_{si} \right].

Value

A numeric, the WAIC value, with the following attributes:

se_waic, the standard error of the WAIC value,
lppd, the log pointwise predictive density,
p_waic, the effective number of parameters,
p_waic_vec, the vector of summands of p_waic,
p_si, the output of compute_p_si.

Re-label alternative specific covariates

Description

In {RprobitB}, alternative specific covariates must be named in the format "<covariate>_<alternative>". This convenience function generates the format for a given choice_data set.

Usage

as_cov_names(choice_data, cov, alternatives)

Arguments

choice_data

A data.frame of choice data in wide format, i.e. each row represents one choice occasion.

cov

A character vector of the names of alternative specific covariates in choice_data.

alternatives

A (character or numeric) vector of the alternative names.

Value

The choice_data input with updated column names.

Examples

data("Electricity", package = "mlogit")
cov <- c("pf", "cl", "loc", "wk", "tod", "seas")
alternatives <- 1:4
colnames(Electricity)
Electricity <- as_cov_names(Electricity, cov, alternatives)
colnames(Electricity)

Check model formula

Description

This function checks the input form.

Usage

check_form(form, re = NULL, ordered = FALSE)

Arguments

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

Value

A list that contains the following elements:

The input form.
The name choice of the dependent variable in form.
The input re.
A list vars of three character vectors of covariate names of the three covariate types.
A boolean ASC, determining whether the model has ASCs.

Check prior parameters

Description

This function checks the compatibility of submitted parameters for the prior distributions and sets missing values to default values.

Usage

check_prior(
  P_f,
  P_r,
  J,
  ordered = FALSE,
  eta = numeric(P_f),
  Psi = diag(P_f),
  delta = 1,
  xi = numeric(P_r),
  D = diag(P_r),
  nu = P_r + 2,
  Theta = diag(P_r),
  kappa = if (ordered) 4 else (J + 1),
  E = if (ordered) diag(1) else diag(J - 1),
  zeta = numeric(J - 2),
  Z = diag(J - 2)
)

Arguments

P_f

The number of covariates connected to a fixed coefficient (can be 0).

P_r

The number of covariates connected to a random coefficient (can be 0).

J

The number (greater or equal 2) of choice alternatives.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

eta

The mean vector of length P_f of the normal prior for alpha. Per default, eta = numeric(P_f).

Psi

The covariance matrix of dimension P_f x P_f of the normal prior for alpha. Per default, Psi = diag(P_f).

delta

A numeric for the concentration parameter vector rep(delta,C) of the Dirichlet prior for s. Per default, delta = 1. In case of Dirichlet process-based updates of the latent classes, delta = 0.1 per default.

xi

The mean vector of length P_r of the normal prior for each b_c. Per default, xi = numeric(P_r).

D

The covariance matrix of dimension P_r x P_r of the normal prior for each b_c. Per default, D = diag(P_r).

nu

The degrees of freedom (a natural number greater than P_r) of the Inverse Wishart prior for each Omega_c. Per default, nu = P_r + 2.

Theta

The scale matrix of dimension P_r x P_r of the Inverse Wishart prior for each Omega_c. Per default, Theta = diag(P_r).

kappa

The degrees of freedom (a natural number greater than J-1) of the Inverse Wishart prior for Sigma. Per default, kappa = J + 1.

E

The scale matrix of dimension J-1 x J-1 of the Inverse Wishart prior for Sigma. Per default, E = diag(J - 1).

zeta

The mean vector of length J - 2 of the normal prior for the logarithmic increments d of the utility thresholds in the ordered probit model. Per default, zeta = numeric(J - 2).

Z

The covariance matrix of dimension J-2 x J-2 of the normal prior for the logarithmic increments d of the utility thresholds in the ordered probit model. Per default, Z = diag(J - 2).

Details

A priori, we assume that the model parameters follow these distributions:

\alpha \sim N(\eta, \Psi)
s \sim Dir(\delta)
b_c \sim N(\xi, D) for all classes c
\Omega_c \sim IW(\nu,\Theta) for all classes c
\Sigma \sim IW(\kappa,E)
d \sim N(\zeta, Z)

where N denotes the normal, Dir the Dirichlet, and IW the Inverted Wishart distribution.

Value

An object of class RprobitB_prior, which is a list containing all prior parameters. Parameters that are not relevant for the model configuration are set to NA.

Examples

check_prior(P_f = 1, P_r = 2, J = 3, ordered = TRUE)

Compute choice probabilities

Description

This function returns the choice probabilities of an RprobitB_fit object.

Usage

choice_probabilities(x, data = NULL, par_set = mean)

Arguments

x

An object of class RprobitB_fit.

data

Either NULL or an object of class RprobitB_data. In the former case, choice probabilities are computed for the data that was used for model fitting. Alternatively, a new data set can be provided.

par_set

Specifying the parameter set for calculation and either

a function that computes a posterior point estimate (the default is mean()),
"true" to select the true parameter set,
an object of class RprobitB_parameter.

Value

A data frame of choice probabilities with choice situations in rows and alternatives in columns. The first two columns are the decider identifier "id" and the choice situation identifier "idc".

Examples

data <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
x <- fit_model(data)
choice_probabilities(x)

Classify deciders preference-based

Description

This function classifies the deciders based on their allocation to the components of the mixing distribution.

Usage

classification(x, add_true = FALSE)

Arguments

x

An object of class RprobitB_fit.

add_true

Set to TRUE to add true class memberships to output (if available).

Details

The function can only be used if the model has at least one random effect (i.e. P_r >= 1) and at least two latent classes (i.e. C >= 2).

In that case, let z_1,\dots,z_N denote the class allocations of the N deciders based on their estimated mixed coefficients \beta = (\beta_1,\dots,\beta_N). Independently for each decider n, the conditional probability \Pr(z_n = c \mid s,\beta_n,b,\Omega) of having \beta_n allocated to class c for c=1,\dots,C depends on the class allocation vector s, the class means b=(b_c)_c and the class covariance matrices Omega=(Omega_c)_c and is proportional to

s_c \phi(\beta_n \mid b_c,Omega_c).

This function displays the relative frequencies of which each decider was allocated to the classes during the Gibbs sampling. Only the thinned samples after the burn-in period are considered.

Value

A data frame. The row names are the decider ids. The first C columns contain the relative frequencies with which the deciders are allocated to the C classes. Next, the column est contains the estimated class of the decider based on the highest allocation frequency. If add_true, the next column true contains the true class memberships.

Extract model effects

Description

This function extracts the estimated model effects.

Usage

## S3 method for class 'RprobitB_fit'
coef(object, ...)

Arguments

object

An object of class RprobitB_fit.

...

Ignored.

Value

An object of class RprobitB_coef.

Compute probit choice probabilities

Description

This is a helper function for choice_probabilities and computes the probit choice probabilities for a single choice situation with J alternatives.

Usage

compute_choice_probabilities(X, alternatives, parameter, ordered = FALSE)

Arguments

X

A matrix of covariates with J rows and P_f + P_r columns, where the first P_f columns are connected to fixed coefficients and the last P_r columns are connected to random coefficients.

alternatives

A vector with unique integers from 1 to J, indicating the alternatives for which choice probabilities are to be computed.

parameter

An object of class RprobitB_parameter.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

Value

A probability vector of length length(alternatives).

Compute choice probabilities at posterior samples

Description

This function computes the probability for each observed choice at the (normalized, burned and thinned) samples from the posterior. These probabilities are required to compute the WAIC and the marginal model likelihood mml.

Usage

compute_p_si(x, ncores = parallel::detectCores() - 1, recompute = FALSE)

Arguments

x

An object of class RprobitB_fit.

ncores

This function is parallelized, set the number of cores here.

recompute

Set to TRUE to recompute the probabilities.

Value

The object x, including the object p_si, which is a matrix of probabilities, observations in rows and posterior samples in columns.

Extract estimated covariance matrix of mixing distribution

Description

This convenience function returns the estimated covariance matrix of the mixing distribution.

Usage

cov_mix(x, cor = FALSE)

Arguments

x

An object of class RprobitB_fit.

cor

If TRUE, returns the correlation matrix instead.

Value

The estimated covariance matrix of the mixing distribution. In case of multiple classes, a list of matrices for each class.

Create labels for `Omega`

Description

This function creates labels for the model parameter Omega.

Usage

create_labels_Omega(P_r, C, cov_sym)

Arguments

P_r

The number of covariates connected to a random coefficient (can be 0).

cov_sym

Set to TRUE for labels of symmetric covariance elements.

Details

The labels are of the form "c.p1,p2", where c is the latent class number and p1,p2 the indeces of two random coefficients.

Value

A vector of labels for the model parameter Omega of length P_r^2 * C if P_r > 0 and cov_sym = TRUE or of length P_r*(P_r+1)/2*C if cov_sym = FALSE and NULL otherwise.

Examples

RprobitB:::create_labels_Omega(2, 3, cov_sym = TRUE)
RprobitB:::create_labels_Omega(2, 3, cov_sym = FALSE)

Create labels for `Sigma`

Description

This function creates labels for the model parameter Sigma.

Usage

create_labels_Sigma(J, cov_sym, ordered = FALSE)

Arguments

J

The number (greater or equal 2) of choice alternatives.

cov_sym

Set to TRUE for labels of symmetric covariance elements.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

Details

The labels are of the form "j1,j2", where j1,j2 are indices of the two alternatives j1 and j2.

Value

A vector of labels for the model parameter Sigma of length (J-1)^2 if cov_sym = TRUE or of length J*(J-1)/2 if cov_sym = FALSE. If ordered = TRUE, Sigma has only one element.

Examples

RprobitB:::create_labels_Sigma(3, cov_sym = TRUE)
RprobitB:::create_labels_Sigma(4, cov_sym = FALSE)
RprobitB:::create_labels_Sigma(4, ordered = TRUE)

Create labels for `alpha`

Description

This function creates labels for the model parameter alpha.

Usage

create_labels_alpha(P_f)

Arguments

P_f

The number of covariates connected to a fixed coefficient (can be 0).

Value

A vector of labels for the model parameter alpha of length P_f if P_f > 0 and NULL otherwise.

Examples

RprobitB:::create_labels_alpha(P_f = 3)

Create labels for `b`

Description

This function creates labels for the model parameter b.

Usage

create_labels_b(P_r, C)

Arguments

P_r

The number of covariates connected to a random coefficient (can be 0).

Details

The labels are of the form "c.p", where c is the latent class number and p the index of the random coefficient.

Value

A vector of labels for the model parameter b of length P_r * C if P_r > 0 and NULL otherwise.

Examples

RprobitB:::create_labels_b(2, 3)

Create labels for `d`

Description

This function creates labels for the model parameter d.

Usage

create_labels_d(J, ordered)

Arguments

J

The number (greater or equal 2) of choice alternatives.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

Details

Note that J must be greater or equal 3 in the ordered probit model.

Value

A vector of labels for the model parameter d of length J - 2 if ordered = TRUE and NULL otherwise.

Examples

RprobitB:::create_labels_d(5, TRUE)

Create labels for `s`

Description

This function creates labels for the model parameter s.

Usage

create_labels_s(P_r, C)

Arguments

P_r

The number of covariates connected to a random coefficient (can be 0).

Value

A vector of labels for the model parameter s of length C if P_r > 0 and NULL otherwise.

Examples

RprobitB:::create_labels_s(1, 3)

Create lagged choice covariates

Description

This function creates lagged choice covariates from the data.frame choice_data, which is assumed to be sorted by the choice occasions.

Usage

create_lagged_cov(choice_data, column, k = 1, id = "id")

Arguments

choice_data

A data.frame of choice data in wide format, i.e. each row represents one choice occasion.

column

A character, the column name in choice_data, i.e. the covariate name. Can be a vector.

k

A positive number, the number of lags (in units of observations), see the details. Can be a vector. The default is k = 1.

id

A character, the name of the column in choice_data that contains unique identifier for each decision maker. The default is "id".

Details

Say that choice_data contains the column column. Then, the function call

create_lagged_cov(choice_data, column, k, id)

returns the input choice_data which includes a new column named column.k. This column contains for each decider (based on id) and each choice occasion the covariate faced before k choice occasions. If this data point is not available, it is set to NA. In particular, the first k values of column.k will be NA (initial condition problem).

Value

The input choice_data with the additional columns named column.k for each element column and each number k containing the lagged covariates.

Transform threshold increments to thresholds

Description

This helper function transforms the threshold increments d to the thresholds gamma.

Usage

d_to_gamma(d)

Arguments

d

A numeric vector of threshold increments.

Details

The threshold vector gamma is computed from the threshold increments d as c(-100,0,cumsum(exp(d)),100), where the bounds -100 and 100 exist for numerical reasons and the first threshold is fixed to 0 for identification.

Value

A numeric vector of the thresholds.

Examples

d_to_gamma(c(0,0,0))

Density of multivariate normal distribution

Description

This function computes the density of a multivariate normal distribution.

Usage

dmvnorm(x, mean, Sigma, log = FALSE)

Arguments

x

A quantile vector of length n.

mean

The mean vector of length n.

Sigma

The covariance matrix of dimension n x n.

log

A boolean, if TRUE the logarithm of the density value is returned.

Value

The density value.

Examples

x = c(0,0)
mean = c(0,0)
Sigma = diag(2)
dmvnorm(x = x, mean = mean, Sigma = Sigma)
dmvnorm(x = x, mean = mean, Sigma = Sigma, log = TRUE)

Sample from prior distributions

Description

This function returns a sample from each parameter's prior distribution.

Usage

draw_from_prior(prior, C = 1)

Arguments

prior

An object of class RprobitB_prior, which is the output of check_prior.

C

The number of latent classes.

Value

A list of draws for alpha, s, b, Omega, and Sigma (if specified for the model).

Examples

prior <- check_prior(P_f = 1, P_r = 2, J = 3)
RprobitB:::draw_from_prior(prior, C = 2)

Euclidean distance

Description

This function computes the euclidean distance between two vectors.

Usage

euc_dist(a, b)

Arguments

a

A numeric vector.

b

Another numeric vector of the same length as a.

Value

The euclidean distance.

Filter Gibbs samples

Description

This is a helper function that filters Gibbs samples.

Usage

filter_gibbs_samples(
  x,
  P_f,
  P_r,
  J,
  C,
  cov_sym,
  ordered = FALSE,
  keep_par = c("s", "alpha", "b", "Omega", "Sigma", "d"),
  drop_par = NULL
)

Arguments

x

An object of class RprobitB_gibbs_samples.

P_f

The number of covariates connected to a fixed coefficient (can be 0).

P_r

The number of covariates connected to a random coefficient (can be 0).

J

The number (greater or equal 2) of choice alternatives.

cov_sym

Set to TRUE for labels of symmetric covariance elements.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

keep_par

A vector of parameter names which are kept.

drop_par

A vector of parameter names which get dropped.

Value

An object of class RprobitB_gibbs_samples filtered by the labels of parameter_labels.

Fit probit model to choice data

Description

This function performs Markov chain Monte Carlo simulation for fitting different types of probit models (binary, multivariate, mixed, latent class, ordered, ranked) to discrete choice data.

Usage

fit_model(
  data,
  scale = "Sigma_1,1 := 1",
  R = 1000,
  B = R/2,
  Q = 1,
  print_progress = getOption("RprobitB_progress"),
  prior = NULL,
  latent_classes = NULL,
  seed = NULL,
  fixed_parameter = list()
)

Arguments

data

An object of class RprobitB_data.

scale

R

The number of iterations of the Gibbs sampler.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

Q

The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept.

print_progress

A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time.

prior

A named list of parameters for the prior distributions. See the documentation of check_prior for details about which parameters can be specified.

latent_classes

Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme:

C: The fixed number (greater or equal 1) of latent classes, which is set to 1 per default. If either weight_update = TRUE or dp_update = TRUE (i.e. if classes are updated), C equals the initial number of latent classes.
weight_update: A boolean, set to TRUE to weight-based update the latent classes. See ... for details.
dp_update: A boolean, set to TRUE to update the latent classes based on a Dirichlet process. See ... for details.
Cmax: The maximum number of latent classes.
buffer: The number of iterations to wait before a next weight-based update of the latent classes.
epsmin: The threshold weight (between 0 and 1) for removing a latent class in the weight-based updating scheme.
epsmax: The threshold weight (between 0 and 1) for splitting a latent class in the weight-based updating scheme.
distmin: The (non-negative) threshold in class mean difference for joining two latent classes in the weight-based updating scheme.

seed

Set a seed for the Gibbs sampling.

fixed_parameter

Optionally specify a named list with fixed parameter values for alpha, C, s, b, Omega, Sigma, Sigma_full, beta, z, or d for the simulation. See the vignette on model definition for definitions of these variables.

Details

See the vignette on model fitting for more details.

Value

An object of class RprobitB_fit.

Examples

data <- simulate_choices(
  form = choice ~ var | 0, N = 100, T = 10, J = 3, seed = 1
)
model <- fit_model(data = data, R = 1000, seed = 1)
summary(model)

Extract covariates of choice occasion

Description

This convenience function returns the covariates and the choices of specific choice occasions.

Usage

get_cov(x, id, idc, idc_label)

Arguments

x

Either an object of class RprobitB_data or RprobitB_fit.

id

A numeric (vector), that specifies the decider(s).

idc

A numeric (vector), that specifies the choice occasion(s).

idc_label

The name of the column that contains the choice occasion identifier.

Value

A subset of the choice_data data frame specified in prepare_data().

Markov chain Monte Carlo simulation for the probit model

Description

This function draws from the posterior distribution of the probit model via Markov chain Monte Carlo simulation-

Usage

gibbs_sampling(
  sufficient_statistics,
  prior,
  latent_classes,
  fixed_parameter,
  init,
  R,
  B,
  print_progress,
  ordered,
  ranked
)

Arguments

sufficient_statistics

The output of sufficient_statistics.

prior

A named list of parameters for the prior distributions. See the documentation of check_prior for details about which parameters can be specified.

latent_classes

Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme:

C: The fixed number (greater or equal 1) of latent classes, which is set to 1 per default. If either weight_update = TRUE or dp_update = TRUE (i.e. if classes are updated), C equals the initial number of latent classes.
weight_update: A boolean, set to TRUE to weight-based update the latent classes. See ... for details.
dp_update: A boolean, set to TRUE to update the latent classes based on a Dirichlet process. See ... for details.
Cmax: The maximum number of latent classes.
buffer: The number of iterations to wait before a next weight-based update of the latent classes.
epsmin: The threshold weight (between 0 and 1) for removing a latent class in the weight-based updating scheme.
epsmax: The threshold weight (between 0 and 1) for splitting a latent class in the weight-based updating scheme.
distmin: The (non-negative) threshold in class mean difference for joining two latent classes in the weight-based updating scheme.

fixed_parameter

init

The output of set_initial_gibbs_values.

R

The number of iterations of the Gibbs sampler.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

print_progress

A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

ranked

TBA

Details

This function is not supposed to be called directly, but rather via fit_model.

Value

A list of Gibbs samples for

Sigma,
alpha (if P_f>0),
s, z, b, Omega (if P_r>0),
d (if ordered = TRUE),

and a vector class_sequence of length R, where the rth entry is the number of latent classes after iteration r.

Log-likelihood in the ordered probit model

Description

This function computes the log-likelihood value given the threshold increments d.

Usage

ll_ordered(d, y, mu, Tvec)

Arguments

d

A numeric vector of threshold increments.

y

A matrix of the choices.

mu

A matrix of the systematic utilities.

Tvec

The element Tvec in sufficient_statistics.

Value

The log-likelihood value.

Examples

ll_ordered(c(0,0,0), matrix(1), matrix(1), 1)

Handle missing covariates

Description

This function checks for and replaces missing covariate entries in choice_data.

Usage

missing_covariates(
  choice_data,
  impute = "complete_cases",
  col_ignore = character()
)

Arguments

choice_data

A data.frame of choice data in wide format, i.e. each row represents one choice occasion.

impute

A character that specifies how to handle missing covariate entries in choice_data, one of:

"complete_cases", removes all rows containing missing covariate entries (the default),
"zero", replaces missing covariate entries by zero (only for numeric columns),
"mean", imputes missing covariate entries by the mean (only for numeric columns).

col_ignore

A character vector of columns that are ignored (none per default).

Value

The input choice_data, in which missing covariates are addressed.

Approximate marginal model likelihood

Description

This function approximates the model's marginal likelihood.

Usage

mml(x, S = 0, ncores = parallel::detectCores() - 1, recompute = FALSE)

Arguments

x

An object of class RprobitB_fit.

S

The number of prior samples for the prior arithmetic mean estimate. Per default, S = 0. In this case, only the posterior samples are used for the approximation via the posterior harmonic mean estimator, see the details section.

ncores

Computation of the prior arithmetic mean estimate is parallelized, set the number of cores.

recompute

Set to TRUE to recompute the likelihood.

Details

The model's marginal likelihood p(y\mid M) for a model M and data y is required for the computation of Bayes factors. In general, the term has no closed form and must be approximated numerically.

This function uses the posterior Gibbs samples to approximate the model's marginal likelihood via the posterior harmonic mean estimator. To check the convergence, call plot(x$mml), where x is the output of this function. If the estimation does not seem to have converged, you can improve the approximation by combining the value with the prior arithmetic mean estimator. The final approximation of the model's marginal likelihood than is a weighted sum of the posterior harmonic mean estimate and the prior arithmetic mean estimate, where the weights are determined by the sample sizes.

Value

The object x, including the object mml, which is the model's approximated marginal likelihood value.

Compare fitted models

Description

This function returns a table with several criteria for model comparison.

Usage

model_selection(
  ...,
  criteria = c("npar", "LL", "AIC", "BIC"),
  add_form = FALSE
)

Arguments

...

One or more objects of class RprobitB_fit.

criteria

A vector of one or more of the following characters:

"npar" for the number of model parameters (see npar),
"LL" for the log-likelihood value (see logLik),
"AIC" for the AIC value (see AIC),
"BIC" for the BIC value (see BIC),
"WAIC" for the WAIC value (also shows its standard error sd(WAIC) and the number pWAIC of effective model parameters, see WAIC),
"MMLL" for the marginal model log-likelihood,
"BF" for the Bayes factor,
"pred_acc" for the prediction accuracy (see pred_acc).

add_form

Set to TRUE to add the model formulas.

Details

See the vignette on model selection for more details.

Value

A data frame, criteria in columns, models in rows.

Extract number of model parameters

Description

This function extracts the number of model parameters of an RprobitB_fit object.

Usage

npar(object, ...)

## S3 method for class 'RprobitB_fit'
npar(object, ...)

Arguments

object

An object of class RprobitB_fit.

...

Optionally more objects of class RprobitB_fit.

Value

Either a numeric value (if just one object is provided) or a numeric vector.

Print effect overview

Description

This function gives an overview of the effect names, whether the covariate is alternative-specific, whether the coefficient is alternative-specific, and whether it is a random effect.

Usage

overview_effects(
  form,
  re = NULL,
  alternatives,
  base = tail(alternatives, 1),
  ordered = FALSE
)

Arguments

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

alternatives

base

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

Value

A data frame, each row is a effect, columns are the effect name "effect", and booleans whether the covariate is alternative-specific "as_value", whether the coefficient is alternative-specific "as_coef", and whether it is a random effect "random".

Examples

overview_effects(
  form = choice ~ price + time + comfort + change | 1,
  re = c("price", "time"),
  alternatives = c("A", "B"),
  base = "A"
)

Create parameters labels

Description

This function creates model parameter labels.

Usage

parameter_labels(
  P_f,
  P_r,
  J,
  C,
  cov_sym,
  ordered = FALSE,
  keep_par = c("s", "alpha", "b", "Omega", "Sigma", "d"),
  drop_par = NULL
)

Arguments

P_f

The number of covariates connected to a fixed coefficient (can be 0).

P_r

The number of covariates connected to a random coefficient (can be 0).

J

The number (greater or equal 2) of choice alternatives.

cov_sym

Set to TRUE for labels of symmetric covariance elements.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

keep_par

A vector of parameter names which are kept.

drop_par

A vector of parameter names which get dropped.

Value

A list of labels for the selected model parameters.

Visualize choice data

Description

This function is the plot method for an object of class RprobitB_data.

Usage

## S3 method for class 'RprobitB_data'
plot(x, by_choice = FALSE, alpha = 1, position = "dodge", ...)

Arguments

x

An object of class RprobitB_data.

by_choice

Set to TRUE to group the covariates by the chosen alternatives.

alpha, position

Passed to ggplot.

...

Ignored.

Value

No return value. Draws a plot to the current device.

Examples

data <- simulate_choices(
  form = choice ~ cost | 0,
  N = 100,
  T = 10,
  J = 2,
  alternatives = c("bus", "car"),
  true_parameter = list("alpha" = -1)
)
plot(data, by_choice = TRUE)

Visualize fitted probit model

Description

This function is the plot method for an object of class RprobitB_fit.

Usage

## S3 method for class 'RprobitB_fit'
plot(x, type, ignore = NULL, ...)

Arguments

x

An object of class RprobitB_fit.

type

The type of plot, which can be one of:

"mixture" to visualize the mixing distribution,
"acf" for autocorrelation plots of the Gibbs samples,
"trace" for trace plots of the Gibbs samples,
"class_seq" to visualize the sequence of class numbers.

See the details section for visualization options.

ignore

A character (vector) of covariate or parameter names that do not get visualized.

...

Ignored.

Value

No return value. Draws a plot to the current device.

Autocorrelation plot of Gibbs samples

Description

This function plots the autocorrelation of the Gibbs samples. The plots include the total Gibbs sample size TSS and the effective sample size ESS, see the details.

Usage

plot_acf(gibbs_samples, par_labels)

Arguments

gibbs_samples

A matrix of Gibbs samples.

par_labels

A character vector with labels for the Gibbs samples, of length equal to the number of columns of gibbs_samples.

Details

The effective sample size is the value

TSS / \sqrt{1 + 2\sum_{k\geq 1} \rho_k}

, where \rho_k is the auto correlation between the chain offset by k positions. The auto correlations are estimated via spec.ar.

Value

No return value. Draws a plot to the current device.

Plot class allocation (for `P_r = 2` only)

Description

This function plots the allocation of decision-maker specific coefficient vectors beta given the allocation vector z, the class means b, and the class covariance matrices Omega.

Usage

plot_class_allocation(beta, z, b, Omega, ...)

Arguments

beta

The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. Set to NA if P_r = 0.

z

The vector of the allocation variables of length N. Set to NA if P_r = 0.

b

The matrix of class means as columns of dimension P_r x C. Set to NA if P_r = 0.

Omega

The matrix of class covariance matrices as columns of dimension P_r*P_r x C. Set to NA if P_r = 0.

...

Optional visualization parameters:

colors, a character vector of color specifications,
perc, a numeric between 0 and 1 to draw the perc percentile ellipsoids for the underlying Gaussian distributions (perc = 0.95 per default),
r, the current iteration number of the Gibbs sampler to be displayed in the legend,
sleep, the number of seconds to pause after plotting.

Details

Only applicable in the two-dimensional case, i.e. only if P_r = 2.

Value

No return value. Draws a plot to the current device.

Visualizing the number of classes during Gibbs sampling

Description

This function plots the number of latent Glasses during Gibbs sampling to visualize the class updating.

Usage

plot_class_seq(class_sequence, B)

Arguments

class_sequence

The sequence of class numbers during Gibbs sampling of length R.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

Value

No return value. Draws a plot to the current device.

Plot bivariate contour of mixing distributions

Description

This function plots an estimated bivariate contour mixing distributions.

Usage

plot_mixture_contour(means, covs, weights, names)

Arguments

means

The class means.

covs

The class covariances.

weights

The class weights.

names

The covariate names.

Value

An object of class ggplot.

Plot marginal mixing distributions

Description

This function plots an estimated marginal mixing distributions.

Usage

plot_mixture_marginal(mean, cov, weights, name)

Arguments

mean

The class means.

cov

The class covariances.

weights

The class weights.

name

The covariate name.

Value

An object of class ggplot.

Plot ROC curve

Description

This function draws receiver operating characteristic (ROC) curves.

Usage

plot_roc(..., reference = NULL)

Arguments

...

One or more RprobitB_fit objects or data.frames of choice probability.

reference

The reference alternative.

Value

No return value. Draws a plot to the current device.

Visualizing the trace of Gibbs samples.

Description

This function plots traces of the Gibbs samples.

Usage

plot_trace(gibbs_samples, par_labels)

Arguments

gibbs_samples

A matrix of Gibbs samples.

par_labels

A character vector of length equal to the number of columns of gibbs_samples, containing labels for the Gibbs samples.

Value

No return value. Draws a plot to the current device.

Compute point estimates

Description

This function computes the point estimates of an RprobitB_fit. Per default, the mean of the Gibbs samples is used as a point estimate. However, any statistic that computes a single numeric value out of a vector of Gibbs samples can be specified for FUN.

Usage

point_estimates(x, FUN = mean)

Arguments

x

An object of class RprobitB_fit.

FUN

A function that computes a single numeric value out of a vector of numeric values.

Value

An object of class RprobitB_parameter.

Examples

data <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
model <- fit_model(data)
point_estimates(model)
point_estimates(model, FUN = median)

Parameter sets from posterior samples

Description

This function builds parameter sets from the normalized, burned and thinned posterior samples.

Usage

posterior_pars(x)

Arguments

x

An object of class RprobitB_fit.

Value

A list of RprobitB_parameter objects.

Compute prediction accuracy

Description

This function computes the prediction accuracy of an RprobitB_fit object. Prediction accuracy means the share of choices that are correctly predicted by the model, where prediction is based on the maximum choice probability.

Usage

pred_acc(x, ...)

Arguments

x

An object of class RprobitB_fit.

...

Optionally specify more RprobitB_fit objects.

Value

A numeric.

Predict choices

Description

This function predicts the discrete choice behavior

Usage

## S3 method for class 'RprobitB_fit'
predict(object, data = NULL, overview = TRUE, digits = 2, ...)

Arguments

object

An object of class RprobitB_fit.

data

Either

NULL, using the data in object,
an object of class RprobitB_data, for example the test part generated by train_test,
or a data frame of custom choice characteristics. It must have the same structure as choice_data used in prepare_data. Missing columns or NA values are set to 0.

overview

If TRUE, returns a confusion matrix.

digits

The number of digits of the returned choice probabilities. digits = 2 per default.

...

Ignored.

Details

Predictions are made based on the maximum predicted probability for each choice alternative. See the vignette on choice prediction for a demonstration on how to visualize the model's sensitivity and specificity by means of a receiver operating characteristic (ROC) curve.

Value

Either a table if overview = TRUE or a data frame otherwise.

Examples

data <- simulate_choices(
  form = choice ~ cov, N = 10, T = 10, J = 2, seed = 1
)
data <- train_test(data, test_proportion = 0.5)
model <- fit_model(data$train)

predict(model)
predict(model, overview = FALSE)
predict(model, data = data$test)
predict(
  model,
  data = data.frame("cov_A" = c(1, 1, NA, NA), "cov_B" = c(1, NA, 1, NA)),
  overview = FALSE
)

Check for flip in preferences after change in model scale.

Description

This function checks if a change in the model scale flipped the preferences.

Usage

preference_flip(model_old, model_new)

Arguments

model_old

An object of class RprobitB_fit, the model before the scale change.

model_new

An object of class RprobitB_fit, the model after the scale change.

Value

No return value, called for side-effects.

Prepare choice data for estimation

Description

This function prepares choice data for estimation.

Usage

prepare_data(
  form,
  choice_data,
  re = NULL,
  alternatives = NULL,
  ordered = FALSE,
  ranked = FALSE,
  base = NULL,
  id = "id",
  idc = NULL,
  standardize = NULL,
  impute = "complete_cases"
)

Arguments

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

choice_data

A data.frame of choice data in wide format, i.e. each row represents one choice occasion.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

alternatives

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

ranked

TBA

base

id

A character, the name of the column in choice_data that contains unique identifier for each decision maker. The default is "id".

idc

A character, the name of the column in choice_data that contains unique identifier for each choice situation of each decision maker. Per default, these identifier are generated by the order of appearance.

standardize

impute

A character that specifies how to handle missing covariate entries in choice_data, one of:

"complete_cases", removes all rows containing missing covariate entries (the default),
"zero", replaces missing covariate entries by zero (only for numeric columns),
"mean", imputes missing covariate entries by the mean (only for numeric columns).

Details

Requirements for the data.frame choice_data:

It must contain a column named id which contains unique identifier for each decision maker.
It can contain a column named idc which contains unique identifier for each choice situation of each decision maker. If this information is missing, these identifier are generated automatically by the appearance of the choices in the data set.
It can contain a column named choice with the observed choices, where choice must match the name of the dependent variable in form. Such a column is required for model fitting but not for prediction.
It must contain a numeric column named p_j for each alternative specific covariate p in form and each choice alternative j in alternatives.
It must contain a numeric column named q for each covariate q in form that is constant across alternatives.

In the ordered case (ordered = TRUE), the column choice must contain the full ranking of the alternatives in each choice occasion as a character, where the alternatives are separated by commas, see the examples.

See the vignette on choice data for more details.

Value

An object of class RprobitB_data.

Examples

data <- prepare_data(
  form = choice ~ price + time + comfort + change | 0,
  choice_data = train_choice,
  re = c("price", "time"),
  id = "deciderID",
  idc = "occasionID",
  standardize = c("price", "time")
)

### ranked case
choice_data <- data.frame(
  "id" = 1:3, "choice" = c("A,B,C", "A,C,B", "B,C,A"), "cov" = 1
)
data <- prepare_data(
  form = choice ~ 0 | cov + 0,
  choice_data = choice_data,
  ranked = TRUE
)

Draw from Dirichlet distribution

Description

Function to draw from a Dirichlet distribution.

Usage

rdirichlet(delta)

Arguments

delta

A vector, the concentration parameter.

Value

A vector, the sample from the Dirichlet distribution of the same length as delta.

Examples

rdirichlet(delta = 1:3)

Draw from multivariate normal distribution

Description

This function draws from a multivariate normal distribution.

Usage

rmvnorm(mu, Sigma)

Arguments

mu

The mean vector of length n.

Sigma

The covariance matrix of dimension n x n.

Details

The function builds upon the following fact: If \epsilon = (\epsilon_1,\dots,\epsilon_n), where each \epsilon_i is drawn independently from a standard normal distribution, then \mu+L\epsilon is a draw from the multivariate normal distribution N(\mu,\Sigma), where L is the lower triangular factor of the Choleski decomposition of \Sigma.

Value

A numeric vector of length n.

Examples

mu <- c(0,0)
Sigma <- diag(2)
rmvnorm(mu = mu, Sigma = Sigma)

Draw from one-sided truncated normal

Description

This function draws from a one-sided truncated univariate normal distribution.

Usage

rtnorm(mu, sig, trunpt, above)

Arguments

mu

The mean.

sig

The standard deviation.

trunpt

The truncation point.

above

A boolean, if TRUE truncate from above, otherwise from below.

Value

A numeric value.

Examples

### samples from a standard normal truncated at 1 from above
R <- 1e4
draws <- replicate(R, rtnorm(0,1,1,TRUE))
plot(density(draws))

Draw from two-sided truncated normal

Description

This function draws from a two-sided truncated univariate normal distribution.

Usage

rttnorm(mu, sig, lower_bound, upper_bound)

Arguments

mu

The mean.

sig

The standard deviation.

lower_bound

The lower truncation point.

upper_bound

The upper truncation point.

Value

A numeric value.

Examples

### samples from a standard normal truncated at -2 and 2
R <- 1e4
draws <- replicate(R, rttnorm(0,1,-2,2))
plot(density(draws))

Draw from Wishart distribution

Description

This function draws from a Wishart and inverted Wishart distribution.

Usage

rwishart(nu, V)

Arguments

nu

A numeric, the degrees of freedom. Must be at least the number of dimensions.

V

A matrix, the scale matrix.

Details

The Wishart distribution is a generalization to multiple dimensions of the gamma distributions and draws from the space of covariance matrices. Its expectation is nu*V and its variance increases both in nu and in the values of V. The Wishart distribution is the conjugate prior to the precision matrix of a multivariate normal distribution and proper if nu is greater than the number of dimensions.

Value

A list, the draws from the Wishart (W), inverted Wishart (IW), and corresponding Choleski decomposition (C and CI).

Examples

rwishart(nu = 2, V = diag(2))

Set initial values for the Gibbs sampler

Description

This function sets initial values for the Gibbs sampler.

Usage

set_initial_gibbs_values(
  N,
  T,
  J,
  P_f,
  P_r,
  C,
  ordered = FALSE,
  ranked = FALSE,
  suff_stat = NULL
)

Arguments

N

The number (greater or equal 1) of decision makers.

T

The number (greater or equal 1) of choice occasions or a vector of choice occasions of length N (i.e. a decision maker specific number). Per default, T = 1.

J

The number (greater or equal 2) of choice alternatives.

P_f

The number of covariates connected to a fixed coefficient (can be 0).

P_r

The number of covariates connected to a random coefficient (can be 0).

C

The number (greater or equal 1) of latent classes.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

ranked

TBA

suff_stat

Optionally the output of sufficient_statistics.

Value

A list of initial values for the Gibbs sampler.

Examples

RprobitB:::set_initial_gibbs_values(
  N = 2, T = 3, J = 3, P_f = 1, P_r = 2, C = 2
)

Simulate choice data

Description

This function simulates choice data from a probit model.

Usage

simulate_choices(
  form,
  N,
  T = 1,
  J,
  re = NULL,
  alternatives = NULL,
  ordered = FALSE,
  ranked = FALSE,
  base = NULL,
  covariates = NULL,
  seed = NULL,
  true_parameter = list()
)

Arguments

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

N

The number (greater or equal 1) of decision makers.

T

The number (greater or equal 1) of choice occasions or a vector of choice occasions of length N (i.e. a decision maker specific number). Per default, T = 1.

J

The number (greater or equal 2) of choice alternatives.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

alternatives

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

ranked

TBA

base

covariates

A named list of covariate values. Each element must be a vector of length equal to the number of choice occasions and named according to a covariate. Covariates for which no values are supplied are drawn from a standard normal distribution.

seed

Set a seed for the simulation.

true_parameter

Optionally specify a named list with true parameter values for alpha, C, s, b, Omega, Sigma, Sigma_full, beta, z, or d for the simulation. See the vignette on model definition for definitions of these variables.

Details

See the vignette on choice data for more details.

Value

An object of class RprobitB_data.

Examples

### simulate data from a binary probit model with two latent classes
data <- simulate_choices(
  form = choice ~ cost | income | time,
  N = 100,
  T = 10,
  J = 2,
  re = c("cost", "time"),
  alternatives = c("car", "bus"),
  seed = 1,
  true_parameter = list(
    "alpha" = c(-1, 1),
    "b" = matrix(c(-1, -1, -0.5, -1.5, 0, -1), ncol = 2),
    "C" = 2
  )
)

### simulate data from an ordered probit model
data <- simulate_choices(
  form = opinion ~ age + gender,
  N = 10,
  T = 1:10,
  J = 5,
  alternatives = c("very bad", "bad", "indifferent", "good", "very good"),
  ordered = TRUE,
  covariates = list(
    "gender" = rep(sample(c(0, 1), 10, replace = TRUE), times = 1:10)
  ),
  seed = 1
)

### simulate data from a ranked probit model
data <- simulate_choices(
  form = product ~ price,
  N = 10,
  T = 1:10,
  J = 3,
  alternatives = c("A", "B", "C"),
  ranked = TRUE,
  seed = 1
)

Compute sufficient statistics

Description

This function computes sufficient statistics from an RprobitB_data object for the Gibbs sampler to save computation time.

Usage

sufficient_statistics(data, normalization)

Arguments

data

An object of class RprobitB_data.

normalization

An object of class RprobitB_normalization, which can be created via RprobitB_normalization.

Value

A list of sufficient statistics on the data for Gibbs sampling, containing

the elements N, T, J, P_f and P_r from data,
Tvec, the vector of choice occasions for each decider of length N,
csTvec, a vector of length N with the cumulated sums of Tvec starting from 0,
W, a list of design matrices differenced with respect to alternative number normalization$level$level for each decider in each choice occasion with covariates that are linked to a fixed coefficient (or NA if P_f = 0),
X, a list of design matrices differenced with respect to alternative number normalization$level$level for each decider in each choice occasion with covariates that are linked to a random coefficient (or NA if P_r = 0),
y, a matrix of dimension N x max(Tvec) with the observed choices of deciders in rows and choice occasions in columns, decoded to numeric values with respect to their appearance in data$alternatives, where rows are filled with NA in case of an unbalanced panel,
WkW, a matrix of dimension P_f^2 x (J-1)^2, the sum over Kronecker products of each transposed element in W with itself,
XkX, a list of length N, each element is constructed in the same way as WkW but with the elements in X and separately for each decider,
rdiff (for the ranked case only), a list of matrices that reverse the base differencing and instead difference in such a way that the resulting utility vector is negative.

Stated Preferences for Train Traveling

Description

Data set of 2929 stated choices by 235 Dutch individuals deciding between two virtual train trip options "A" and "B" based on the price, the travel time, the number of rail-to-rail transfers (changes), and the level of comfort.

The data were obtained in 1987 by Hague Consulting Group for the National Dutch Railways. Prices were recorded in Dutch guilder and in this data set transformed to Euro at an exchange rate of 2.20371 guilders = 1 Euro.

Usage

train_choice

Format

A data.frame with 2929 rows and 11 columns:

deciderID: integer identifier for the decider
occasionID: integer identifier for the choice occasion
choice: character for the chosen alternative (either "A" or "B")
price_A: numeric price for alternative "A" in Euro
time_A: numeric travel time for alternative "A" in hours
change_A: integer number of changes for alternative "A"
comfort_A: integer comfort level (in decreasing comfort order) for alternative "A"
price_B: numeric price for alternative "B" in Euro
time_B: numeric travel time for alternative "B" in hours
change_B: integer number of changes for alternative "B"
comfort_B: integer comfort level (in decreasing comfort order) for alternative "B"

References

Ben-Akiva M, Bolduc D, Bradley M (1993). “Estimation of travel choice models with randomly distributed values of time.” Transportation Research Record, 1413, 88–97.

Meijer E, Rouwendal J (2006). “Measuring welfare effects in models with random coefficients.” Journal of Applied Econometrics, 21(2), 227–244.

Split choice data in train and test subset

Description

This function splits choice data into a train and a test part.

Usage

train_test(
  x,
  test_proportion = NULL,
  test_number = NULL,
  by = "N",
  random = FALSE,
  seed = NULL
)

Arguments

x

An object of class RprobitB_data.

test_proportion

A number between 0 and 1, the proportion of the test subsample.

test_number

A positive integer, the number of observations in the test subsample.

by

One of "N" (split by deciders) and "T" (split by choice occasions).

random

If TRUE, the subsamples are build randomly.

seed

Set a seed for building the subsamples randomly.

Details

See the vignette on choice data for more details.

Value

A list with two objects of class RprobitB_data, named "train" and "test".

Examples

### simulate choices for demonstration
x <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)

### 70% of deciders in the train subsample,
### 30% of deciders in the test subsample
train_test(x, test_proportion = 0.3, by = "N")

### 2 randomly chosen choice occasions per decider in the test subsample,
### the rest in the train subsample
train_test(x, test_number = 2, by = "T", random = TRUE, seed = 1)

Transform fitted probit model

Description

Given an object of class RprobitB_fit, this function can:

change the length B of the burn-in period,
change the the thinning factor Q of the Gibbs samples,
change the utility scale.

Usage

## S3 method for class 'RprobitB_fit'
transform(
  `_data`,
  B = NULL,
  Q = NULL,
  scale = NULL,
  check_preference_flip = TRUE,
  ...
)

Arguments

_data

An object of class RprobitB_fit.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

Q

The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept.

scale

check_preference_flip

Set to TRUE to check for flip in preferences after new scale.

...

Ignored.

Details

See the vignette "Model fitting" for more details: vignette("v03_model_fitting", package = "RprobitB").

Value

The transformed RprobitB_fit object.

Transformation of Gibbs samples

Description

This function normalizes, burns and thins the Gibbs samples.

Usage

transform_gibbs_samples(gibbs_samples, R, B, Q, normalization)

Arguments

gibbs_samples

The output of gibbs_sampling, i.e. a list of Gibbs samples for

Sigma,
alpha (if P_f>0),
s, z, b, Omega (if P_r>0).

R

The number of iterations of the Gibbs sampler.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

Q

The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept.

normalization

An object of class RprobitB_normalization, which can be created via RprobitB_normalization.

Value

A list, the first element gibbs_sampes_raw is the input gibbs_samples, the second element is the normalized, burned, and thinned version of gibbs_samples called gibbs_samples_nbt. The list gets the class RprobitB_gibbs_samples.

Transformation of parameter values

Description

This function transforms parameter values based on normalization.

Usage

transform_parameter(parameter, normalization, ordered = FALSE)

Arguments

parameter

An object of class RprobitB_parameter.

normalization

An object of class RprobitB_normalization.

ordered

A boolean, FALSE per default. If TRUE, the choice set alternatives is assumed to be ordered from worst to best.

Value

An object of class RprobitB_parameter.

Transform differenced to non-differenced error term covariance matrix

Description

This function transforms the differenced error term covariance matrix Sigma back to a non-differenced error term covariance matrix.

Usage

undiff_Sigma(Sigma, i, checks = TRUE, pos = TRUE, labels = TRUE)

Arguments

Sigma

An error term covariance matrix of dimension J-1 x J-1 which was differenced with respect to alternative i.

i

An integer, the alternative number with respect to which Sigma was differenced.

checks

If TRUE the function runs additional input and transformation checks.

pos

If TRUE the function returns a positive matrix.

labels

If TRUE the function adds labels to the output matrix.

Value

A covariance matrix of dimension J x J. If this covariance matrix gets differenced with respect to alternative i, the results is again Sigma.

Update and re-fit probit model

Description

This function estimates a nested probit model based on a given RprobitB_fit object.

Usage

## S3 method for class 'RprobitB_fit'
update(
  object,
  form,
  re,
  alternatives,
  id,
  idc,
  standardize,
  impute,
  scale,
  R,
  B,
  Q,
  print_progress,
  prior,
  latent_classes,
  seed,
  ...
)

Arguments

object

An object of class RprobitB_fit.

form

A formula object that is used to specify the model equation. The structure is choice ~ A | B | C, where

choice is the name of the dependent variable (the choices),
A are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives,
B are names of choice situation specific covariates with alternative specific coefficients,
and C are names of alternative and choice situation specific covariates with alternative specific coefficients.

Multiple covariates (of one type) are separated by a + sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding +0 in the second spot.

In the ordered probit model (ordered = TRUE), the formula object has the simple structure choice ~ A. ASCs are not estimated.

re

A character (vector) of covariates of form with random effects. If re = NULL (the default), there are no random effects. To have random effects for the ASCs, include "ASC" in re.

alternatives

id

A character, the name of the column in choice_data that contains unique identifier for each decision maker. The default is "id".

idc

standardize

impute

A character that specifies how to handle missing covariate entries in choice_data, one of:

"complete_cases", removes all rows containing missing covariate entries (the default),
"zero", replaces missing covariate entries by zero (only for numeric columns),
"mean", imputes missing covariate entries by the mean (only for numeric columns).

scale

R

The number of iterations of the Gibbs sampler.

B

The length of the burn-in period, i.e. a non-negative number of samples to be discarded.

Q

The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept.

print_progress

A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time.

prior

A named list of parameters for the prior distributions. See the documentation of check_prior for details about which parameters can be specified.

latent_classes

Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme:

C: The fixed number (greater or equal 1) of latent classes, which is set to 1 per default. If either weight_update = TRUE or dp_update = TRUE (i.e. if classes are updated), C equals the initial number of latent classes.
weight_update: A boolean, set to TRUE to weight-based update the latent classes. See ... for details.
dp_update: A boolean, set to TRUE to update the latent classes based on a Dirichlet process. See ... for details.
Cmax: The maximum number of latent classes.
buffer: The number of iterations to wait before a next weight-based update of the latent classes.
epsmin: The threshold weight (between 0 and 1) for removing a latent class in the weight-based updating scheme.
epsmax: The threshold weight (between 0 and 1) for splitting a latent class in the weight-based updating scheme.
distmin: The (non-negative) threshold in class mean difference for joining two latent classes in the weight-based updating scheme.

seed

Set a seed for the Gibbs sampling.

...

Ignored.

Details

All parameters (except for object) are optional and if not specified retrieved from the specification for object.

Value

An object of class RprobitB_fit.

Update class covariances

Description

This function updates the class covariances (independent from the other classes).

Usage

update_Omega(beta, b, z, m, nu, Theta)

Arguments

beta

The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. Set to NA if P_r = 0.

b

The matrix of class means as columns of dimension P_r x C. Set to NA if P_r = 0.

z

The vector of the allocation variables of length N. Set to NA if P_r = 0.

m

The vector of class sizes of length C.

nu

The degrees of freedom (a natural number greater than P_r) of the Inverse Wishart prior for each Omega_c. Per default, nu = P_r + 2.

Theta

The scale matrix of dimension P_r x P_r of the Inverse Wishart prior for each Omega_c. Per default, Theta = diag(P_r).

Details

The following holds independently for each class c. Let \Omega_c be the covariance matrix of class number c. A priori, we assume that \Omega_c is inverse Wishart distributed with \nu degrees of freedom and scale matrix \Theta. Let (\beta_n)_{z_n=c} be the collection of \beta_n that are currently allocated to class c, m_c the size of class c, and b_c the class mean vector. Due to the conjugacy of the prior, the posterior \Pr(\Omega_c \mid (\beta_n)_{z_n=c}) follows an inverted Wishart distribution with \nu + m_c degrees of freedom and scale matrix \Theta^{-1} + \sum_n (\beta_n - b_c)(\beta_n - b_c)', where the product is over the values n for which z_n=c holds.

Value

A matrix of updated covariance matrices for each class in columns.

Examples

### N = 100 decider, P_r = 2 random coefficients, and C = 2 latent classes
N <- 100
b <- cbind(c(0,0),c(1,1))
(Omega_true <- matrix(c(1,0.3,0.3,0.5,1,-0.3,-0.3,0.8), ncol=2))
z <- c(rep(1,N/2),rep(2,N/2))
m <- as.numeric(table(z))
beta <- sapply(z, function(z) rmvnorm(b[,z], matrix(Omega_true[,z],2,2)))
### degrees of freedom and scale matrix for the Wishart prior
nu <- 1
Theta <- diag(2)
### updated class covariance matrices (in columns)
update_Omega(beta = beta, b = b, z = z, m = m, nu = nu, Theta = Theta)

Update error term covariance matrix of multiple linear regression

Description

This function updates the error term covariance matrix of a multiple linear regression.

Usage

update_Sigma(kappa, E, N, S)

Arguments

kappa

The degrees of freedom (a natural number greater than J-1) of the Inverse Wishart prior for Sigma. Per default, kappa = J + 1.

E

The scale matrix of dimension J-1 x J-1 of the Inverse Wishart prior for Sigma. Per default, E = diag(J - 1).

N

The draw size.

S

A matrix, the sum over the outer products of the residuals (\epsilon_n)_{n=1,\dots,N}.

Details

This function draws from the posterior distribution of the covariance matrix \Sigma in the linear utility equation

U_n = X_n\beta + \epsilon_n,

where U_n is the (latent, but here assumed to be known) utility vector of decider n = 1,\dots,N, X_n is the design matrix build from the choice characteristics faced by n, \beta is the coefficient vector, and \epsilon_n is the error term assumed to be normally distributed with mean 0 and unknown covariance matrix \Sigma. A priori we assume the (conjugate) Inverse Wishart distribution

\Sigma \sim W(\kappa,E)

with \kappa degrees of freedom and scale matrix E. The posterior for \Sigma is the Inverted Wishart distribution with \kappa + N degrees of freedom and scale matrix E^{-1}+S, where S = \sum_{n=1}^{N} \epsilon_n \epsilon_n' is the sum over the outer products of the residuals (\epsilon_n = U_n - X_n\beta)_n.

Value

A matrix, a draw from the Inverse Wishart posterior distribution of the error term covariance matrix in a multiple linear regression.

Examples

### true error term covariance matrix
(Sigma_true <- matrix(c(1,0.5,0.2,0.5,1,0.2,0.2,0.2,2), ncol=3))
### coefficient vector
beta <- matrix(c(-1,1), ncol=1)
### draw data
N <- 100
X <- replicate(N, matrix(rnorm(6), ncol=2), simplify = FALSE)
eps <- replicate(N, rmvnorm(mu = c(0,0,0), Sigma = Sigma_true), simplify = FALSE)
U <- mapply(function(X, eps) X %*% beta + eps, X, eps, SIMPLIFY = FALSE)
### prior parameters for covariance matrix
kappa <- 4
E <- diag(3)
### draw from posterior of coefficient vector
outer_prod <- function(X, U) (U - X %*% beta) %*% t(U - X %*% beta)
S <- Reduce(`+`, mapply(outer_prod, X, U, SIMPLIFY = FALSE))
Sigma_draws <- replicate(100, update_Sigma(kappa, E, N, S))
apply(Sigma_draws, 1:2, mean)
apply(Sigma_draws, 1:2, stats::sd)

Update latent utility vector

Description

This function updates the latent utility vector, where (independent across deciders and choice occasions) the utility for each alternative is updated conditional on the other utilities.

Usage

update_U(U, y, sys, Sigmainv)

Arguments

U

The current utility vector of length J-1.

y

An integer from 1 to J, the index of the chosen alternative.

sys

A vector of length J-1, the systematic utility part.

Sigmainv

The inverted error term covariance matrix of dimension J-1 x J-1.

Details

The key ingredient to Gibbs sampling for probit models is considering the latent utilities as parameters themselves which can be updated (data augmentation). Independently for all deciders n=1,\dots,N and choice occasions t=1,...,T_n, the utility vectors (U_{nt})_{n,t} in the linear utility equation U_{nt} = X_{nt} \beta + \epsilon_{nt} follow a J-1-dimensional truncated normal distribution, where J is the number of alternatives, X_{nt} \beta the systematic (i.e. non-random) part of the utility and \epsilon_{nt} \sim N(0,\Sigma) the error term. The truncation points are determined by the choices y_{nt}. To draw from a truncated multivariate normal distribution, this function implemented the approach of Geweke (1998) to conditionally draw each component separately from a univariate truncated normal distribution. See Oelschläger (2020) for the concrete formulas.

Value

An updated utility vector of length J-1.

References

See Geweke (1998) Efficient Simulation from the Multivariate Normal and Student-t Distributions Subject to Linear Constraints and the Evaluation of Constraint Probabilities for Gibbs sampling from a truncated multivariate normal distribution. See Oelschläger and Bauer (2020) Bayes Estimation of Latent Class Mixed Multinomial Probit Models for its application to probit utilities.

Examples

U <- c(0,0,0)
y <- 3
sys <- c(0,0,0)
Sigmainv <- solve(diag(3))
update_U(U, y, sys, Sigmainv)

Update latent utility vector in the ranked probit case

Description

This function updates the latent utility vector in the ranked probit case.

Usage

update_U_ranked(U, sys, Sigmainv)

Arguments

U

The current utility vector of length J-1, differenced such that the vector is negative.

sys

A vector of length J-1, the systematic utility part.

Sigmainv

The inverted error term covariance matrix of dimension J-1 x J-1.

Details

This update is basically the same as in the non-ranked case, despite that the truncation point is zero.

Value

An updated utility vector of length J-1.

Examples

U <- c(0,0)
sys <- c(0,0)
Sigmainv <- diag(2)
update_U_ranked(U, sys, Sigmainv)

Update class means

Description

This function updates the class means (independent from the other classes).

Usage

update_b(beta, Omega, z, m, xi, Dinv)

Arguments

beta

The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. Set to NA if P_r = 0.

Omega

The matrix of class covariance matrices as columns of dimension P_r*P_r x C. Set to NA if P_r = 0.

z

The vector of the allocation variables of length N. Set to NA if P_r = 0.

m

The vector of class sizes of length C.

xi

The mean vector of length P_r of the normal prior for each b_c. Per default, xi = numeric(P_r).

Dinv

The precision matrix (i.e. the inverse of the covariance matrix) of dimension P_r x P_r of the normal prior for each b_c.

Details

The following holds independently for each class c. Let b_c be the mean of class number c. A priori, we assume that b_c is normally distributed with mean vector \xi and covariance matrix D. Let (\beta_n)_{z_n=c} be the collection of \beta_n that are currently allocated to class c, m_c the class size, and \bar{b}_c their arithmetic mean. Assuming independence across draws, (\beta_n)_{z_n=c} has a normal likelihood of

\prod_n \phi(\beta_n \mid b_c,\Omega_c),

where the product is over the values n for which z_n=c holds. Due to the conjugacy of the prior, the posterior \Pr(b_c \mid (\beta_n)_{z_n=c}) follows a normal distribution with mean

(D^{-1} + m_c\Omega_c^{-1})^{-1}(D^{-1}\xi + m_c\Omega_c^{-1}\bar{b}_c)

and covariance matrix

(D^{-1} + m_c \Omega_c^{-1})^{-1}.

Value

A matrix of updated means for each class in columns.

Examples

### N = 100 decider, P_r = 2 random coefficients, and C = 2 latent classes
N <- 100
(b_true <- cbind(c(0,0),c(1,1)))
Omega <- matrix(c(1,0.3,0.3,0.5,1,-0.3,-0.3,0.8), ncol=2)
z <- c(rep(1,N/2),rep(2,N/2))
m <- as.numeric(table(z))
beta <- sapply(z, function(z) rmvnorm(b_true[,z], matrix(Omega[,z],2,2)))
### prior mean vector and precision matrix (inverse of covariance matrix)
xi <- c(0,0)
Dinv <- diag(2)
### updated class means (in columns)
update_b(beta = beta, Omega = Omega, z = z, m = m, xi = xi, Dinv = Dinv)

Dirichlet process-based update of latent classes

Description

This function updates the latent classes based on a Dirichlet process.

Usage

update_classes_dp(
  Cmax,
  beta,
  z,
  b,
  Omega,
  delta,
  xi,
  D,
  nu,
  Theta,
  s_desc = TRUE
)

Arguments

Cmax

The maximum number of classes.

beta

The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. Set to NA if P_r = 0.

z

The vector of the allocation variables of length N. Set to NA if P_r = 0.

b

The matrix of class means as columns of dimension P_r x C. Set to NA if P_r = 0.

Omega

The matrix of class covariance matrices as columns of dimension P_r*P_r x C. Set to NA if P_r = 0.

delta

xi

The mean vector of length P_r of the normal prior for each b_c. Per default, xi = numeric(P_r).

D

The covariance matrix of dimension P_r x P_r of the normal prior for each b_c. Per default, D = diag(P_r).

nu

The degrees of freedom (a natural number greater than P_r) of the Inverse Wishart prior for each Omega_c. Per default, nu = P_r + 2.

Theta

The scale matrix of dimension P_r x P_r of the Inverse Wishart prior for each Omega_c. Per default, Theta = diag(P_r).

s_desc

If TRUE, sort the classes in descending class weight.

Details

To be added.

Value

A list of updated values for z, b, Omega, s, and C.

Examples

set.seed(1)
z <- c(rep(1,20),rep(2,30))
b <- matrix(c(1,1,1,-1), ncol=2)
Omega <- matrix(c(1,0.3,0.3,0.5,1,-0.3,-0.3,0.8), ncol=2)
beta <- sapply(z, function(z) rmvnorm(b[,z], matrix(Omega[,z],2,2)))
delta <- 1
xi <- numeric(2)
D <- diag(2)
nu <- 4
Theta <- diag(2)
RprobitB:::update_classes_dp(
  Cmax = 10, beta = beta, z = z, b = b, Omega = Omega,
  delta = delta, xi = xi, D = D, nu = nu, Theta = Theta
)

Weight-based update of latent classes

Description

This function updates the latent classes based on their class weights.

Usage

update_classes_wb(Cmax, epsmin, epsmax, distmin, s, b, Omega)

Arguments

Cmax

The maximum number of classes.

epsmin

The threshold weight (between 0 and 1) for removing a class.

epsmax

The threshold weight (between 0 and 1) for splitting a class.

distmin

The (non-negative) threshold difference in class means for joining two classes.

s

The vector of class weights of length C. Set to NA if P_r = 0. For identifiability, the vector must be non-ascending.

b

The matrix of class means as columns of dimension P_r x C. Set to NA if P_r = 0.

Omega

The matrix of class covariance matrices as columns of dimension P_r*P_r x C. Set to NA if P_r = 0.

Details

The updating scheme bases on the following rules:

We remove class c, if s_c<\epsilon_{min}, i.e. if the class weight s_c drops below some threshold \epsilon_{min}. This case indicates that class c has a negligible impact on the mixing distribution.
We split class c into two classes c_1 and c_2, if s_c>\epsilon_{max}. This case indicates that class c has a high influence on the mixing distribution whose approximation can potentially be improved by increasing the resolution in directions of high variance. Therefore, the class means b_{c_1} and b_{c_2} of the new classes c_1 and c_2 are shifted in opposite directions from the class mean b_c of the old class c in the direction of the highest variance.
We join two classes c_1 and c_2 to one class c, if ||b_{c_1} - b_{c_2}||<\epsilon_{distmin}, i.e. if the euclidean distance between the class means b_{c_1} and b_{c_2} drops below some threshold \epsilon_{distmin}. This case indicates location redundancy which should be repealed. The parameters of c are assigned by adding the values of s from c_1 and c_2 and averaging the values for b and \Omega. The rules are executed in the above order, but only one rule per iteration and only if Cmax is not exceeded.

Value

A list of updated values for s, b, and Omega.

Examples

### parameter settings
s <- c(0.8,0.2)
b <- matrix(c(1,1,1,-1), ncol=2)
Omega <- matrix(c(0.5,0.3,0.3,0.5,1,-0.1,-0.1,0.8), ncol=2)

### Remove class 2
RprobitB:::update_classes_wb(Cmax = 10, epsmin = 0.3, epsmax = 0.9, distmin = 1,
                             s = s, b = b, Omega = Omega)

### Split class 1
RprobitB:::update_classes_wb(Cmax = 10, epsmin = 0.1, epsmax = 0.7, distmin = 1,
                             s = s, b = b, Omega = Omega)

### Join classes
RprobitB:::update_classes_wb(Cmax = 10, epsmin = 0.1, epsmax = 0.9, distmin = 3,
                             s = s, b = b, Omega = Omega)

Update utility threshold increments

Description

This function updates the utility threshold increments

Usage

update_d(d, y, mu, ll, zeta, Z, Tvec)

Arguments

d

The current vector of utility threshold increments.

y

A matrix of the choices.

mu

A matrix of the systematic utilities.

ll

Current log-likelihood value.

zeta

The mean vector of the normal prior for d.

Z

The covariance matrix of the normal prior for d.

Tvec

The element Tvec in sufficient_statistics.

Value

The updated value for d.

Update class sizes

Description

This function updates the class size vector.

Usage

update_m(C, z, nozero)

Arguments

C

The number (greater or equal 1) of latent classes of decision makers. Set to NA if P_r = 0. Otherwise, C = 1 per default.

z

The vector of the allocation variables of length N. Set to NA if P_r = 0.

nozero

If TRUE, each element in the output vector m is at least one (for numerical stability).

Value

An updated class size vector.

Examples

update_m(C = 3, z = c(1,1,1,2,2,3), FALSE)

Update coefficient vector of multiple linear regression

Description

This function updates the coefficient vector of a multiple linear regression.

Usage

update_reg(mu0, Tau0, XSigX, XSigU)

Arguments

mu0

The mean vector of the normal prior distribution for the coefficient vector.

Tau0

The precision matrix (i.e. inverted covariance matrix) of the normal prior distribution for the coefficient vector.

XSigX

The matrix \sum_{n=1}^N X_n'\Sigma^{-1}X_n. See below for details.

XSigU

The vector \sum_{n=1}^N X_n'\Sigma^{-1}U_n. See below for details.

Details

This function draws from the posterior distribution of \beta in the linear utility equation

U_n = X_n\beta + \epsilon_n,

where U_n is the (latent, but here assumed to be known) utility vector of decider n = 1,\dots,N, X_n is the design matrix build from the choice characteristics faced by n, \beta is the unknown coefficient vector (this can be either the fixed coefficient vector \alpha or the decider-specific coefficient vector \beta_n), and \epsilon_n is the error term assumed to be normally distributed with mean 0 and (known) covariance matrix \Sigma. A priori we assume the (conjugate) normal prior distribution

\beta \sim N(\mu_0,T_0)

with mean vector \mu_0 and precision matrix (i.e. inverted covariance matrix) T_0. The posterior distribution for \beta is normal with covariance matrix

\Sigma_1 = (T_0 + \sum_{n=1}^N X_n'\Sigma^{-1}X_n)^{-1}

and mean vector

\mu_1 = \Sigma_1(T_0\mu_0 + \sum_{n=1}^N X_n'\Sigma^{-1}U_n)

. Note the analogy of \mu_1 to the generalized least squares estimator

\hat{\beta}_{GLS} = (\sum_{n=1}^N X_n'\Sigma^{-1}X_n)^{-1} \sum_{n=1}^N X_n'\Sigma^{-1}U_n

which becomes weighted by the prior parameters \mu_0 and T_0.

Value

A vector, a draw from the normal posterior distribution of the coefficient vector in a multiple linear regression.

Examples

### true coefficient vector
beta_true <- matrix(c(-1,1), ncol=1)
### error term covariance matrix
Sigma <- matrix(c(1,0.5,0.2,0.5,1,0.2,0.2,0.2,2), ncol=3)
### draw data
N <- 100
X <- replicate(N, matrix(rnorm(6), ncol=2), simplify = FALSE)
eps <- replicate(N, rmvnorm(mu = c(0,0,0), Sigma = Sigma), simplify = FALSE)
U <- mapply(function(X, eps) X %*% beta_true + eps, X, eps, SIMPLIFY = FALSE)
### prior parameters for coefficient vector
mu0 <- c(0,0)
Tau0 <- diag(2)
### draw from posterior of coefficient vector
XSigX <- Reduce(`+`, lapply(X, function(X) t(X) %*% solve(Sigma) %*% X))
XSigU <- Reduce(`+`, mapply(function(X, U) t(X) %*% solve(Sigma) %*% U, X, U, SIMPLIFY = FALSE))
beta_draws <- replicate(100, update_reg(mu0, Tau0, XSigX, XSigU), simplify = TRUE)
rowMeans(beta_draws)

Update class weight vector

Description

This function updates the class weight vector by drawing from its posterior distribution.

Usage

update_s(delta, m)

Arguments

delta

m

The vector of current class frequencies.

Details

Let m=(m_1,\dots,m_C) be the frequencies of C classes. Given the class weight (probability) vector s=(s_1,\dots,s_C), the distribution of m is multinomial and its likelihood is

L(m\mid s) \propto \prod_{i=1}^C s_i^{m_i}.

The conjugate prior p(s) for s is a Dirichlet distribution, which has a density function proportional to

\prod_{i=1}^C s_i^{\delta_i-1},

where \delta = (\delta_1,\dots,\delta_C) is the concentration parameter vector. Note that in {RprobitB}, \delta_1=\dots=\delta_C. This restriction is necessary because the class number C can change. The posterior distribution of s is proportional to

p(s) L(m\mid s) \propto \prod_{i=1}^C s_i^{\delta_i + m_i - 1},

which in turn is proportional to a Dirichlet distribution with parameters \delta+m.

Value

A vector, a draw from the Dirichlet posterior distribution for s.

Examples

### number of classes
C <- 4
### current class sizes
m <- sample.int(C)
### concentration parameter for Dirichlet prior (single-valued)
delta <- 1
### updated class weight vector
update_s(delta = 1, m = m)

Update class allocation vector

Description

This function updates the class allocation vector (independently for all observations) by drawing from its conditional distribution.

Usage

update_z(s, beta, b, Omega)

Arguments

s

The vector of class weights of length C. Set to NA if P_r = 0. For identifiability, the vector must be non-ascending.

beta

The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. Set to NA if P_r = 0.

b

The matrix of class means as columns of dimension P_r x C. Set to NA if P_r = 0.

Omega

The matrix of class covariance matrices as columns of dimension P_r*P_r x C. Set to NA if P_r = 0.

Details

Let z = (z_1,\dots,z_N) denote the class allocation vector of the observations (mixed coefficients) \beta = (\beta_1,\dots,\beta_N). Independently for each n, the conditional probability \Pr(z_n = c \mid s,\beta_n,b,\Omega) of having \beta_n allocated to class c for c=1,\dots,C depends on the class allocation vector s, the class means b=(b_c)_c and the class covariance matrices Omega=(Omega_c)_c and is proportional to

s_c \phi(\beta_n \mid b_c,Omega_c).

Value

An updated class allocation vector.

Examples

### class weights for C = 2 classes
s <- rdirichlet(c(1,1))
### coefficient vector for N = 1 decider and P_r = 2 random coefficients
beta <- matrix(c(1,1), ncol = 1)
### class means and covariances
b <- cbind(c(0,0),c(1,1))
Omega <- cbind(c(1,0,0,1),c(1,0,0,1))
### updated class allocation vector
update_z(s = s, beta = beta, b = b, Omega = Omega)

RprobitB: Bayesian Probit Choice Modeling

Description

Author(s)

See Also

Compute Gelman-Rubin statistic

Description

Usage

Arguments

Value

References

Examples

Create object of class RprobitB_data

Description

Usage

Arguments

Value

Create object of class RprobitB_fit

Description

Usage

Arguments

Value

Create object of class RprobitB_gibbs_samples_statistics

Description

Usage

Arguments

Value

Create object of class RprobitB_latent_classes

Description

Usage

Arguments

Details

Why update latent classes?

What options to update latent classes exist?

What is the default behavior?

Why is Cmax required?

Value

Examples

Create object of class RprobitB_normalization

Description

Usage

Arguments

Details

Value

Examples

Define probit model parameter

Description

Usage

Arguments

Value

Examples

Compute WAIC value

Description

Usage

Arguments

Details

Value

Re-label alternative specific covariates

Description

Usage

Arguments

Value

Examples

Check model formula

Description

Usage

Arguments

Value

See Also

Check prior parameters

Description

Usage

Arguments

Details

Value

Examples

Compute choice probabilities

Description

Usage

Arguments

Value

Create object of class `RprobitB_data`

Create object of class `RprobitB_fit`

Create object of class `RprobitB_gibbs_samples_statistics`

Create object of class `RprobitB_latent_classes`

Why is `Cmax` required?

Create object of class `RprobitB_normalization`

Create labels for `Omega`

Create labels for `Sigma`

Create labels for `alpha`

Create labels for `b`

Create labels for `d`

Create labels for `s`