Help for package DisImpact

Title:

Calculates Disproportionate Impact When Binary Success Data are Disaggregated by Subgroups

Version:

0.0.21

Description:

Implements methods for calculating disproportionate impact: the percentage point gap, proportionality index, and the 80% index. California Community Colleges Chancellor's Office (2017). Percentage Point Gap Method. https://www.cccco.edu/-/media/CCCCO-Website/About-Us/Divisions/Digital-Innovation-and-Infrastructure/Research/Files/PercentagePointGapMethod2017.ashx. California Community Colleges Chancellor's Office (2014). Guidelines for Measuring Disproportionate Impact in Equity Plans. https://www.cccco.edu/-/media/CCCCO-Website/Files/DII/guidelines-for-measuring-disproportionate-impact-in-equity-plans-tfa-ada.pdf.

Depends:

R (≥ 3.4.0)

Imports:

dplyr (≥ 0.8.5), rlang, tidyselect, purrr, tidyr, parallel, fst, DBI, duckdb (≥ 0.5.0), glue, stringr, data.table (≥ 1.14.2), collapse, sets

License:

GPL-3

URL:

https://github.com/vinhdizzo/DisImpact

BugReports:

https://github.com/vinhdizzo/DisImpact/issues

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.1.2

Suggests:

knitr, rmarkdown, markdown, prettydoc, ggplot2, forcats, scales, tinytest

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2022-10-10 17:26:29 UTC; vnguyen216

Author:

Vinh Nguyen [aut, cre]

Maintainer:

Vinh Nguyen <nguyenvq714@gmail.com>

Repository:

CRAN

Date/Publication:

2022-10-10 18:00:02 UTC

Calculate disproportionate impact per the 80% index

Description

Calculate disproportionate impact per the 80% index method.

Usage

di_80_index(
  success,
  group,
  cohort,
  weight,
  data,
  di_80_index_cutoff = 0.8,
  reference_group = "hpg",
  check_valid_reference = TRUE
)

Arguments

success

A vector of success indicators (1/0 or TRUE/FALSE) or an unquoted reference (name) to a column in data if it is specified. It can also be a vector of success counts, in which case weight must also be specified to give the group size.

group

A vector of group names of the same length as success or an unquoted reference (name) to a column in data if it is specified.

cohort

(Optional) A vector of cohort names of the same length as success or an unquoted reference (name) to a column in data if it is specified. Disproportionate impact is calculated for every group within each cohort. If cohort is not specified, the analysis assumes a single cohort.

weight

(Optional) A vector of case weights of the same length as success or an unquoted reference (name) to a column in data if it is specified. If success consists of counts instead of success indicators (1/0), then weight should also be specified to indicate the group size.
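
The following base R sketch (not DisImpact code) illustrates why the two input layouts lead to the same success rate: with 1/0 indicators the rate is the group mean, while with counts it is total successes divided by total weight. The data frames and their values are hypothetical.

```r
# Individual-level data: one row per student, success coded 1/0 (hypothetical values).
indiv <- data.frame(
  group   = c("A", "A", "A", "B", "B"),
  success = c(1, 0, 1, 1, 0)
)

# Summarized data: success holds counts and weight holds the group size.
summ <- data.frame(
  group   = c("A", "B"),
  success = c(2, 1),
  weight  = c(3, 2)
)

# Success rate from individual rows: mean of the 1/0 indicator per group.
rate_indiv <- tapply(indiv$success, indiv$group, mean)

# Success rate from summarized rows: total successes divided by total weight.
rate_summ <- tapply(summ$success, summ$group, sum) /
  tapply(summ$weight, summ$group, sum)

print(rbind(rate_indiv, rate_summ))
```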

data

(Optional) A data frame containing the variables of interest. If data is specified, then success, group, cohort, and weight are looked up within it.

di_80_index_cutoff

A numeric value between 0 and 1. A group is considered disproportionately impacted if the index comparing its success rate to that of the reference group falls below this threshold; defaults to 0.80.

reference_group

The reference group value in group that each group should be compared to in order to determine disproportionate impact. By default ('hpg'), the group with the highest success rate is used as the reference. The user can also specify 'overall' to use the overall rate as the reference, or 'all but current' to use, for each comparison, the combined success rate of all other groups excluding the current group.
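
A minimal base R sketch of how each reference option could be computed from hypothetical group-level counts. It assumes 'overall' and 'all but current' are pooled rates (total successes over total students), which is one natural reading of the description above and not necessarily the package's exact code.

```r
# Hypothetical per-group summary: n students and number of successes.
n       <- c(A = 100, B = 50, C = 50)
success <- c(A = 60,  B = 20, C = 40)
pct     <- success / n

# 'hpg': the highest success rate among all groups.
ref_hpg <- max(pct)

# 'overall': pooled success rate across all groups.
ref_overall <- sum(success) / sum(n)

# 'all but current': pooled rate of every other group, one value per group.
ref_abc <- (sum(success) - success) / (sum(n) - n)

# A specific group value, e.g. 'A', uses that group's own rate.
ref_A <- pct[["A"]]

print(list(hpg = ref_hpg, overall = ref_overall,
           all_but_current = ref_abc, A = ref_A))
```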

check_valid_reference

Check whether reference_group is a valid value; defaults to TRUE. This argument exists for use by di_iterate: when iterating DI calculations, there may be scenarios where a specified reference group does not contain any students.

Details

This function determines disproportionate impact based on the 80% index method, as described in the California Community Colleges Chancellor's Office reference listed under References. It assumes that a higher rate is better ("success"). For outcomes where a high rate is undesirable (e.g., the drop-out rate), analyze the complementary outcome instead (e.g., the rate of students who did not drop out, where higher is better) so that the function is applied properly.
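
A base R sketch of the 80% index calculation, assuming the default 'hpg' reference and the 0.80 cutoff. The counts are hypothetical, and the code is an illustration rather than DisImpact's implementation.

```r
# Hypothetical group-level counts.
n       <- c(A = 100, B = 50, C = 50)
success <- c(A = 60,  B = 20, C = 40)
cutoff  <- 0.8

pct          <- success / n
reference    <- max(pct)                        # 'hpg': highest-performing group
di_80_index  <- pct / reference                 # ratio of each rate to the reference
di_indicator <- as.integer(di_80_index < cutoff) # 1 = disproportionately impacted

result <- data.frame(group = names(n), n, success, pct, reference,
                     di_80_index, di_indicator, row.names = NULL)
print(result)
```

Here groups A (index 0.75) and B (index 0.50) fall below 0.80 and are flagged, while C serves as the reference with an index of 1.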

Value

A data frame consisting of:

cohort (if used),
group,
n (sample size),
success (number of successes for the cohort-group),
pct (proportion of successes for the cohort-group),
reference_group (the reference group used to compare and determine disproportionate impact),
reference (the reference rate used for comparison, corresponding to reference_group),
di_80_index (ratio of pct to the reference),
di_indicator (1 if di_80_index < di_80_index_cutoff, 0 otherwise),
success_needed_not_di (the number of additional successes needed in order to no longer be considered disproportionately impacted as compared to the reference), and
success_needed_full_parity (the number of additional successes needed in order to achieve full parity with the reference).
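
One way to compute the two success_needed_* columns is sketched below in base R. It assumes the reference rate stays fixed as successes are added (reasonable when the reference is another group's rate, but not for 'overall' or 'all but current', where added successes also move the reference), so the package's own values may differ in those cases. The inputs are hypothetical.

```r
# Hypothetical group with n students and `success` successes, compared to a
# reference rate that is ASSUMED to stay fixed as successes are added.
n         <- 50
success   <- 20
reference <- 0.8
cutoff    <- 0.8
eps       <- 1e-9   # guards ceiling() against floating-point noise

# Smallest k such that (success + k) / n >= cutoff * reference.
success_needed_not_di <- max(0, ceiling(cutoff * reference * n - success - eps))

# Smallest k such that (success + k) / n >= reference.
success_needed_full_parity <- max(0, ceiling(reference * n - success - eps))

print(c(not_di = success_needed_not_di, full_parity = success_needed_full_parity))
```

With these inputs the group needs 12 more successes (32 of 50, or 64%) to reach 0.80 × 0.80 = 64% and no longer be flagged, and 20 more (40 of 50) to match the 80% reference. The small eps matters here: 0.8 * 0.8 * 50 evaluates to slightly more than 32 in double precision, so ceiling() alone would return 13.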

References

California Community Colleges Chancellor's Office (2014). Guidelines for Measuring Disproportionate Impact in Equity Plans.

Examples

library(dplyr)
data(student_equity)
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame

Calculates disproportionate impact using multiple methods for data stored in a data.table object.

Description

Calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for data stored in a data.table object. This is the workhorse function leveraged by the di_iterate_dt function.

Usage

di_calc_dt(
  dt,
  success_var,
  group_var,
  cohort_var = "",
  weight_var = NULL,
  ppg_reference_group = "overall",
  min_moe = 0.03,
  use_prop_in_moe = FALSE,
  prop_sub_0 = 0.5,
  prop_sub_1 = 0.5,
  di_prop_index_cutoff = 0.8,
  di_80_index_cutoff = 0.8,
  di_80_index_reference_group = "hpg",
  filter_subset = ""
)

Arguments

dt

A data frame of class data.table. If the object is not a data.table, convert it first with as.data.table.

success_var

A character value specifying the success variable name.

group_var

A character value specifying the group (disaggregation) variable name.

cohort_var

(Optional) A character value specifying the cohort variable. If not specified, then a single cohort is assumed (defaults to an empty string, '').

weight_var

(Optional) A character value specifying the weight variable if the input data set is summarized (i.e., the success variable specified in success_var contains counts of successes). Weight here corresponds to the denominator when calculating the success rate. Defaults to NULL for an input data set where each row describes an individual.

ppg_reference_group

Either 'overall', 'hpg', 'all but current', or a character value specifying a group from group_var to be used as the reference group for comparison using the percentage point gap method.

min_moe

The minimum margin of error to be used in the PPG calculation; see di_ppg.

use_prop_in_moe

(TRUE or FALSE) Whether the estimated proportions should be used in the margin of error calculation by the PPG; see di_ppg.

prop_sub_0

Default is 0.50; see di_ppg.

prop_sub_1

Default is 0.50; see di_ppg.
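
The PPG margin of error is documented in di_ppg, which is not part of this excerpt. As a hedged illustration only, the base R sketch below assumes the common 95% normal-approximation form, 1.96 * sqrt(p * (1 - p) / n) with p fixed at 0.50 (the use_prop_in_moe = FALSE case), floored at min_moe, and assumes a group is flagged when its rate plus the margin of error is still at or below the reference. It does not model use_prop_in_moe = TRUE, prop_sub_0, or prop_sub_1; check di_ppg for the package's exact rules.

```r
# Hypothetical per-group summary.
n         <- c(A = 30, B = 900, C = 10000)
pct       <- c(A = 0.55, B = 0.55, C = 0.66)
reference <- 0.68      # e.g. the overall rate
min_moe   <- 0.03

# ASSUMED margin of error: 95% normal approximation with p = 0.5,
# never smaller than min_moe.
moe <- pmax(1.96 * sqrt(0.5 * 0.5 / n), min_moe)

# Percentage point gap relative to the reference (for reporting).
ppg <- pct - reference

# ASSUMED flagging rule: DI when rate + margin of error <= reference.
di_indicator <- as.integer(pct + moe <= reference)

print(data.frame(n, pct, moe, ppg, di_indicator))
```

Groups A and B have the same 55% rate, but only B is flagged: A's small sample (n = 30) gives a margin of error of about 0.18, while B's is about 0.033. Group C's computed margin (about 0.0098) is raised to min_moe = 0.03, so 0.66 + 0.03 = 0.69 stays above the 0.68 reference.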

di_prop_index_cutoff

Threshold used for determining disproportionate impact using the proportionality index; see di_prop_index; defaults to 0.80.
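
The proportionality index itself is defined in di_prop_index. The base R sketch below assumes the standard definition described in the CCCCO guidelines, namely a group's share of all successes divided by its share of the cohort, flagged when below the cutoff. The counts are hypothetical.

```r
# Hypothetical group-level counts.
n       <- c(A = 100, B = 50, C = 50)
success <- c(A = 60,  B = 20, C = 40)
cutoff  <- 0.8

pct_cohort    <- n / sum(n)               # group's share of the cohort
pct_success   <- success / sum(success)   # group's share of all successes
di_prop_index <- pct_success / pct_cohort
di_indicator  <- as.integer(di_prop_index < cutoff)

print(data.frame(pct_cohort, pct_success, di_prop_index, di_indicator))
```

Group B makes up 25% of the cohort but only about 17% of successes, giving an index of about 0.67, so it is flagged; A (1.00) and C (about 1.33) are not.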

di_80_index_cutoff

Threshold used for determining disproportionate impact using the 80% index; see di_80_index; defaults to 0.80.

di_80_index_reference_group

Either 'overall', 'hpg', 'all but current', or a character value specifying a group from group_var to be used as the reference group for comparison using the 80% index.

filter_subset

A character value such as "Ethnicity == 'White' & Gender == 'M'" used in the i argument (filtering rows via dt[i, j, by]) to filter data in dt. The character value is parsed using eval(parse(text=filter_subset)). Defaults to '' for no filtering.
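
For readers unfamiliar with the eval(parse(text = ...)) idiom, this base R sketch applies the same kind of filter string to an ordinary data frame. It mirrors what filtering via dt[i, j, by] accomplishes but does not use data.table; the data values are hypothetical.

```r
# Hypothetical student-level data.
df <- data.frame(
  Ethnicity = c("White", "White", "Asian", "White"),
  Gender    = c("M", "F", "M", "M"),
  Transfer  = c(1, 0, 1, 0)
)

filter_subset <- "Ethnicity == 'White' & Gender == 'M'"

# Parse the string into an expression and evaluate it with the columns of df
# in scope, giving a logical row index; an empty string keeps every row.
keep <- if (nzchar(filter_subset)) {
  eval(parse(text = filter_subset), envir = df)
} else {
  rep(TRUE, nrow(df))
}
subset_df <- df[keep, ]
print(subset_df)
```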

Value

A data.table object with summarized results.

Generate SQL code that calculates disproportionate impact using multiple methods for a specified table.

Description

Generate SQL code that calculates disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for a specified table name, success variable, group variable, and cohort variable. This is the workhorse function leveraged by the di_iterate_sql function.

Usage

di_calc_sql(
  db_table_name,
  success_var,
  group_var,
  cohort_var = "",
  weight_var = 1,
  ppg_reference_group = "overall",
  min_moe = 0.03,
  use_prop_in_moe = FALSE,
  prop_sub_0 = 0.5,
  prop_sub_1 = 0.5,
  di_prop_index_cutoff = 0.8,
  di_80_index_cutoff = 0.8,
  di_80_index_reference_group = "hpg",
  before_with_statement = "",
  after_with_statement = "",
  end_of_select_statement = "",
  where_statement = "",
  select_statement_add = ""
)

Arguments

db_table_name

A character value specifying a database table name.

success_var

A character value specifying the success variable name.

group_var

A character value specifying the group (disaggregation) variable name.

cohort_var

(Optional) A character value specifying the cohort variable. If not specified, then a single cohort is assumed (defaults to an empty string, '').

weight_var

(Optional) A character value specifying the weight variable if the input data set is summarized (i.e., the success variable specified in success_var contains counts of successes). Weight here corresponds to the denominator when calculating the success rate. Defaults to a numeric 1, which treats each row as an individual.
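
The default weight_var = 1 works because summing a literal 1 in SQL counts rows. The hypothetical helper below (build_rate_sql is not a DisImpact function, and its output is far simpler than what di_calc_sql generates) shows the idea with base R's sprintf; the table and column names are borrowed from the student_equity example for illustration only.

```r
# HYPOTHETICAL helper (not part of DisImpact): builds a grouped aggregation
# query; weight_var may be a column name or the literal 1.
build_rate_sql <- function(db_table_name, success_var, group_var, weight_var = 1) {
  sprintf(
    paste(
      "SELECT %s AS subgroup,",
      "SUM(%s) AS n,",
      "SUM(%s) AS success,",
      # 1.0 * avoids integer division in SQL dialects that truncate.
      "1.0 * SUM(%s) / SUM(%s) AS pct",
      "FROM %s GROUP BY %s"
    ),
    group_var, weight_var, success_var, success_var, weight_var,
    db_table_name, group_var
  )
}

sql <- build_rate_sql("student_equity", "Transfer", "Ethnicity")
cat(sql, "\n")
```

Passing, e.g., weight_var = "n_students" (a hypothetical group-size column) would instead divide summed success counts by summed group sizes.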

ppg_reference_group

Either 'overall', 'hpg', 'all but current', or a character value specifying a group from group_var to be used as the reference group for comparison using the percentage point gap method.

min_moe

The minimum margin of error to be used in the PPG calculation; see di_ppg.

use_prop_in_moe

(TRUE or FALSE) Whether the estimated proportions should be used in the margin of error calculation by the PPG; see di_ppg.

prop_sub_0

Default is 0.50; see di_ppg.

prop_sub_1

Default is 0.50; see di_ppg.

di_prop_index_cutoff

Threshold used for determining disproportionate impact using the proportionality index; see di_prop_index; defaults to 0.80.

di_80_index_cutoff

Threshold used for determining disproportionate impact using the 80% index; see di_80_index; defaults to 0.80.

di_80_index_reference_group

Either 'overall', 'hpg', 'all but current', or a character value specifying a group from group_var to be used as the reference group for comparison using the 80% index.

before_with_statement