Help for package SPARRAfairness

Title:

Analysis of Differential Behaviour of SPARRA Score Across Demographic Groups

Version:

0.1.0.0

Maintainer:

James Liley <james.liley@durham.ac.uk>

Description:

The SPARRA risk score (Scottish Patients At Risk of admission and Re-Admission) estimates yearly risk of emergency hospital admission using electronic health records on a monthly basis for most of the Scottish population. This package implements a suite of functions used to analyse the behaviour and performance of the score, focusing particularly on differential performance over demographically-defined groups. It includes useful utility functions to plot receiver-operator-characteristic, precision-recall and calibration curves, draw stock human figures, estimate counterfactual quantities without the need to re-compute risk scores, to simulate a semi-realistic dataset. Our manuscript can be found at: <doi:10.1371/journal.pdig.0000675>.

License:

GPL (≥ 3)

Encoding:

UTF-8

LazyData:

true

Depends:

R (≥ 3.5.0), stats, graphics, grDevices, matrixStats, ranger

Imports:

mvtnorm,cvAUC,ggplot2,ggrepel,patchwork,scales

RoxygenNote:

7.3.2

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-04-09 11:53:52 UTC; vwbw55

Author:

Ioanna Thoma

[aut], Catalina Vallejos

[ctb], Louis Aslett

[ctb], Jill Ireland

[ctb], Simon Rogers

[ctb], James Liley

[cre, aut]

Repository:

CRAN

Date/Publication:

2025-04-09 12:40:02 UTC

ab() Shorthand to draw a red x-y line

Description

ab() Shorthand to draw a red x-y line

Usage

ab(...)

Arguments

...

passed to abline()

Value

No return value, draws a figure

adjusted_fdr

Description

Estimates false discovery rate P(target=FALSE|score>cutoff,group=g) 'adjusted' for some category.

Usage

adjusted_fdr(
  scores,
  target,
  category,
  group1,
  group2,
  cutoffs = seq(min(scores, na.rm = TRUE), max(scores, na.rm = TRUE), length = 100),
  nboot = 100
)

Arguments

scores

vector of risk scores

target

vector of values of target (which risk score aims to predict)

category

vector of categories

group1

indices of group 1

group2

indices of group 2

cutoffs

score cutoffs at which to estimate metric (default 100 evenly-spaced)

nboot

number of bootstrap samples for standard error

Details

Namely, calculates

sum ( P(target=FALSE|score>cutoff,category=c,group=g)P(category=c|score<cutoff) )

where the sum is over categories c.

Value

matrix of dimension length(cutoffs)x4, with (i,2g-1)th entry the relevant fairness metric for group g at the ith cutoff value and (i,2g)th entry the approximate standard error of the (i,2g-1)th entry

Examples

# See vignette

adjusted_for

Description

Estimates false omission rate P(target=TRUE|score<=cutoff,group=g) 'adjusted' for some category.

Usage

adjusted_for(
  scores,
  target,
  category,
  group1,
  group2,
  cutoffs = seq(min(scores, na.rm = TRUE), max(scores, na.rm = TRUE), length = 100),
  nboot = 100
)

Arguments

scores

vector of risk scores

target

vector of values of target (which risk score aims to predict)

category

vector of categories

group1

indices of group 1

group2

indices of group 2

cutoffs

score cutoffs at which to estimate metric (default 100 evenly-spaced)

nboot

number of bootstrap samples for standard error

Details

Namely, calculates

sum ( P(target=TRUE|score<=cutoff,category=c,group=g)P(category=c|score<cutoff) )

where the sum is over categories c.

Value

matrix of dimension length(cutoffs)x4, with (i,2g-1)th entry the relevant fairness metric for group g at the ith cutoff value and (i,2g)th entry the approximate standard error of the (i,2g-1)th entry

Examples

# See vignette

All data for fairness measures

Description

This object contains all data from analysis of fairness measures in SPARRA v3 and v4.

Usage

all_data

Format

An object of class list of length 1261.

build_diff Prepares a data frame for a ggplot object to compare differences using linear interpolation.

Description

build_diff Prepares a data frame for a ggplot object to compare differences using linear interpolation.

Usage

build_diff(df, xvar)

Arguments

df

data frame

xvar

name of variable to consider as 'x': interpolate over evenly spaced values of this variable.

Value

data frame using (common) interpolated x values rather than arbitrary x values

Examples

# Only used internally

cal_2panel Draws calibration curves (with legend) with a second panel underneath showing predicted differences.

Description

cal_2panel Draws calibration curves (with legend) with a second panel underneath showing predicted differences.

Usage

cal_2panel(
  cals,
  labels,
  col = 1:length(cals),
  xy_col = phs_colours("phs-magenta"),
  ci_col = col,
  highlight = NULL,
  yrange_lower = NULL,
  legend_title = ""
)

Arguments

cals

list of calibration objects, output from getcal().

labels

labels to use in legend

col

line colours

xy_col

line colour for x-y line, defaults to phs-magenta

ci_col

colours to draw confidence intervals on lower panel; NA to not draw.

highlight

if non-null, highlight a particular value

yrange_lower

y range for lower plot. If NULL, generates automatically

legend_title

title for legend, defaults to nothing

Value

Silently return ggplot object

Examples

# See vignette

counterfactual_yhat

Description

Estimation of counterfactual quantities by resampling.

Usage

counterfactual_yhat(dat, X, x = NULL, G, g, gdash, excl = NULL, n = NULL)

Arguments

dat

data frame containing variables in X, Yhat, G, excl. Variables in U are assumed to be colnames(dat)\(X U Yhat U G U excl)

X

set of variables and values on which to 'condition'; essentially we assume that any causal pathway from G to Yhat is through X

x

values of variables X on which to condition; can be null in which case we use marginal distribution of X

G

grouping variable, usually a sensitive attribute

g

conditioned value of g

gdash

counterfactual value of g

excl

variable names to exclude from U

n

number of samples; if NULL return all

Details

Counterfactual fairness is with respect to the causal graph:

G <———-U | / | | / | | / | VV V X ——–> Yhat

where

G=group (usually sensitive attribute);
Yhat=outcome;
X=set of variables through which G can act on Yhat,
U=set of background variables;

We want the counterfactual Yhat_{g' <- G} |X=x,G=g (or alternatively (Yhat_{g' <- G} |G=g)), using the term Yhat as it appears in the function, rather than \hat{Y}.

This can be interpreted as: ⁠the distribution of values of Yhat amongst indivdiduals whose values of U are distributed as though they were in group \eqn{G=g} (and, optionally, had values \eqn{X=x}, but whose value of \eqn{G} is \eqn{g'}⁠

Essentially, comparison of the counterfactual quantity above to the conditional (Yhat|G=g) isolates the difference in Yhat due to the effect of G on Yhat through X, removing any effect due to different distributions of U due to different values of G.

To estimate Y'=Yhat_{g' \to G} | G=g, we need to

Compute U'\sim (U|G=g)
Compute the distribution X' as X'\sim (X|U~U',G=g')
Sample Y'~(Yhat|X\sim X',U \sim U')

To estimate Y'=Yhat_{g' \to G} |X=x, G=g, we need to

Compute U'\sim (U|G=g,X=x)
Compute the distribution X' as X'\sim (X|U \sim U',G=g')
Sample Y' \sim (Yhat|X \sim X',U \sim U')

This function approximates this samplying procedure as follows

Look at individuals with G=g (and optionally X=x)
Find the values of U for these individuals
Find a second set of individuals with the same values of U but for whom G=g'
Return the indices of these individuals

The values of Yhat for these individuals constitute a sample from the desired counterfactual.

Value

indices representing sample(s) from counterfactual Yhat(g' <- G) |X=x,G=g

Examples


set.seed(23173)
N=10000

# Background variables sampler
background_U=function(n) runif(n) # U~U(0,1)

# Structural equations
struct_G=function(u,n) rbinom(n,1,prob=u) # G|U=u ~ Bern(u)
struct_X=function(u,g,n) rbinom(n,1,prob=u*(0.5 + 0.5*g)) # X|U=u,G=g ~ Bern(u(1+g)/2)
struct_Yhat=function(u,x,n) (runif(n,0,x) + runif(n,0,u))/2 # Yhat|X,N ~ (U(0,X) + U(0,U))/2

# To see that the counterfactual 'isolates' the difference in Yhat due to the 
#  causal pathway from G to Yhat through X, change the definition of struct_G to
#  
#  struct_G=function(u,n) rbinom(n,1,prob=1/2) # G|U=u ~ Bern(1/2)
#
# so the posterior of U|G=g does not depend on g. Note that, with this definition, the 
#  counterfactual Yhat{G<-1}|G=1 coincides with the conditional Yhat|G=0, since
#  the counterfactual G<-1 is equivalent to just conditioning on G=1.
#
# By contrast, if we change struct_G back to its original definition, but 
#  change the definition of struct_Yhat to
#  
#  struct_Yhat=function(u,x,n) (runif(n,0,1) + runif(n,0,u))/2 # Yhat|X,N ~ (U(0,1) + U(0,U))/2
#
# so Yhat depends on G only through the change in posterior of U from changing g, 
#  the counterfactual Yhat{G<01}|G=1 coincides with the conditional Yhat|G=1.


# Sample from complete causal model
U=background_U(N)
G=struct_G(U,N)
X=struct_X(U,G,N)
Yhat=struct_Yhat(U,X,N)
dat=data.frame(U,G,X,Yhat)


# True counterfactual Yhat{G <- 0}|G=1
w1=which(dat$G==1)
n1=length(w1)
UG1=dat$U[w1] # This is U|G=1
XG1=struct_X(UG1,rep(0,n1),n1)
YhatG1=struct_Yhat(UG1,XG1,n1)

# Estimated counterfactual Yhat{G <- 0}|G=1
ind_G1=counterfactual_yhat(dat,X="X",G="G",g = 1, gdash = 0)
YhatG1_resample=dat$Yhat[ind_G1]



# True counterfactual Yhat{G <- 0}|G=1,X=1
w11=which(dat$G==1 & dat$X==1)
n11=length(w11)
UG1X1=dat$U[w11] # This is U|G=1,X=1
XG1X1=struct_X(UG1X1,rep(0,n11),n11)
YhatG1X1=struct_Yhat(UG1X1,XG1X1,n11)

# Estimated counterfactual Yhat{G <- 0}|G=1
ind_G1X1=counterfactual_yhat(dat,X="X",G="G",g = 1, gdash = 0,x=1)
YhatG1X1_resample=dat$Yhat[ind_G1X1]




# Compare CDFs
x=seq(0,1,length=1000)
oldpar = par(mfrow=c(1,2))

plot(0,type="n",xlim=c(0,1),ylim=c(0,1),xlab="Value",
     ylab=expression(paste("Prop. ",hat('Y')," < x")))
lines(x,ecdf(dat$Yhat)(x),col="black") # Unconditional CDF of Yhat
lines(x,ecdf(dat$Yhat[which(dat$G==1)])(x),col="red") # Yhat|G=1
lines(x,ecdf(dat$Yhat[which(dat$G==0)])(x),col="blue") # Yhat|G=0

# True counterfactual Yhat{G <- 0}|G=1
lines(x,ecdf(YhatG1)(x),col="blue",lty=2) 

# Estimated counterfactual Yhat{G <- 0}|G=1
lines(x,ecdf(YhatG1_resample)(x),col="blue",lty=3) 

legend("bottomright",
       c(expression(paste(hat('Y'))),
         expression(paste(hat('Y'),"|G=1")),
         expression(paste(hat('Y'),"|G=0")),
         expression(paste(hat(Y)[G %<-% 0],"|G=1 (true)")),
         expression(paste(hat(Y)[G %<-% 0],"|G=1 (est.)"))),
       col=c("black","red","blue","blue","blue"),
       lty=c(1,1,1,2,3),
       cex=0.5)



plot(0,type="n",xlim=c(0,1),ylim=c(0,1),xlab="Value",
     ylab=expression(paste("Prop. ",hat('Y')," < x")))
lines(x,ecdf(dat$Yhat[which(dat$X==1)])(x),col="black") # CDF of Yhat|X=1
lines(x,ecdf(dat$Yhat[which(dat$G==1 & dat$X==1)])(x),col="red") # Yhat|G=1,X=1
lines(x,ecdf(dat$Yhat[which(dat$G==0 & dat$X==1)])(x),col="blue") # Yhat|G=0,X=1

# True counterfactual Yhat{G <- 0}|G=1,X=1
lines(x,ecdf(YhatG1X1)(x),col="blue",lty=2) 

# Estimated counterfactual Yhat{G <- 0}|G=1,X=1
lines(x,ecdf(YhatG1X1_resample)(x),col="blue",lty=3) 

legend("bottomright",
       c(expression(paste(hat('Y|X=1'))),
         expression(paste(hat('Y'),"|G=1,X=1")),
         expression(paste(hat('Y'),"|G=0,X=1")),
         expression(paste(hat(Y)[G %<-% 0],"|G=1,X=1 (true)")),
         expression(paste(hat(Y)[G %<-% 0],"|G=1,X=1 (est.)"))),
       col=c("black","red","blue","blue","blue"),
       lty=c(1,1,1,2,3),
       cex=0.5)

# In both plots, the estimated counterfactual CDF closely matches the CDF of the
#  true counterfactual.

# Restore parameters
par(oldpar)

dat2mat

Description

Generates matrices for decomposition of admission type which can be used in plot_decomp

Usage

dat2mat(dat, score, group1, group2, nquant = 20, cats = unique(dat$reason))

Arguments

dat

data frame with population data, such as output from sim_pop_data. Must include a column reason

score

risk scores corresponding to dat

group1

indices for group 1

group2

indices for group 2

nquant

number of quantiles of code to use; default 20

cats

vector of strings giving names of admission categories; default the unique values in dat$reason. Can include NAs.

Details

Generates two matrices with the following specifications: Each matrix corresponds to one group Columns are named with the admission types to be plotted. Any admission types including the string 'Died' are counted as deaths If the matrix has N rows, these are interpreted as corresponding to N score quantiles The (i,j)th entry of the matrix is the number of people admitted for reason i with a score greater than or equal to (j-1)/N and less than (j/N) who are in that group

Value

list with two objects matrix1 and matrix2 giving output matrices

Examples

# See vignette

Decomposition matrix

Description

Matrix giving frequency of admission types for various groups at various score thresholds. Row names are of the form vX_Y_qZ, where X is version (3 or 4), Y is cohort (e.g., all, over 65, island postcode) and Z is quantile (1-20) of score. Column names are cause of admission or cause of death.

Usage

decomposition_matrix

Format

An object of class data.frame with 520 rows and 41 columns.

demographic_parity

Description

Estimates demographic parity for a risk score (essentially cumulative distribution function)

Usage

demographic_parity(
  scores,
  group1,
  group2,
  cutoffs = seq(min(scores, na.rm = TRUE), max(scores, na.rm = TRUE), length = 100)
)

Arguments

scores

vector of risk scores

group1

indices of group 1

group2

indices of group 2

cutoffs

score cutoffs at which to estimate DP (default 100 evenly-spaced)

Value

matrix of dimension length(cutoffs)x4, with (i,2g-1)th entry the proportion of scores in group g which are less than or equal to the ith cutoff value and (i,2g)th entry the approximate standard error of the (i,2g-1)th entry

Examples

# See vignette

drawperson

Description

Draws a simple stock image of a person.

Usage

drawperson(
  xloc = 0,
  yloc = 0,
  scale = 1,
  headsize = 0.16,
  headangle = pi/8,
  headloc = 0.5,
  necklength = 0.1,
  shoulderwidth = 0.1,
  shouldersize = 0.05,
  armlength = 0.4,
  armangle = 7 * pi/8,
  armwidth = 0.08,
  leglength = 0.5,
  legangle = 9 * pi/10,
  legwidth = 0.15,
  torsolength = 0.4,
  ...
)

Arguments

xloc

x-axis offset from origin

yloc

y-axis offset from origin

scale

scale upwards from 1x1 box

headsize

head size

headangle

half angle of neck in terms of head

headloc

location of centre of head relative to origin with scale 1

necklength

neck length

shoulderwidth

shoulder width

shouldersize

size radius of arc for shoulder

armlength

arm length

armangle

angle of arm from horizontal

armwidth

width of arm

leglength

leg length

legangle

angle of leg from horizontal

legwidth

width of leg

torsolength

length of torso

...

other parameters passed to polygon()

Details

Draws a figure at a particular location. With defaults, has centre at origin and fits in 1x1 box.

Dimensions customisable

Value

invisibly returns co-ordinates

Examples

plot(0,xlim=c(-1,1),ylim=c(-1,1),type="n")
drawperson(0,0,1,col="yellow",border="red",lwd=3,lty=2)

drawprop

Description

Illustrates a proportion as a set of people who are blue rather than red.

Usage

drawprop(prop, ci, nxy = 10, col1 = "maroon", col2 = "lightblue", ...)

Arguments

prop

the proportion to illustrate

ci

half the 95% CI width of the proportion.

nxy

illustrate on an n x n grid

col1

colour to put the 'in' proportion

col2

the other colour

...

passed to 'plot'

Details

Why anyone would want to think about a proportion this way is beyond the understanding of the authors, but the people have spoken.

Value

No return value, draws a figure

Examples

# See vignette

for_breakdown

Description

For a given category (e.g., 'male', 'over 65') considers

all admissions for people in that category
all admissions for people in that category for which the SPARRA score was less than some threshold (e.g., false negatives

Usage

for_breakdown(
  decomp_table,
  group,
  threshold,
  inc_died = TRUE,
  ldiff = 0.005,
  ci = 0.95,
  xlimit = c(-0.05, 0.35),
  ylimit = c(-0.04, 0.04)
)

Arguments

decomp_table

matrix for group; see specification in description

group

name of group

threshold

cutoff, rounded to nearest 0.05

inc_died

set to TRUE to include a second panel showing 'death' type admissions

ldiff

specifically label points this far from xy line

ci

set to a value <1 to draw confidence intervals at that value, or FALSE to not draw confidence intervals.

xlimit

limits for x axis; default c(-0.05,0.35)

ylimit

limits for y axis; default c(-0.04,0.04)

Details

For each of these groups, we consider the breakdown of medical admission types. We then plot the frequency of admission types in group 1 against the difference in frequencies between group 1 and group 2 (group 2 minus group 1). An admission type which is relatively more common in group (1) indicates that, in the relevant category, the admission type tends to be associated with higher SPARRA scores (and is in a sense easier to predict). Such admission types will correspond to points below the line y=0. Admission types which are relatively more common in group 2 correspond to those which are relatively harder to predict. These correspond to points above the line y=0 Since points are close together, only those greater than a certain distance from 0 are marked.

Takes as an argument a matrix in which The matrix shows only data for the group in question Columns are named with the admission types to be plotted. Any admission types including the string 'Died' are counted as deaths If the matrix has N rows, these are interpreted as corresponding to N score quantiles in increasing order. The (i,j)th entry of the matrix is the number of people admitted for reason i with a score greater than or equal to (j-1)/N and less than (j/N) who are in that group

Value

ggplot figure (invisible)

Examples


# See vignette

getcal()

Description

Produces a set of points for a calibration plot.

Usage

getcal(
  y,
  ypred,
  n = 10,
  kernel = FALSE,
  kernel_sd = 0.05,
  alpha = 0.05,
  c0 = 0,
  c2 = 0.1
)

Arguments

y

class labels, 0/1 or logical

ypred

predictions Pr(Y=1), numeric vector

n

number of subintervals/points

kernel

set to TRUE to use kernel method

kernel_sd

kernel width for kernel method; see above

alpha

return a pointwise confidence envolope for conservative 1-alpha confidence interval

c0

for computing maximum bias; assume true covariance function is of the form a0+ a1x + a2x^2, with |a0|<c0, |a2|<c2 (c1 does not matter)

c2

for computing maximum bias; assume true covariance function is of the form a0+ a1x + a2x^2, with |a0|<c0, |a2|<c2 (c1 does not matter)

Details

Uses either a binning method or a kernel method to determine height of points.

In both methods, considers n equally spaced subintervals of (0,1)

Value

a list with components x (expected calibration), y (observed calibration), n (number of samples in bins, if relevant), lower/upper (confidence interval on y)

Examples

# See vignette

getprc()

Description

Comprehensive plotting function for precision-recall curve. Also calculates AUPRC and standard error.

Usage

getprc(y, ypred, cv = NULL, res = 100)

Arguments

y

class labels, 0/1 or logical

ypred

predictions Pr(Y=1), numeric vector

cv

cross-validation fold assignments, if relevant. Changes estimate of standard error.

res

resolution. Returns this many equally-spaced points along the curve. Set res to null to return all points.

Details

Rather than returning points corresponding to every cutoff, only returns a representative sample of equally-spaced points along the curve.

Does not plot anything. Object can be plotted in a default way.

Value

list containing: ppv, ppv for res points in every cv fold; sens, sensitivity for res points in every cv fold; auc, areas under the curve for each fold and average (note length is 1 greater than number of CV folds); se, standard error for AUC in each fold and standard error for average auc (note length is 1 greater than number of CV folds)

Examples

# See vignette

getroc() Comprehensive plotting function for receiver-operator characteristic curve. Also calculates AUROC and standard error.

Description

Rather than returning points corresponding to every cutoff, only returns a representative sample of equally-spaced points along the curve.

Usage

getroc(y, ypred, cv = NULL, res = 100)

Arguments

y

class labels, 0/1 or logical

ypred

predictions Pr(Y=1), numeric vector

cv

cross-validation fold assignments, if relevant. Changes estimate of standard error.

res

resolution. Returns this many equally-spaced points along the curve. Set res to null to return all points.

Details

SE of AUROC with no CV structure is from Hanley and McNeil 1982. SE of AUROC with CV folds is from LeDell et al 2012

Does not plot anything. Object can be plotted in a default way.

Value

list containing: spec, specificity for res points in every cv fold; sens, sensitivity for res points in every cv fold; auc, areas under the curve for each fold and average (note length is 1 greater than number of CV folds); se, standard error for AUC in each fold and standard error for average auc (note length is 1 greater than number of CV folds)

Examples

# See vignette

group_fairness

Description

Estimates group fairness metric according to a specification vector of the form

Usage

group_fairness(
  specs,
  scores,
  target,
  group1,
  group2,
  cutoffs = seq(min(scores, na.rm = TRUE), max(scores, na.rm = TRUE), length = 100)
)

Arguments

specs

specification vector; see description

scores

vector of risk scores

target

vector of values of target (which risk score aims to predict)

group1

indices of group 1

group2

indices of group 2

cutoffs

score cutoffs at which to estimate metric (default 100 evenly-spaced)

Details

c(A1,B1,C1,A2,B2,C2)

encoding a probability

P(A1,B1,C1|A2,B2,C2)

where

A1/A2 are events coded by 1:'score>= cutoff'; 0: 'score<cutoff' and NA: 1/TRUE B1/B2 are events coded by 1:'target=TRUE'; 0: 'target=FALSE' and NA: 1/TRUE C1/C2 are events coded by 1:'group=g'; and NA: 1/TRUE

For example, specs=c(NA,1,NA,0,NA,1) would encode false omission rate:

P(target=TRUE|score<cutoff,group=g)

Value

matrix of dimension length(cutoffs)x4, with (i,2g-1)th entry the relevant fairness metric for group g at the ith cutoff value and (i,2g)th entry the approximate standard error of the (i,2g-1)th entry

Examples

# See vignette

groupmetric_2panel Draws plots of a group fairness metric with a second panel underneath

Description

groupmetric_2panel Draws plots of a group fairness metric with a second panel underneath

Usage

groupmetric_2panel(
  objs,
  labels = names(objs),
  col = 1:length(objs),
  yrange = NULL,
  ci_col = col,
  highlight = NULL,
  logscale = FALSE,
  lpos = c(1, 0),
  yrange_lower = NULL,
  legend_title = ""
)

Arguments

objs

list of fairness objects. Each should contain sub-objects 'x', 'y' and 'ci', which specify x and y values and half-widths of confidence intervals around y.

labels

labels to use in legend

col

line colours

yrange

limit of y axis; defaults to 0,1

ci_col

confidence envelope colours. These will be transparent.

highlight

if non-null, draw a point at a particular cutoff

logscale

if TRUE, draw with log-scale.

lpos

legend position; as accepted by ggplot legend.position

yrange_lower

y range for lower plot. If NULL, generates automatically

legend_title

title for legend, defaults to nothing

Value

Silently return ggplot object

Examples

# See vignette

integral() Quick form for trapezoidal integration over range of x

Description

integral() Quick form for trapezoidal integration over range of x

Usage

integral(x, y = NULL)

Arguments

x

x co-ordinates, or nx2 matrix of points

y

y co-ordinates

Value

trapezoidal estimate of integral of the xth value of y over range of x.

Logistic

Description

Logistic function: 1/(1+exp(-x))

Usage

logistic(x)

Arguments

x

argument

Value

value of logistic(x)

Examples


# Plot
x=seq(-5,5,length=1000)
plot(x,logistic(x),type="l")

Logit

Description

Logit function: -log((1/x)-1)

Usage

logit(x)

Arguments

x

argument

Value

value of logit(x); na if x is outside (0,1)

Examples


# Plot
x=seq(0,1,length=100)
plot(x,logit(x),type="l")

# Logit and logistic are inverses
x=seq(-5,5,length=1000)
plot(x,logit(logistic(x)),type="l")

phs_colours

Description

Copied from github, "Public-Health-Scotland/phsstyles". Public Health Scotland colour scheme. Internal function.

Usage

phs_colours(colourname = NULL, keep_names = FALSE)

Arguments

colourname

name of colour; usually something like phs-blue. If NULL returns all colours.

keep_names

keep names of colours in return list. Defaults to false.

Value

vector of colours, optionally with names.

Plot function for class sparraCAL

Description

Plot function for class sparraCAL

Usage

## S3 method for class 'sparraCAL'
plot(
  x,
  cols = rep(phs_colours("phs-blue"), dim(x$sens)[1]),
  add = FALSE,
  add_xy_line = TRUE,
  ...
)

Arguments

x

output from getcal()

cols

colour to draw lines

add

set to FALSE to add to existing plot

add_xy_line

set to TRUE to draw an X-Y reference line.

...

passed to lines()

Value

No return value, draws a figure

Examples

# See vignette

Plot function for class above

Description

Plot function for class above

Usage

## S3 method for class 'sparraPRC'
plot(
  x,
  addauc = FALSE,
  cols = rep(phs_colours("phs-blue"), dim(x$sens)[1]),
  ...
)

Arguments

x

output from getprc()

addauc

set to TRUE to add text to the plot showing the (mean) AUC and SE.

cols

colour to draw lines

...

passed to plot()

Value

No return value, draws a figure

Examples

# See vignette

Plot function for class sparraROC

Description

Plot function for class sparraROC

Usage

## S3 method for class 'sparraROC'
plot(
  x,
  addauc = FALSE,
  cols = rep(phs_colours("phs-blue"), dim(x$sens)[1]),
  ...
)

Arguments

x

output from getroc()

addauc

set to TRUE to add text to the plot showing the (mean) AUC and SE.

cols

colour to draw lines

...

passed to plot()

Value

No return value, draws a figure

Examples

# See vignette

plot_decomp

Description

Plots a bar graph of decomposition of FORP by cause of admission

Usage

plot_decomp(decomp1, decomp2, threshold, labels, inc_died = TRUE)

Arguments

decomp1

matrix for first group; see specification in description

decomp2

matrix for second group; see specification in description

threshold

score threshold to plot (between 0 and 1)

labels

labels for group 1 and group 2

inc_died

set to TRUE to include a second panel showing 'death' type admissions

Details

Takes two matrices as input with the following specifications: Each matrix corresponds to one group Columns are named with the admission types to be plotted. Any admission types including the string 'Died' are counted as deaths If the matrix has N rows, these are interpreted as corresponding to N score quantiles in increasing order. The (i,j)th entry of the matrix is the number of people admitted for reason i with a score greater than or equal to (j-1)/N and less than (j/N) who are in that group

Value

Silently return ggplot object

Examples


# See vignette

prc_2panel Draws a PRC curve (with legend) with a second panel underneath showing precision difference.

Description

prc_2panel Draws a PRC curve (with legend) with a second panel underneath showing precision difference.

Usage

prc_2panel(
  prcs,
  labels = names(prcs),
  col = 1:length(prcs),
  highlight = NULL,
  yrange_lower = NULL,
  legend_title = ""
)

Arguments

prcs

list of sparraPRC objects.

labels

labels to use in legend

col

line colours

highlight

if non-null, draw a point at a particular cutoff

yrange_lower

y range for lower plot. If NULL, generates automatically

legend_title

title for legend, defaults to nothing

Value

Silently return ggplot object

Examples

# See vignette

roc_2panel Draws a ROC curve (with legend) with a second panel underneath showing sensitivity difference.

Description

roc_2panel Draws a ROC curve (with legend) with a second panel underneath showing sensitivity difference.

Usage

roc_2panel(
  rocs,
  labels = names(rocs),
  col = 1:length(rocs),
  xy_col = phs_colours("phs-magenta"),
  highlight = NULL,
  yrange_lower = NULL,
  legend_title = ""
)

Arguments

rocs

list of sparraROC objects

labels

labels to use in legend; default to names of rocs.

col

line colours

xy_col

line colour for x-y line, defaults to red

highlight

if non-null, add a point at this cutoff

yrange_lower

y range for lower plot. If NULL, generates automatically

legend_title

title for legend, defaults to nothing

Value

Invisibly returns plot as ggplot object

Examples

# See vignette

sim_pop_data

Description

Simulates population data with a reasonably realistic joint distribution

Usage

sim_pop_data(
  npop,
  coef_adjust = 4,
  offset = 1,
  vcor = NULL,
  coefs = c(2, 1, 0, 5, 3, 0, 0),
  seed = 12345,
  incl_id = TRUE,
  incl_reason = TRUE
)

Arguments

npop

population size

coef_adjust

inverse scale for all (true) coefficients (default 4): lower means that hospital admissions are more predictable from covariates.

offset

offset for logistic model (default 1): higher means a lower overall prevalence of admission

vcor

a valid 5x5 correlation matrix (default NULL), giving correlation between variables. If 'NULL', values roughly represents realistic data.

coefs

coefficients of age, male sex, non-white ethnicity, number of previous admissions, and deprivation decile on hospital admissions, Default (2,1,0,5,3). Divided through by coef_adjust.

seed

random seed (default 12345)

incl_id

include an ID column (default TRUE)

incl_reason

include a column indicating reason for admission.

Details

Simulates data for a range of people for the variables

Age (age)
Sex (sexM; 1 if male)
Race/ethnicity (raceNW: 1 if non-white ethnicity)
Number of previous hospital admissions (PrevAdm)
Deprivation decile (SIMD: 1 most deprived, 10 least deprived. NOTE - opposite to English IMD)
Urban-rural residence status (urban_rural: 1 for rural)
Mainland-island residence status (mainland_island: 1 for island)
Hospital admission (target: 1/TRUE if admitted to hospital in year following prediction date)

Can optionally add an ID column.

Optionally includes an admission reason for samples with target=1. These admission reasons roughly correspond to the first letters of ICD10 categories, and can either correspond to an admission or death. Admission reasons are simulated with a non-constant multinomial distribution which varies across age/sex/ethnicity/urban-rural/mainland-island/PrevAdm values in a randomly- chosen way. The distributions of admission reasons are not however chosen to reflect real distributions, nor are systematic changes in commonality of admission types across categories intended to appear realistic.

Value

data frame with realistic values.

Examples


# Simulate data
dat=sim_pop_data(10000)
cor(dat[,1:7])

# See vignette