Type: | Package |
Title: | Exponential Random Partition Models |
Version: | 0.2.0 |
Date: | 2024-05-03 |
Description: | Simulates and estimates the Exponential Random Partition Model presented in the paper Hoffman, Block, and Snijders (2023) <doi:10.1177/00811750221145166>. It can also be used to estimate longitudinal partitions, following the model proposed in Hoffman and Chabot (2023) <doi:10.1016/j.socnet.2023.04.002>. The model is an exponential family distribution on the space of partitions (sets of non-overlapping groups) and is called in reference to the Exponential Random Graph Models (ERGM) for networks. |
License: | GPL (≥ 3) |
Depends: | R (≥ 4.2) |
Imports: | numbers, utils, stats, igraph, RColorBrewer, snowfall |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.1 |
Collate: | 'erpm-package.R' 'functions_utility.R' 'functions_Metropolis.R' 'functions_burninthining.R' 'functions_change_statistics.R' 'functions_estimate.R' 'functions_exactcalculations.R' 'functions_exchange_algorithm.R' 'functions_loglikelihood.R' 'functions_output.R' 'functions_phase1.R' 'functions_phase2.R' 'functions_phase3.R' 'functions_statistics.R' 'functions_visualisation.R' 'outcomeObjects.R' |
URL: | https://github.com/stocnet/ERPM |
BugReports: | https://github.com/stocnet/ERPM/issues |
NeedsCompilation: | no |
Packaged: | 2024-05-09 13:58:06 UTC; hoffman |
Author: | Marion Hoffman |
Maintainer: | Marion Hoffman <marion.hoffman.31@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-05-10 17:53:16 UTC |
ERPM: Exponential Random Partition Models
Description
Simulates and estimates the Exponential Random Partition Model presented in the paper Hoffman, Block, and Snijders (2023) doi:10.1177/00811750221145166. It can also be used to estimate longitudinal partitions, following the model proposed in Hoffman and Chabot (2023) doi:10.1016/j.socnet.2023.04.002. The model is an exponential family distribution on the space of partitions (sets of non-overlapping groups) and is called in reference to the Exponential Random Graph Models (ERGM) for networks.
Author(s)
Maintainer: Marion Hoffman marion.hoffman.31@gmail.com (ORCID) [copyright holder]
Authors:
Alexandra Amani
Nico Keiser
See Also
Useful links: https://github.com/stocnet/ERPM (report bugs at https://github.com/stocnet/ERPM/issues)
Function to calculate the number of partitions with groups of sizes between smin and smax
Description
Function to calculate the number of partitions with groups of sizes between smin and smax
Usage
Bell_constraints(n, smin, smax)
Arguments
n |
number of nodes |
smin |
minimum group size possible in the partition |
smax |
maximum group size possible in the partition |
Value
a numeric
Examples
n <- 6
size_min <- 2
size_max <- 4
Bell_constraints(n,size_min,size_max)
CUP
Description
This function tests a partition statistic against a "conditional uniform partition" null hypothesis: it compares a statistic computed on an observed partition with the same statistic computed on a set of permuted partitions (partitions with the same group structure as the observed partition, with nodes being permuted).
Usage
CUP(observation, fun, permutations = NULL, num.permutations = 1000)
Arguments
observation |
A vector giving the observed partition |
fun |
A function computing the partition statistic to be tested |
permutations |
A matrix whose rows contain partitions that are permutations of the observed partition. This argument is NULL by default (in that case, the permutations are generated automatically). |
num.permutations |
An integer indicating the number of permutations to generate, if they are not already given. 1000 permutations are generated by default. |
Details
This test is similar to Conditional Uniform Graph tests in networks (we translate this into Conditional Uniform Partition tests).
Value
The value of the statistic calculated for the observed partition, the mean and standard deviation of the statistic among permuted partitions, the proportions of permutations below and above the observed statistic, and the lower and upper boundaries of the 95% CI
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(0,1,1,1,1,0,0,0,0)
CUP(p,fun=function(x){same_pairs(x,at,'avg_pergroup')})
Function to calculate the number of partitions with k groups of sizes between smin and smax
Description
Function to calculate the number of partitions with k groups of sizes between smin and smax
Usage
Stirling2_constraints(n, k, smin, smax)
Arguments
n |
number of nodes |
k |
number of groups |
smin |
minimum group size possible in the partition |
smax |
maximum group size possible in the partition |
Value
a numeric
Examples
n <- 6
k <- 2
size_min <- 2
size_max <- 4
Stirling2_constraints(n,k,size_min,size_max)
Calculate Dirichlet denominator
Description
Recursive function to calculate the denominator for the model with a single statistic for the number of groups and a given parameter value. The set of possible partitions can be restricted to partitions with groups of a certain size.
Usage
calculate_denominator_Dirichlet_restricted(n, smin, smax, alpha, results)
Arguments
n |
number of nodes |
smin |
minimum size for a group |
smax |
maximum size for a group |
alpha |
parameter value |
results |
a list |
Value
a numeric
Calculate Dirichlet probability
Description
Calculate the probability of observing a partition with a given number of groups for a model with a single statistic for the number of groups and a given parameter value. The set of possible partitions can be restricted to partitions with groups of a certain size.
Usage
calculate_proba_Dirichlet_restricted(alpha, stat, n, smin, smax)
Arguments
alpha |
parameter value |
stat |
observed stat (number of groups) |
n |
number of nodes |
smin |
minimum size for a group |
smax |
maximum size for a group |
Value
a numeric
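For illustration, a call might look as follows (a sketch with arbitrary values, not taken from the package manual):
# probability of observing a partition of 6 nodes with 3 groups, for a
# number-of-groups parameter of 0.5 and group sizes between 1 and 6 (arbitrary values)
calculate_proba_Dirichlet_restricted(alpha = 0.5, stat = 3, n = 6, smin = 1, smax = 6)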
Function to determine whether a partition contains the allowed group sizes
Description
Function to determine whether a partition contains the allowed group sizes
Usage
check_sizes(partition, sizes.allowed, numgroups.allowed)
Arguments
partition |
observed partition |
sizes.allowed |
vector containing possible group sizes in the partition |
numgroups.allowed |
vector containing possible number of groups in the partition |
Value
boolean
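An illustrative call (a sketch with arbitrary values), checking whether a partition of five nodes respects group sizes between 1 and 3 and a number of groups between 2 and 4:
# arbitrary illustrative values
p <- c(1,1,2,2,3)
check_sizes(p, sizes.allowed = 1:3, numgroups.allowed = 2:4)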
Compute Statistics
Description
Function that computes the statistic vector for a given partition and a given model
Usage
computeStatistics(partition, nodes, effects, objects)
Arguments
partition |
vector, A partition |
nodes |
data frame, Node set |
effects |
list with a vector "names", and a vector "objects", Effects/sufficient statistics |
objects |
list with a vector "name", and a vector "object", Objects used for statistics calculation |
Value
the statistics
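A minimal sketch of a call, reusing the same arbitrary node set, effects, and objects as in the estimation examples further below:
# define an arbitrary set of n = 6 nodes with attributes, and an arbitrary covariate matrix
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
effects <- list(names = c("num_groups","same","diff","tie"),
objects = c("partition","gender","age","friendship"))
objects <- list()
objects[[1]] <- list(name = "friendship", object = friendship)
# compute the statistic vector for an arbitrary partition
computeStatistics(c(1,1,2,2,2,3), nodes, effects, objects)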
Compute Statistics multiple
Description
Function that computes the statistic vector for given (multiple) partitions and a given model
Usage
computeStatistics_multiple(
partitions,
presence.tables,
nodes,
effects,
objects,
single.obs = NULL
)
Arguments
partitions |
Observed partitions |
presence.tables |
to indicate which nodes were present when |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
single.obs |
equals NULL by default |
Value
A list
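A minimal sketch of a call, reusing the same arbitrary longitudinal setup as in the estimate_multipleERPM example further below:
# arbitrary node set, covariate, presence table, and observed partitions
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
presence.tables <- matrix(c(1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1,
1, 0, 1, 1, 1, 1), 6, 3)
effects_multiple <- list(names = c("num_groups","same","diff","tie","inertia_1"),
objects = c("partitions","gender","age","friendship","partitions"),
objects2 = c("","","","",""))
objects_multiple <- list()
objects_multiple[[1]] <- list(name = "friendship", object = friendship)
partitions <- matrix(c(1, 1, 2, 2, 2, 3,
NA, 1, 1, 2, 2, 2,
1, NA, 2, 3, 3, 1), 6, 3)
# compute the statistics for these partitions
computeStatistics_multiple(partitions, presence.tables, nodes,
effects_multiple, objects_multiple)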
Compute the average size of a random partition
Description
Recursive function to compute the average size of a random partition for a given number of nodes
Usage
compute_averagesize(num.nodes)
Arguments
num.nodes |
number of nodes |
Value
a numeric
Examples
n <- 6
compute_averagesize(n)
Compute denominator for model with number of groups
Description
Recursive function to compute the value of the denominator for the model with a single statistic which is the number of groups
Usage
compute_numgroups_denominator(num.nodes, alpha)
Arguments
num.nodes |
number of nodes |
alpha |
parameter value |
Value
a numeric
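For illustration (arbitrary values), the denominator for 6 nodes and a parameter value of -0.5:
compute_numgroups_denominator(num.nodes = 6, alpha = -0.5)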
Between groups correlation
Description
This function computes the correlation between the group averages of the two attributes.
Usage
correlation_between(partition, attribute1, attribute2)
Arguments
partition |
A partition (vector) |
attribute1 |
A vector containing the values of the first attribute |
attribute2 |
A vector containing the values of the second attribute |
Value
A number corresponding to the correlation coefficient
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(3,5,23,2,1,0,3,9,2)
at2 <- c(3,5,20,2,1,0,0,9,0)
correlation_between(p,at,at2)
Correlation with size
Description
This function computes the correlation between an attribute and the size of the groups.
Usage
correlation_with_size(partition, attribute, categorical)
Arguments
partition |
A partition (vector) |
attribute |
A vector containing the values of the attribute |
categorical |
A Boolean (True or False) indicating if the attribute is categorical |
Value
A number corresponding to the correlation coefficient if the attribute is numerical or the correlation ratio if the attribute is categorical.
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(3,5,23,2,1,0,3,9,2)
correlation_with_size(p,at,categorical=FALSE)
Within groups correlation
Description
This function computes the correlation between the two attributes for individuals in the same group.
Usage
correlation_within(partition, attribute1, attribute2, group)
Arguments
partition |
A partition (vector) |
attribute1 |
A vector containing the values of the first attribute |
attribute2 |
A vector containing the values of the second attribute |
group |
A number indicating the selected group |
Value
A number corresponding to the correlation coefficient
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(3,5,23,2,1,0,3,9,2)
at2 <- c(3,5,20,2,1,0,0,9,0)
correlation_within(p,at,at2,4)
Function to count the number of partitions with a certain group size structure, for all possible group size structures. To be used after calling the "find_all_partitions" function.
Description
Function to count the number of partitions with a certain group size structure, for all possible group size structures. To be used after calling the "find_all_partitions" function.
Usage
count_classes(allpartitions)
Arguments
allpartitions |
matrix containing all possible partitions for a nodeset |
Value
integer (number of partitions with different group size structures)
Examples
#find partitions first
n <- 6
all_partitions <- find_all_partitions(n)
# count classes
counts_partition_classes <- count_classes(all_partitions)
Draw Metropolis multiple
Description
Function to sample the model with a Markov chain (multiple partitions procedure).
Usage
draw_Metropolis_multiple(
theta,
first.partitions,
presence.tables,
nodes,
effects,
objects,
burnin,
thining,
num.steps,
neighborhood = c(0.7, 0.3, 0),
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
return.all.partitions = FALSE,
verbose = FALSE
)
Arguments
theta |
model parameters |
first.partitions |
starting partition for the Markov chain |
presence.tables |
matrix indicating which actors were present for each observation (mandatory) |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
num.steps |
number of samples |
neighborhood |
= c(0.7,0.3,0), way of choosing partitions: probability vector (2 actors swap, merge/division, single actor move, single pair move, 2 pairs swap, 2 groups reshuffle) |
numgroups.allowed |
= NULL, vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
= NULL, vector containing the number of groups simulated |
sizes.allowed |
= NULL, vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
= NULL, vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
return.all.partitions |
= FALSE, option to return the sampled partitions on top of their statistics (for GOF) |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
A list
Examples
# define an arbitrary set of n = 6 nodes with attributes, and an arbitrary covariate matrix
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
# specify whether nodes are present at different points of time
presence.tables <- matrix(c(1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1,
1, 0, 1, 1, 1, 1), 6, 3)
# choose effects to be included in the estimated model
effects_multiple <- list(names = c("num_groups","same","diff","tie","inertia_1"),
objects = c("partitions","gender","age","friendship","partitions"),
objects2 = c("","","","",""))
objects_multiple <- list()
objects_multiple[[1]] <- list(name = "friendship", object = friendship)
# set parameter values for each of these effects
parameters <- c(-0.2,0.2,-0.1,0.5,1)
# set a starting point for the simulation
first.partitions <- matrix(c(1, 1, 2, 2, 2, 3,
NA, 1, 1, 2, 2, 2,
1, NA, 2, 3, 3, 1), 6, 3)
# generate the simulated sample
nsteps <- 50
sample <- draw_Metropolis_multiple(theta = parameters,
first.partitions = first.partitions,
nodes = nodes,
presence.tables = presence.tables,
effects = effects_multiple,
objects = objects_multiple,
burnin = 100,
thining = 100,
num.steps = nsteps,
neighborhood = c(0,1,0),
numgroups.allowed = 1:n,
numgroups.simulated = 1:n,
sizes.allowed = 1:n,
sizes.simulated = 1:n,
return.all.partitions = TRUE)
Draw Metropolis single
Description
Function to sample the model with a Markov chain (single partition procedure).
Usage
draw_Metropolis_single(
theta,
first.partition,
nodes,
effects,
objects,
burnin,
thining,
num.steps,
neighborhood = c(0.7, 0.3, 0),
numgroups.allowed = NULL,
numgroups.simulated = NULL,
sizes.allowed = NULL,
sizes.simulated = NULL,
return.all.partitions = FALSE
)
Arguments
theta |
model parameters |
first.partition |
starting partition for the Markov chain |
nodes |
nodeset (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
num.steps |
number of samples |
neighborhood |
= c(0.7,0.3,0), way of choosing partitions: probability vector (2 actors swap, merge/division, single actor move, single pair move, 2 pairs swap, 2 groups reshuffle) |
numgroups.allowed |
= NULL, vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
= NULL, vector containing the number of groups simulated |
sizes.allowed |
= NULL, vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
= NULL, vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
return.all.partitions |
= FALSE, option to return the sampled partitions on top of their statistics (for GOF) |
Value
A list
Examples
# define an arbitrary set of n = 6 nodes with attributes, and an arbitrary covariate matrix
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
# choose the effects to be included (see manual for all effect names)
effects <- list(names = c("num_groups","same","diff","tie"),
objects = c("partition","gender","age","friendship"))
objects <- list()
objects[[1]] <- list(name = "friendship", object = friendship)
# set parameter values for each of these effects
parameters <- c(-0.2, 0.2, -0.1, 0.5)
# generate simulated sample, by setting the desired additional parameters for the
# Metropolis sampler and choosing a starting point for the chain (first.partition)
nsteps <- 100
sample <- draw_Metropolis_single(theta = parameters,
first.partition = c(1,1,2,2,3,3),
nodes = nodes,
effects = effects,
objects = objects,
burnin = 100,
thining = 10,
num.steps = nsteps,
neighborhood = c(0,1,0),
numgroups.allowed = 1:n,
numgroups.simulated = 1:n,
sizes.allowed = 1:n,
sizes.simulated = 1:n,
return.all.partitions = TRUE)
# or: simulate an estimated model
partition <- c(1,1,2,2,2,3) # the partition already defined for the (previous) estimation
nsimulations <- 1000
simulations <- draw_Metropolis_single(theta = estimation$results$est,
first.partition = partition,
nodes = nodes,
effects = effects,
objects = objects,
burnin = 100,
thining = 20,
num.steps = nsimulations,
neighborhood = c(0,1,0),
sizes.allowed = 1:n,
sizes.simulated = 1:n,
return.all.partitions = TRUE)
Estimate ERPM
Description
Function to estimate a given model for a given observed partition. All options of the algorithm can be specified here.
Usage
estimate_ERPM(
partition,
nodes,
objects,
effects,
startingestimates,
gainfactor = 0.1,
a.scaling = 0.8,
r.truncation.p1 = -1,
r.truncation.p2 = -1,
burnin = 30,
thining = 10,
length.p1 = 100,
min.iter.p2 = NULL,
max.iter.p2 = NULL,
multiplication.iter.p2 = 100,
num.steps.p2 = 6,
length.p3 = 1000,
neighborhood = c(0.7, 0.3, 0),
fixed.estimates = NULL,
numgroups.allowed = NULL,
numgroups.simulated = NULL,
sizes.allowed = NULL,
sizes.simulated = NULL,
double.averaging = FALSE,
inv.zcov = NULL,
inv.scaling = NULL,
parallel = FALSE,
parallel2 = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partition |
observed partition |
nodes |
nodeset (data frame) |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
startingestimates |
first guess for the model parameters |
gainfactor |
numeric used to decrease the size of steps made in the Newton optimization |
a.scaling |
numeric used to reduce the influence of non-diagonal elements in the scaling matrix (for stability) |
r.truncation.p1 |
numeric used to limit extreme values in the covariance matrix (for stability) |
r.truncation.p2 |
numeric used to limit extreme values in the covariance matrix (for stability) |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
length.p1 |
number of samples in phase 1 |
min.iter.p2 |
minimum number of sub-steps in phase 2 |
max.iter.p2 |
maximum number of sub-steps in phase 2 |
multiplication.iter.p2 |
value for the lengths of sub-steps in phase 2 (multiplied by 2.52^k) |
num.steps.p2 |
number of optimisation steps in phase 2 |
length.p3 |
number of samples in phase 3 |
neighborhood |
way of choosing partitions: probability vector (actors swap, merge/division, single actor move) |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
double.averaging |
option to average the statistics sampled in each sub-step of phase 2 |
inv.zcov |
initial value of the inverted covariance matrix (if a phase 3 was run before) to bypass the phase 1 |
inv.scaling |
initial value of the inverted scaling matrix (if a phase 3 was run before) to bypass the phase 1 |
parallel |
whether the phase 1 and 3 should be parallelized |
parallel2 |
whether there should be several phases 2 run in parallel |
cpus |
how many cores can be used |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
A list with the outputs of the three different phases of the algorithm
Examples
# define an arbitrary set of n = 6 nodes with attributes, and an arbitrary covariate matrix
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
# choose the effects to be included (see manual for all effect names)
effects <- list(names = c("num_groups","same","diff","tie"),
objects = c("partition","gender","age","friendship"))
objects <- list()
objects[[1]] <- list(name = "friendship", object = friendship)
# define observed partition
partition <- c(1,1,2,2,2,3)
# estimate
startingestimates <- c(-2,0,0,0)
estimation <- estimate_ERPM(partition,
nodes,
objects,
effects,
startingestimates = startingestimates,
burnin = 100,
thining = 20,
length.p1 = 500, # number of samples in phase 1
multiplication.iter.p2 = 20, # iterations in phase 2
num.steps.p2 = 4, # number of phase 2 subphases
length.p3 = 1000) # number of samples in phase 3
# get results table
estimation
Estimate log likelihood
Description
Function to estimate the log likelihood of a model for an observed partition
Usage
estimate_logL(
partition,
nodes,
effects,
objects,
theta,
theta_0,
M,
num.steps,
burnin,
thining,
neighborhoods = c(0.7, 0.3, 0),
numgroups.allowed = NULL,
numgroups.simulated = NULL,
sizes.allowed = NULL,
sizes.simulated = NULL,
logL_0 = NULL,
parallel = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partition |
observed partition |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
theta |
estimated model parameters |
theta_0 |
model parameters if all other effects than "num-groups" are fixed to 0 (basic Dirichlet partition model) |
M |
number of steps in the path-sampling algorithm |
num.steps |
number of samples in each step |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
neighborhoods |
= c(0.7,0.3,0), way of choosing partitions |
numgroups.allowed |
= NULL, vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
= NULL, vector containing the number of groups simulated |
sizes.allowed |
= NULL, vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
= NULL, vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
logL_0 |
= NULL, if known, the value of the log likelihood of the basic Dirichlet model |
parallel |
= FALSE, indicating whether the code should be run in parallel |
cpus |
= 1, number of cpus required for the parallelization |
verbose |
= FALSE, to print the current step the algorithm is in |
Value
A list with the log likelihood, AIC, lambda, and the draws
Examples
# estimate the log-likelihood and AIC of an estimated model (e.g. useful to compare two models)
# define an arbitrary set of n = 6 nodes with attributes, and an arbitrary covariate matrix
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
# choose the effects to be included (see manual for all effect names)
effects <- list(names = c("num_groups","same","diff","tie"),
objects = c("partition","gender","age","friendship"))
objects <- list()
objects[[1]] <- list(name = "friendship", object = friendship)
# define observed partition
partition <- c(1,1,2,2,2,3)
# (an exemplary estimation is internally stored in order to save time)
# first: estimate the ML estimates of a simple model with only one parameter
# for number of groups (this parameter should be in the model!)
likelihood_function <- function(x){ exp(x*max(partition)) / compute_numgroups_denominator(n,x)}
curve(likelihood_function, from=-2, to=0)
parameter_base <- optimize(likelihood_function, interval=c(-2, 0), maximum=TRUE)
parameters_basemodel <- c(parameter_base$maximum,0,0,0)
# estimate logL and AIC
logL_AIC <- estimate_logL(partition,
nodes,
effects,
objects,
theta = estimation$results$est,
theta_0 = parameters_basemodel,
M = 3,
num.steps = 200,
burnin = 100,
thining = 20)
logL_AIC$logL
logL_AIC$AIC
Estimate ERPM for multiple observations
Description
Function to estimate a given model for given observed (multiple) partitions. All options of the algorithm can be specified here.
Usage
estimate_multipleERPM(
partitions,
presence.tables,
nodes,
objects,
effects,
startingestimates,
gainfactor = 0.1,
a.scaling = 0.8,
r.truncation.p1 = -1,
r.truncation.p2 = -1,
burnin = 30,
thining = 10,
length.p1 = 100,
min.iter.p2 = NULL,
max.iter.p2 = NULL,
multiplication.iter.p2 = 200,
num.steps.p2 = 6,
length.p3 = 1000,
neighborhood = c(0.7, 0.3, 0),
fixed.estimates = NULL,
numgroups.allowed = NULL,
numgroups.simulated = NULL,
sizes.allowed = NULL,
sizes.simulated = NULL,
double.averaging = FALSE,
inv.zcov = NULL,
inv.scaling = NULL,
parallel = FALSE,
parallel2 = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partitions |
observed partitions |
presence.tables |
matrix indicating which actors were present for each observation |
nodes |
nodeset (data frame) |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
startingestimates |
first guess for the model parameters |
gainfactor |
numeric used to decrease the size of steps made in the Newton optimization |
a.scaling |
numeric used to reduce the influence of non-diagonal elements in the scaling matrix (for stability) |
r.truncation.p1 |
numeric used to limit extreme values in the covariance matrix (for stability) |
r.truncation.p2 |
numeric used to limit extreme values in the covariance matrix (for stability) |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
length.p1 |
number of samples in phase 1 |
min.iter.p2 |
minimum number of sub-steps in phase 2 |
max.iter.p2 |
maximum number of sub-steps in phase 2 |
multiplication.iter.p2 |
value for the lengths of sub-steps in phase 2 (multiplied by 2.52^k) |
num.steps.p2 |
number of optimisation steps in phase 2 |
length.p3 |
number of samples in phase 3 |
neighborhood |
way of choosing partitions: probability vector (actors swap, merge/division, single actor move) |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
double.averaging |
option to average the statistics sampled in each sub-step of phase 2 |
inv.zcov |
initial value of the inverted covariance matrix (if a phase 3 was run before) to bypass the phase 1 |
inv.scaling |
initial value of the inverted scaling matrix (if a phase 3 was run before) to bypass the phase 1 |
parallel |
whether the phase 1 and 3 should be parallelized |
parallel2 |
whether there should be several phases 2 run in parallel |
cpus |
how many cores can be used |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
A list with the outputs of the three different phases of the algorithm
Examples
# define an arbitrary set of n = 6 nodes with attributes, and an arbitrary covariate matrix
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
# specify whether nodes are present at different points of time
presence.tables <- matrix(c(1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 1,
1, 0, 1, 1, 1, 1), 6, 3)
# choose effects to be included in the estimated model
effects_multiple <- list(names = c("num_groups","same","diff","tie","inertia_1"),
objects = c("partitions","gender","age","friendship","partitions"),
objects2 = c("","","","",""))
objects_multiple <- list()
objects_multiple[[1]] <- list(name = "friendship", object = friendship)
# define the observation
partitions <- matrix(c(1, 1, 2, 2, 2, 3,
NA, 1, 1, 2, 2, 2,
1, NA, 2, 3, 3, 1), 6, 3)
# estimate
startingestimates <- c(-2,0,0,0,0)
estimation <- estimate_multipleERPM(partitions,
presence.tables,
nodes,
objects_multiple,
effects_multiple,
startingestimates = startingestimates,
burnin = 100,
thining = 50,
gainfactor = 0.6,
length.p1 = 200,
multiplication.iter.p2 = 20,
num.steps.p2 = 4,
length.p3 = 1000)
# get results table
estimation
Exact estimates number of groups
Description
This function finds the best estimate for a model including only the statistic for the number of groups. It performs a grid search over a vector of potential parameter values, for all numbers of groups.
Usage
exactestimates_numgroups(num.nodes, pmin, pmax, pinc)
Arguments
num.nodes |
number of nodes |
pmin |
lowest parameter value |
pmax |
highest parameter value |
pinc |
increment between different parameter values |
Value
a list
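An illustrative call (a sketch; the grid bounds and increment are arbitrary) for 6 nodes:
exactestimates_numgroups(num.nodes = 6, pmin = -2, pmax = 0, pinc = 0.1)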
Function to enumerate all possible partitions for a given n
Description
Function to enumerate all possible partitions for a given n
Usage
find_all_partitions(n)
Arguments
n |
number of nodes |
Value
matrix where each line corresponds to a possible partition
Examples
n <- 6
all_partitions <- find_all_partitions(n)
Grid - search burnin single
Description
Function that can be used to find a good length for the burn-in of the Markov chain for a given model and different sets of transitions in the chain (the neighborhoods). For each neighborhood, it draws a chain and calculates the mean statistics for different burn-ins.
Usage
gridsearch_burnin_single(
partition,
theta,
nodes,
effects,
objects,
num.steps,
neighborhoods,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
parallel = FALSE,
cpus = 1
)
Arguments
partition |
A partition (vector) |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhoods |
List of probability vectors (proba actors swap, proba merge/division, proba single actor move) |
numgroups.allowed |
= NULL, vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
parallel |
= FALSE, whether to run the different neighborhoods in parallel |
cpus |
= 1, number of cpus used if parallel = TRUE |
Value
all simulations
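A minimal sketch of a call (arbitrary values, reusing the node set and model specification from the simulation examples in this manual), comparing two neighborhoods:
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
gender = c(1,1,2,1,2,2),
age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 1,
0, 0, 0, 0, 1, 0), 6, 6, TRUE)
effects <- list(names = c("num_groups","same","diff","tie"),
objects = c("partition","gender","age","friendship"))
objects <- list()
objects[[1]] <- list(name = "friendship", object = friendship)
# two candidate neighborhoods (probability vectors) to compare
neighborhoods <- list(c(0.7, 0.3, 0), c(0, 1, 0))
sims <- gridsearch_burnin_single(partition = c(1,1,2,2,3,3),
theta = c(-0.2, 0.2, -0.1, 0.5),
nodes = nodes,
effects = effects,
objects = objects,
num.steps = 50,
neighborhoods = neighborhoods,
numgroups.allowed = 1:n,
numgroups.simulated = 1:n,
sizes.allowed = 1:n,
sizes.simulated = 1:n)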
Grid - search burnin thining multiple
Description
Function that simulates the Markov chain for a given model and several sets of transitions (the neighborhoods), for multiple partitions. For each neighborhood, it calculates the autocorrelation of statistics for different thinings and the average statistics for different burn-ins. The best neighborhood can then be selected, along with good values for burn-in and thining.
Usage
gridsearch_burninthining_multiple(
partitions,
presence.tables,
theta,
nodes,
effects,
objects,
num.steps,
neighborhoods,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
max.thining,
parallel = FALSE,
cpus = 1
)
Arguments
partitions |
Observed partitions |
presence.tables |
Presence of nodes |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhoods |
List of probability vectors (proba actors swap, proba merge/division, proba single actor move) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
max.thining |
Maximal value for the thining to be tested |
parallel |
= FALSE, whether to run the different neighborhoods in parallel |
cpus |
= 1, number of cpus used if parallel = TRUE |
Value
list
Grid - search burnin thining single
Description
Function that simulates the Markov chain for a given model and several sets of transitions (the neighborhoods), for a single partition. For each neighborhood, it calculates the autocorrelation of statistics for different thinings and the average statistics for different burn-ins. The best neighborhood can then be selected, along with good values for burn-in and thining.
Usage
gridsearch_burninthining_single(
partition,
theta,
nodes,
effects,
objects,
num.steps,
neighborhoods,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
max.thining,
parallel = FALSE,
cpus = 1
)
Arguments
partition |
A partition (vector) |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhoods |
List of probability vectors (proba actors swap, proba merge/division, proba single actor move) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
max.thining |
Maximal value for the thining to be tested |
parallel |
= FALSE, whether to run the different neighborhoods in parallel |
cpus |
= 1, number of cpus used if parallel = TRUE |
Value
list
Grid - search thining single
Description
Function that can be used to find a good length for the thining of the Markov chain for a given model and different sets of transitions in the chain (the neighborhoods). For each neighborhood, it draws a chain and calculates the autocorrelation of statistics for different thinings.
Usage
gridsearch_thining_single(
partition,
theta,
nodes,
effects,
objects,
num.steps,
neighborhoods,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
burnin,
max.thining,
parallel = FALSE,
cpus = 1
)
Arguments
partition |
A partition (vector) |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhoods |
List of probability vectors (proba actors swap, proba merge/division, proba single actor move) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
burnin |
length of the burn-in period |
max.thining |
maximal value for the thining to be tested |
parallel |
= FALSE, whether to run the different neighborhoods in parallel |
cpus |
= 1, number of cpus used if parallel = TRUE |
Value
all simulations
Statistics on the size of groups in a partition
Description
This function computes the average or the standard deviation of the size of groups in a partition.
Usage
group_size(partition, stat)
Arguments
partition |
A partition (vector) |
stat |
The statistic to compute: 'avg' for the average and 'sd' for the standard deviation |
Value
A number corresponding to the average or the standard deviation of the group sizes, depending on the chosen statistic.
Examples
p <- c(1,2,2,3,3,4,4,4,5)
group_size(p,'avg')
group_size(p,'sd')
Intra class correlation
Description
This function computes the intra-class correlation of an attribute for two randomly drawn individuals in the same group.
Usage
icc(partition, attribute)
Arguments
partition |
A partition |
attribute |
A vector containing the values of the attribute |
Value
A number corresponding to the ICC
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(3,5,23,2,1,0,3,9,2)
icc(p, at)
Number of individuals having an attribute
Description
This function computes the total number of individuals belonging to a given category of an attribute in a partition. It can also compute the sum over groups of the proportion of individuals in each group belonging to that category.
Usage
number_categories(partition, attribute, stat, category)
Arguments
partition |
A partition (vector) |
attribute |
A vector containing the values of the attribute |
stat |
The statistic to compute: 'avg' for the sum of proportions per group and 'sum' for the total number |
category |
The category to consider or category = 'all' if all categories have to be considered |
Value
The statistic chosen in stat, depending on the value of category. If category = 'all', a vector is returned.
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(1,0,0,0,1,1,0,0,1)
number_categories(p,at,'avg','all')
Same pairs of individuals in a partition
Description
This function computes the number of ties of a dyadic attribute (e.g., a network) within the groups of a partition.
Usage
number_ties(partition, dyadic_attribute, stat)
Arguments
partition |
A partition (vector) |
dyadic_attribute |
A matrix containing the values of the attribute |
stat |
The statistic to compute: 'avg_pergroup' for the average per group, 'sum_pergroup' for the sum, 'sum_perind' and 'avg_perind' for the number of ties each individual has within its group. |
Value
The statistic chosen in stat
Examples
p <- c(1,2,2,3,3,4)
v <- c(0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
at <- matrix(v,6,6, byrow = TRUE)
number_ties(p,at,'avg_pergroup')
Function to renumber the group ids of a partition so that no id is skipped, ordering ids by first appearance (for example, [2 1 1 4 2] becomes [1 2 2 3 1])
Description
Function to renumber the group ids of a partition so that no id is skipped, ordering ids by first appearance (for example, [2 1 1 4 2] becomes [1 2 2 3 1])
Usage
order_groupids(partition)
Arguments
partition |
observed partition |
Value
a vector (partition)
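As an illustration, the example given in the description above as a call:
order_groupids(c(2,1,1,4,2)) # returns c(1,2,2,3,1)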
Exemplary outcome objects for the ERPM Package
Description
These are exemplary outcome objects for the ERPM package; they can be used to avoid re-running all the preceding functions and thus save time. The following objects are provided:
Format
estimation
A results object created by the function estimate_ERPM().
Core function for Phase 1
Description
Core function for Phase 1
Usage
phase1(
startingestimates,
inv.zcov,
inv.scaling,
z.phase1,
z.obs,
nodes,
effects,
objects,
r.truncation.p1,
length.p1,
fixed.estimates,
verbose = FALSE
)
Arguments
startingestimates |
vector containing initial parameter values |
inv.zcov |
inverted covariance matrix |
inv.scaling |
scaling matrix |
z.phase1 |
statistics retrieved from phase 1 |
z.obs |
observed statistics |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
r.truncation.p1 |
numeric used to limit extreme values in the covariance matrix (for stability) |
length.p1 |
number of samples in phase 1 |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
estimated parameters after phase 1
Plot average sizes
Description
Function to plot the average size of a random partition depending on the number of nodes
Usage
plot_averagesizes(nmin, nmax, ninc)
Arguments
nmin |
minimum number of nodes |
nmax |
maximum number of nodes |
ninc |
increment between the different number of nodes |
Value
a vector
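An illustrative call (a sketch with an arbitrary range of node set sizes):
plot_averagesizes(nmin = 3, nmax = 10, ninc = 1)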
Plot likelihood of number groups
Description
Function to plot the log-likelihood of the model with a single statistic (number of groups) depending on the parameter value for this statistic
Usage
plot_numgroups_likelihood(m.obs, num.nodes, pmin, pmax, pinc)
Arguments
m.obs |
observed number of groups |
num.nodes |
number of nodes |
pmin |
lowest parameter value |
pmax |
highest parameter value |
pinc |
increment between different parameter values |
Value
a vector
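A sketch of a call with arbitrary values, for an observed partition of 6 nodes with 3 groups:
plot_numgroups_likelihood(m.obs = 3, num.nodes = 6, pmin = -2, pmax = 0, pinc = 0.1)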
Visualization of partition
Description
This function plots the groups of a partition.
Usage
plot_partition(
partition,
title = NULL,
group.color = NULL,
attribute.color = NULL,
attribute.shape = NULL
)
Arguments
partition |
A partition (vector) |
title |
Character, the title of the plot (default=NULL) |
group.color |
A vector with the colors of the groups (default=NULL) |
attribute.color |
A vector, attribute to represent with colors (default=NULL) |
attribute.shape |
A vector, attribute to represent with shapes (default=NULL) |
Value
A plot of the partition
Examples
p <- c(1,1,1,2,2,2,2,3,3,3,4,4,4,4,4,4)
attr1 <- c(1,0,0,1,0,0,1,0,1,0,1,1,1,1,1,2)
attr2 <- c(1,1,1,1,0,0,3,0,1,0,1,1,1,1,1,2)
plot_partition(p,attribute.color = attr1, attribute.shape = attr2)
Print results of Bayesian estimation (beta version)
Description
Print results of Bayesian estimation (beta version)
Usage
## S3 method for class 'results.bayesian.erpm'
print(x, ...)
Arguments
x |
output of the bayesian estimate function |
... |
For internal use only. |
Value
a data frame
Print estimation results
Description
Print estimation results
Usage
## S3 method for class 'results.list.erpm'
print(x, ...)
Arguments
x |
output of the estimate function |
... |
For internal use only. |
Value
a data frame
Print results of estimation of phase 3
Description
Print results of estimation of phase 3
Usage
## S3 method for class 'results.p3.erpm'
print(x, ...)
Arguments
x |
output of the estimate function |
... |
For internal use only. |
Value
a data frame
Proportion of isolates
Description
This function computes the proportion of individuals not joining others.
Usage
proportion_isolate(partition)
Arguments
partition |
A partition (vector) |
Value
A number corresponding to proportion of individuals alone.
Examples
p <- c(1,2,2,3,3,4,4,4,5)
proportion_isolate(p)
Range of attribute in groups
Description
This function computes the sum or the average range of an attribute for groups in a partition.
Usage
range_attribute(partition, attribute, stat)
Arguments
partition |
A partition (vector) |
attribute |
A vector containing the values of the attribute |
stat |
The statistic to compute: 'avg_pergroup' for the average per group and 'sum_pergroup' for the sum of the ranges |
Value
The statistic chosen in stat
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(3,5,23,2,1,0,3,9,2)
range_attribute(p,at,'avg_pergroup')
Phase 1 wrapper for multiple observations
Description
Phase 1 wrapper for multiple observations
Usage
run_phase1_multiple(
partitions,
startingestimates,
z.obs,
presence.tables,
nodes,
effects,
objects,
burnin,
thining,
gainfactor,
a.scaling,
r.truncation.p1,
length.p1,
neighborhood,
fixed.estimates,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
parallel = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partitions |
observed partitions |
startingestimates |
vector containing initial parameter values |
z.obs |
observed statistics |
presence.tables |
data frame to indicate which times nodes are present in the partition |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
gainfactor |
gain factor (currently unused) |
a.scaling |
scaling factor |
r.truncation.p1 |
truncation factor (for stability) |
length.p1 |
number of samples for phase 1 |
neighborhood |
vector for the probability of choosing a particular transition in the chain |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
parallel |
boolean to indicate whether the code should be run in parallel |
cpus |
number of cpus if parallel = TRUE |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
a list
Phase 1 wrapper for single observation
Description
Phase 1 wrapper for single observation
Usage
run_phase1_single(
partition,
startingestimates,
z.obs,
nodes,
effects,
objects,
burnin,
thining,
gainfactor,
a.scaling,
r.truncation.p1,
length.p1,
neighborhood,
fixed.estimates,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
parallel = TRUE,
cpus = 1,
verbose = FALSE
)
Arguments
partition |
observed partition |
startingestimates |
vector containing initial parameter values |
z.obs |
observed statistics |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
gainfactor |
gain factor (currently unused) |
a.scaling |
scaling factor |
r.truncation.p1 |
truncation factor (for stability) |
length.p1 |
number of samples for phase 1 |
neighborhood |
vector for the probability of choosing a particular transition in the chain |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
parallel |
boolean to indicate whether the code should be run in parallel |
cpus |
number of cpus if parallel = TRUE |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
a list
Phase 2 wrapper for multiple observations
Description
Phase 2 wrapper for multiple observations
Usage
run_phase2_multiple(
partitions,
estimates.phase1,
inv.zcov,
inv.scaling,
z.obs,
presence.tables,
nodes,
effects,
objects,
burnin,
thining,
num.steps,
gainfactors,
r.truncation.p2,
min.iter,
max.iter,
multiplication.iter,
neighborhood,
fixed.estimates,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
double.averaging,
parallel = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partitions |
observed partitions |
estimates.phase1 |
vector containing parameter values after phase 1 |
inv.zcov |
inverted covariance matrix |
inv.scaling |
scaling matrix |
z.obs |
observed statistics |
presence.tables |
data frame to indicate which times nodes are present in the partition |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
num.steps |
number of sub-phases in phase 2 |
gainfactors |
vector of gain factors |
r.truncation.p2 |
truncation factor |
min.iter |
minimum numbers of steps in each subphase |
max.iter |
maximum numbers of steps in each subphase |
multiplication.iter |
used to calculate min.iter and max.iter if not specified |
neighborhood |
vector for the probability of choosing a particular transition in the chain |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
double.averaging |
boolean to indicate whether we follow the double-averaging procedure (often leads to better convergence) |
parallel |
boolean to indicate whether the code should be run in parallel |
cpus |
number of cpus if parallel = TRUE |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
a list
Phase 2 wrapper for single observation
Description
Phase 2 wrapper for single observation
Usage
run_phase2_single(
partition,
estimates.phase1,
inv.zcov,
inv.scaling,
z.obs,
nodes,
effects,
objects,
burnin,
thining,
num.steps,
gainfactors,
r.truncation.p2,
min.iter,
max.iter,
multiplication.iter,
neighborhood,
fixed.estimates,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
double.averaging,
parallel = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partition |
observed partition |
estimates.phase1 |
vector containing parameter values after phase 1 |
inv.zcov |
inverted covariance matrix |
inv.scaling |
scaling matrix |
z.obs |
observed statistics |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
num.steps |
number of sub-phases in phase 2 |
gainfactors |
vector of gain factors |
r.truncation.p2 |
truncation factor |
min.iter |
minimum numbers of steps in each subphase |
max.iter |
maximum numbers of steps in each subphase |
multiplication.iter |
used to calculate min.iter and max.iter if not specified |
neighborhood |
vector for the probability of choosing a particular transition in the chain |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
double.averaging |
boolean to indicate whether we follow the double-averaging procedure (often leads to better convergence) |
parallel |
boolean to indicate whether the code should be run in parallel |
cpus |
number of cpus if parallel = TRUE |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
a list
Phase 3 wrapper for multiple observations
Description
Phase 3 wrapper for multiple observations
Usage
run_phase3_multiple(
partitions,
estimates.phase2,
z.obs,
presence.tables,
nodes,
effects,
objects,
burnin,
thining,
a.scaling,
length.p3,
neighborhood,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
fixed.estimates,
parallel = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partitions |
observed partitions |
estimates.phase2 |
vector containing parameter values after phase 2 |
z.obs |
observed statistics |
presence.tables |
data frame to indicate which times nodes are present in the partition |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thining steps between sampling |
a.scaling |
multiplicative factor for out-of-diagonal elements of the covariance matrix |
length.p3 |
number of samples in phase 3 |
neighborhood |
vector for the probability of choosing a particular transition in the chain |
numgroups.allowed |
vector containing the number of groups allowed in the partition (now, it only works with vectors like num_min:num_max) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (now, it only works for vectors like size_min:size_max) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (now, it only works for vectors like size_min:size_max) |
fixed.estimates |
if some parameters are fixed, list with as many elements as effects, these elements equal a fixed value if needed, or NULL if they should be estimated |
parallel |
boolean to indicate whether the code should be run in parallel |
cpus |
number of cpus if parallel = TRUE |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
a list
Phase 3 wrapper for single observation
Description
Phase 3 wrapper for single observation
Usage
run_phase3_single(
partition,
estimates.phase2,
z.obs,
nodes,
effects,
objects,
burnin,
thining,
a.scaling,
length.p3,
neighborhood,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
fixed.estimates,
parallel = FALSE,
cpus = 1,
verbose = FALSE
)
Arguments
partition |
observed partition |
estimates.phase2 |
vector containing parameter values after phase 2 |
z.obs |
observed statistics |
nodes |
node set (data frame) |
effects |
effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
objects used for statistics calculation (list with a vector "name", and a vector "object") |
burnin |
integer for the number of burn-in steps before sampling |
thining |
integer for the number of thinning steps between samples |
a.scaling |
multiplicative factor for out-of-diagonal elements of the covariance matrix |
length.p3 |
number of sampled partitions in phase 3 |
neighborhood |
vector for the probability of choosing a particular transition in the chain |
numgroups.allowed |
vector containing the number of groups allowed in the partition (currently, only ranges of the form num_min:num_max are supported) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
vector of group sizes allowed in sampling (currently, only ranges of the form size_min:size_max are supported) |
sizes.simulated |
vector of group sizes allowed in the Markov chain but not necessarily sampled (currently, only ranges of the form size_min:size_max are supported) |
fixed.estimates |
if some parameters are fixed: a list with one element per effect, each element being either the fixed parameter value or NULL if the parameter should be estimated |
parallel |
boolean to indicate whether the code should be run in parallel |
cpus |
number of cpus if parallel = TRUE |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
a list
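For orientation, a minimal sketch of a direct call follows (in practice this wrapper is called from the package's estimation routine). The single "num_groups" effect, the parameter value, and the burn-in, thinning, scaling and neighborhood settings below are illustrative assumptions, not defaults.
# Illustrative sketch (not run)
n <- 6
nodes <- data.frame(label = paste0("n", 1:n))
partition <- c(1, 1, 2, 2, 3, 3)
effects <- list(names = c("num_groups"), objects = c("partition"))
z.obs <- c(3)  # observed statistic: the partition has 3 groups
res <- run_phase3_single(partition, estimates.phase2 = c(-0.2), z.obs,
                         nodes, effects, objects = NULL,
                         burnin = 100, thining = 10,
                         a.scaling = 0.2, length.p3 = 500,
                         neighborhood = c(0.1, 0.5, 0.4),
                         numgroups.allowed = 1:n, numgroups.simulated = 1:n,
                         sizes.allowed = 1:n, sizes.simulated = 1:n,
                         fixed.estimates = NULL)
str(res)  # inspect the returned list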
Same pairs of individuals in a partition
Description
This function computes, for a categorical attribute, the number of pairs of individuals in the same group who share the same attribute value: the total over the partition, the average per group, or the number of such pairs per individual.
Usage
same_pairs(partition, attribute, stat)
Arguments
partition |
A partition (vector) |
attribute |
A vector containing the values of the attribute |
stat |
The statistic to compute: 'avg_pergroup' for the average number of same-attribute pairs per group, 'sum_pergroup' for the total number over all groups, 'sum_perind' and 'avg_perind' for the number (or average number) of same-attribute ties each individual has within its group. |
Value
The statistic chosen in stat
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(0,1,1,1,1,0,0,0,0)
same_pairs(p,at,'avg_pergroup')
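The other documented statistics can be requested on the same toy data, for example:
same_pairs(p,at,'sum_pergroup')
same_pairs(p,at,'avg_perind')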
Similar pairs of individuals in a partition
Description
This function computes, for a numerical attribute, the number of pairs of individuals in the same group whose attribute values are close (within a given threshold): the total over the partition, the average per group, or the number of such pairs per individual.
Usage
similar_pairs(partition, attribute, stat, threshold)
Arguments
partition |
A partition (vector) |
attribute |
A vector containing the values of the attribute |
stat |
The statistic to compute: 'avg_pergroup' for the average number of similar pairs per group, 'sum_pergroup' for the total number over all groups, 'sum_perind' and 'avg_perind' for the number (or average number) of similar ties each individual has within its group. |
threshold |
Threshold determining whether two individuals' attribute values are considered close |
Value
The statistic chosen in stat
Examples
p <- c(1,2,2,3,3,4,4,4,5)
at <- c(3,5,23,2,1,0,3,9,2)
similar_pairs(p,at,'avg_pergroup',1)
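Because the threshold widens what counts as a close pair, increasing it cannot decrease the statistic; for example:
similar_pairs(p,at,'sum_pergroup',1)
similar_pairs(p,at,'sum_pergroup',5)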
Simulate burn-in (single partition)
Description
Function that can be used to find a good length for the burn-in of the Markov chain for a given model and a given set of transitions in the chain (the neighborhood). It draws a chain and calculates the mean statistics for different burn-ins.
Usage
simulate_burnin_single(
partition,
theta,
nodes,
effects,
objects,
num.steps,
neighborhood,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated
)
Arguments
partition |
A partition (vector) |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhood |
Way of choosing transitions in the chain: a probability vector (probability of swapping two actors, probability of merging/dividing groups, probability of moving a single actor) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (currently, only ranges of the form num_min:num_max are supported) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (currently, only ranges of the form size_min:size_max are supported) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (currently, only ranges of the form size_min:size_max are supported) |
Value
A list containing the draws, the moving means and the smoothed moving means
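A minimal sketch of a call (not run), assuming a single "num_groups" effect; the parameter value, chain length and neighborhood probabilities are illustrative, not defaults.
# Illustrative sketch (not run)
n <- 6
nodes <- data.frame(label = paste0("n", 1:n))
partition <- c(1, 1, 2, 2, 3, 3)
effects <- list(names = c("num_groups"), objects = c("partition"))
res <- simulate_burnin_single(partition, theta = c(-0.2), nodes, effects,
                              objects = NULL, num.steps = 1000,
                              neighborhood = c(0.1, 0.5, 0.4),
                              numgroups.allowed = 1:n, numgroups.simulated = 1:n,
                              sizes.allowed = 1:n, sizes.simulated = 1:n)
str(res)  # draws, moving means, smoothed moving means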
Simulate burn-in and thinning (multiple partitions)
Description
Function that simulates the Markov chain for a given model and a set of transitions (the neighborhood), for multiple partitions. It calculates the autocorrelation of statistics for different thinning values and the average statistics for different burn-ins.
Usage
simulate_burninthining_multiple(
partitions,
presence.tables,
theta,
nodes,
effects,
objects,
num.steps,
neighborhood,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
max.thining,
verbose = FALSE
)
Arguments
partitions |
Observed partitions |
presence.tables |
data frame indicating which nodes are present at each time point |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhood |
Way of choosing transitions in the chain: a probability vector (probability of swapping two actors, probability of merging/dividing groups, probability of moving a single actor) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (currently, only ranges of the form num_min:num_max are supported) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (currently, only ranges of the form size_min:size_max are supported) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (currently, only ranges of the form size_min:size_max are supported) |
max.thining |
maximum thinning value (number of simulated steps between samples) tested |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
A list
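A minimal sketch of a call (not run) for two observations; the column-wise layout of partitions and presence.tables, the effect specification, and all numeric settings are illustrative assumptions.
# Illustrative sketch (not run)
n <- 6
nodes <- data.frame(label = paste0("n", 1:n))
partitions <- cbind(c(1, 1, 2, 2, 3, 3),      # assumed: one column per observation
                    c(1, 2, 2, 3, 3, 3))
presence.tables <- matrix(1, nrow = n, ncol = 2)  # all nodes present at both times
effects <- list(names = c("num_groups"), objects = c("partition"))
res <- simulate_burninthining_multiple(partitions, presence.tables,
                                       theta = c(-0.2), nodes, effects,
                                       objects = NULL, num.steps = 1000,
                                       neighborhood = c(0.1, 0.5, 0.4),
                                       numgroups.allowed = 1:n,
                                       numgroups.simulated = 1:n,
                                       sizes.allowed = 1:n, sizes.simulated = 1:n,
                                       max.thining = 20)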
Simulate burn-in and thinning (single partition)
Description
Function that simulates the Markov chain for a given model and a set of transitions (the neighborhood), for a single partition. It calculates the autocorrelation of statistics for different thinning values and the average statistics for different burn-ins.
Usage
simulate_burninthining_single(
partition,
theta,
nodes,
effects,
objects,
num.steps,
neighborhood,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
max.thining,
verbose = FALSE
)
Arguments
partition |
Observed partition (vector) |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhood |
Way of choosing transitions in the chain: a probability vector (probability of swapping two actors, probability of merging/dividing groups, probability of moving a single actor) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (currently, only ranges of the form num_min:num_max are supported) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (currently, only ranges of the form size_min:size_max are supported) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (currently, only ranges of the form size_min:size_max are supported) |
max.thining |
maximum thinning value (number of simulated steps between samples) tested |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
A list
Simulate thinning (single partition)
Description
Function that can be used to find a good thinning value for the Markov chain, for a given model and a set of transitions in the chain (the neighborhood). It draws a chain and calculates the autocorrelation of statistics for different thinning values.
Usage
simulate_thining_single(
partition,
theta,
nodes,
effects,
objects,
num.steps,
neighborhood,
numgroups.allowed,
numgroups.simulated,
sizes.allowed,
sizes.simulated,
burnin,
max.thining,
verbose = FALSE
)
Arguments
partition |
A partition (vector) |
theta |
Initial model parameters |
nodes |
Node set (data frame) |
effects |
Effects/sufficient statistics (list with a vector "names", and a vector "objects") |
objects |
Objects used for statistics calculation (list with a vector "name", and a vector "object") |
num.steps |
Number of samples wanted |
neighborhood |
Way of choosing transitions in the chain: a probability vector (probability of swapping two actors, probability of merging/dividing groups, probability of moving a single actor) |
numgroups.allowed |
vector containing the number of groups allowed in the partition (currently, only ranges of the form num_min:num_max are supported) |
numgroups.simulated |
vector containing the number of groups simulated |
sizes.allowed |
Vector of group sizes allowed in sampling (currently, only ranges of the form size_min:size_max are supported) |
sizes.simulated |
Vector of group sizes allowed in the Markov chain but not necessarily sampled (currently, only ranges of the form size_min:size_max are supported) |
burnin |
number of simulated steps for the burn-in |
max.thining |
maximum thinning value (number of simulated steps between samples) tested |
verbose |
logical: should intermediate results during the estimation be printed or not? Defaults to FALSE. |
Value
A list
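A minimal sketch of a call (not run), again with a single assumed "num_groups" effect; the burn-in, maximum thinning and neighborhood values are illustrative, not defaults.
# Illustrative sketch (not run)
n <- 6
nodes <- data.frame(label = paste0("n", 1:n))
partition <- c(1, 1, 2, 2, 3, 3)
effects <- list(names = c("num_groups"), objects = c("partition"))
res <- simulate_thining_single(partition, theta = c(-0.2), nodes, effects,
                               objects = NULL, num.steps = 1000,
                               neighborhood = c(0.1, 0.5, 0.4),
                               numgroups.allowed = 1:n, numgroups.simulated = 1:n,
                               sizes.allowed = 1:n, sizes.simulated = 1:n,
                               burnin = 100, max.thining = 20)
str(res)  # inspect the returned list (autocorrelations for the tested thinning values)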