Type: | Package |
Title: | Comparison of Bioregionalisation Methods |
Version: | 1.2.0 |
Description: | The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>). |
Depends: | R (≥ 4.0.0) |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | ape, apcluster, bipartite, cluster, data.table, dbscan, dynamicTreeCut, fastcluster, fastkmedoids, ggplot2, grDevices, httr, igraph, mathjaxr, Matrix, phangorn, Rdpack, rlang, rmarkdown, segmented, sf, stats, tidyr, utils |
RdMacros: | mathjaxr, Rdpack |
LinkingTo: | Rcpp |
Suggests: | ade4, dplyr, knitr, microbenchmark, rnaturalearth, rnaturalearthdata, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.2 |
URL: | https://github.com/bioRgeo/bioregion, https://bioRgeo.github.io/bioregion/ |
BugReports: | https://github.com/bioRgeo/bioregion/issues |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2025-01-31 15:50:27 UTC; maxime |
Author: | Maxime Lenormand |
Maintainer: | Maxime Lenormand <maxime.lenormand@inrae.fr> |
Repository: | CRAN |
Date/Publication: | 2025-01-31 16:40:02 UTC |
Convert betapart dissimilarity to bioregion dissimilarity
Description
This function converts dissimilarity results produced by the betapart package (and packages using betapart, such as phyloregion) into a dissimilarity object compatible with the bioregion package. This function only converts object types to make them compatible with bioregion; it does not modify the beta-diversity values. This function allows the inclusion of phylogenetic beta diversity to compute bioregions with bioregion.
Usage
betapart_to_bioregion(betapart_result)
Arguments
betapart_result |
An object produced by the betapart package (e.g.,
using the |
Value
A dissimilarity object of class bioregion.pairwise.metric
,
compatible with the bioregion package.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Examples
comat <- matrix(sample(0:1000, size = 50, replace = TRUE,
prob = 1 / 1:1001), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
## Not run:
beta_div <- betapart::beta.pair.abund(comat)
betapart_to_bioregion(beta_div)
## End(Not run)
Calculate contribution metrics for bioregions
Description
This function calculates the number of sites per bioregion, as well as the number of species these sites have, the number of endemic species, and the proportion of endemism.
Usage
bioregion_metrics(bioregionalization, comat, map = NULL, col_bioregion = NULL)
Arguments
bioregionalization |
A |
comat |
A co-occurrence |
map |
A spatial |
col_bioregion |
An |
Details
Endemic species are species found only in the sites belonging to one bioregion.
Value
A data.frame
with 5 columns, or 6 if spatial coherence is computed.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_3_summary_metrics.html.
Associated functions: site_species_metrics bioregionalization_metrics
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
clust <- netclu_louvain(net)
bioregion_metrics(bioregionalization = clust,
comat = comat)
Calculate metrics for one or several bioregionalizations
Description
This function calculates metrics for one or several bioregionalizations,
typically based on outputs from netclu_
, hclu_
, or nhclu_
functions.
Some metrics may require users to provide either a similarity or dissimilarity
matrix, or the initial species-site table.
Usage
bioregionalization_metrics(
bioregionalization,
dissimilarity = NULL,
dissimilarity_index = NULL,
net = NULL,
site_col = 1,
species_col = 2,
eval_metric = "all"
)
Arguments
bioregionalization |
A |
dissimilarity |
A |
dissimilarity_index |
A |
net |
The site-species network (i.e., bipartite network). Should be
provided as a |
site_col |
The name or index of the column representing site nodes
(i.e., primary nodes). Should be provided if |
species_col |
The name or index of the column representing species nodes
(i.e., feature nodes). Should be provided if |
eval_metric |
A |
Details
Evaluation metrics:
pc_distance
: This metric, as used by Holt et al. (2013), is the ratio of the between-cluster sum of dissimilarities (beta-diversity) to the total sum of dissimilarities for the full dissimilarity matrix. It is calculated in two steps:Compute the total sum of dissimilarities by summing all elements of the dissimilarity matrix.
Compute the between-cluster sum of dissimilarities by setting within-cluster dissimilarities to zero and summing the matrix. The
pc_distance
ratio is obtained by dividing the between-cluster sum of dissimilarities by the total sum of dissimilarities.
anosim
: This metric is the statistic used in the Analysis of Similarities, as described in Castro-Insua et al. (2018). It compares between-cluster and within-cluster dissimilarities. The statistic is computed as: R = (r_B - r_W) / (N (N-1) / 4), where r_B and r_W are the average ranks of between-cluster and within-cluster dissimilarities, respectively, and N is the total number of sites. Note: This function does not estimate significance; for significance testing, use vegan::anosim().avg_endemism
: This metric is the average percentage of endemism in clusters, as recommended by Kreft & Jetz (2010). It is calculated as: End_mean = sum_i (E_i / S_i) / K, where E_i is the number of endemic species in cluster i, S_i is the number of species in cluster i, and K is the total number of clusters.tot_endemism
: This metric is the total endemism across all clusters, as recommended by Kreft & Jetz (2010). It is calculated as: End_tot = E / C, where E is the total number of endemic species (i.e., species found in only one cluster) and C is the number of non-endemic species.
Value
A list
of class bioregion.bioregionalization.metrics
with two to three elements:
args
: Input arguments.evaluation_df
: Adata.frame
containing theeval_metric
values for all explored numbers of clusters.endemism_results
: If endemism calculations are requested, a list with the endemism results for each bioregionalization.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
References
Castro-Insua A, Gómez-Rodríguez C & Baselga A (2018) Dissimilarity measures affected by richness differences yield biased delimitations of biogeographic realms. Nature Communications 9, 9-11.
Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P, Graham CH, Graves GR, Jønsson Ka, Nogués-Bravo D, Wang Z, Whittaker RJ, Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of the world. Science 339, 74-78.
Kreft H & Jetz W (2010) A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography 37, 2029-2053.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html#optimaln.
Associated functions: compare_bioregionalizations find_optimal_n
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissim,
n_clust = 10:15,
index = "Simpson")
tree1
a <- bioregionalization_metrics(tree1,
dissimilarity = dissim,
net = comnet,
site_col = "Node1",
species_col = "Node2",
eval_metric = c("tot_endemism",
"avg_endemism",
"pc_distance",
"anosim"))
a
Compare cluster memberships among multiple bioregionalizations
Description
This function computes pairwise comparisons for several
bioregionalizations, usually outputs from netclu_
, hclu_
, or nhclu_
functions. It also provides the confusion matrix from pairwise comparisons,
enabling the user to compute additional comparison metrics.
Usage
compare_bioregionalizations(
bioregionalizations,
indices = c("rand", "jaccard"),
cor_frequency = FALSE,
store_pairwise_membership = TRUE,
store_confusion_matrix = TRUE
)
Arguments
bioregionalizations |
A |
indices |
|
cor_frequency |
A |
store_pairwise_membership |
A |
store_confusion_matrix |
A |
Details
This function operates in two main steps:
Within each bioregionalization, the function compares all pairs of items and documents whether they are clustered together (
TRUE
) or separately (FALSE
). For example, if site 1 and site 2 are clustered in the same cluster in bioregionalization 1, their pairwise membershipsite1_site2
will beTRUE
. This output is stored in thepairwise_membership
slot ifstore_pairwise_membership = TRUE
.Across all bioregionalizations, the function compares their pairwise memberships to determine similarity. For each pair of bioregionalizations, it computes a confusion matrix with the following elements:
-
a
: Number of item pairs grouped in both bioregionalizations. -
b
: Number of item pairs grouped in the first but not in the second bioregionalization. -
c
: Number of item pairs grouped in the second but not in the first bioregionalization. -
d
: Number of item pairs not grouped in either bioregionalization.
The confusion matrix is stored in confusion_matrix
if
store_confusion_matrix = TRUE
.
Based on these confusion matrices, various indices can be computed to measure agreement among bioregionalizations. The currently implemented indices are:
-
Rand index:
(a + d) / (a + b + c + d)
Measures agreement by considering both grouped and ungrouped item pairs. -
Jaccard index:
a / (a + b + c)
Measures agreement based only on grouped item pairs.
These indices are complementary: the Jaccard index evaluates clustering similarity, while the Rand index considers both clustering and separation. For example, if two bioregionalizations never group the same pairs, their Jaccard index will be 0, but their Rand index may be > 0 due to ungrouped pairs.
Users can compute additional indices manually using the list of confusion matrices.
To identify which bioregionalization is most representative of the others,
the function can compute the correlation between the pairwise membership of
each bioregionalization and the total frequency of pairwise membership across
all bioregionalizations. This is enabled by setting cor_frequency = TRUE
.
Value
A list
containing 4 to 7 elements:
args: A
list
of user-provided arguments.inputs: A
list
containing information on the input bioregionalizations, such as the number of items clustered.pairwise_membership (optional): If
store_pairwise_membership = TRUE
, aboolean matrix
whereTRUE
indicates two items are in the same cluster, andFALSE
indicates they are not.freq_item_pw_membership: A
numeric vector
containing the number of times each item pair is clustered together, corresponding to the sum of rows inpairwise_membership
.bioregionalization_freq_cor (optional): If
cor_frequency = TRUE
, anumeric vector
of correlations between individual bioregionalizations and the total frequency of pairwise membership.confusion_matrix (optional): If
store_confusion_matrix = TRUE
, alist
of confusion matrices for each pair of bioregionalizations.bioregionalization_comparison: A
data.frame
containing comparison results, where the first column indicates the bioregionalizations compared, and the remaining columns contain the requestedindices
.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_2_compare_bioregionalizations.html.
Associated functions: bioregionalization_metrics
Examples
# We here compare three different bioregionalizations
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "Simpson")
bioregion1 <- nhclu_kmeans(dissim, n_clust = 3, index = "Simpson")
net <- similarity(comat, metric = "Simpson")
bioregion2 <- netclu_greedy(net)
bioregion3 <- netclu_walktrap(net)
# Make one single data.frame with the bioregionalizations to compare
compare_df <- merge(bioregion1$clusters, bioregion2$clusters, by = "ID")
compare_df <- merge(compare_df, bioregion3$clusters, by = "ID")
colnames(compare_df) <- c("Site", "Hclu", "Greedy", "Walktrap")
rownames(compare_df) <- compare_df$Site
compare_df <- compare_df[, c("Hclu", "Greedy", "Walktrap")]
# Running the function
compare_bioregionalizations(compare_df)
# Find out which bioregionalizations are most representative
compare_bioregionalizations(compare_df,
cor_frequency = TRUE)
Cut a hierarchical tree
Description
This function is designed to work on a hierarchical tree and cut it
at user-selected heights. It works with outputs from either
hclu_hierarclust
or hclust
objects. The function allows for cutting
the tree based on the chosen number(s) of clusters or specified height(s).
Additionally, it includes a procedure to automatically determine the cutting
height for the requested number(s) of clusters.
Usage
cut_tree(
tree,
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0,
dynamic_tree_cut = FALSE,
dynamic_method = "tree",
dynamic_minClusterSize = 5,
dissimilarity = NULL,
...
)
Arguments
tree |
A |
n_clust |
An |
cut_height |
A |
find_h |
A |
h_max |
A |
h_min |
A |
dynamic_tree_cut |
A |
dynamic_method |
A |
dynamic_minClusterSize |
An |
dissimilarity |
Relevant only if |
... |
Additional arguments passed to dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method. |
Details
The function supports two main methods for cutting the tree. First, the tree
can be cut at a uniform height (specified by cut_height
or determined
automatically for the requested n_clust
). Second, the dynamic tree cut
method (Langfelder et al., 2008) can be applied, which adapts to the shape
of branches in the tree, cutting at varying heights based on cluster
positions.
The dynamic tree cut method has two variants:
The tree-based variant (
dynamic_method = "tree"
) uses a top-down approach, relying solely on the tree and the order of clustered objects.The hybrid variant (
dynamic_method = "hybrid"
) employs a bottom-up approach, leveraging both the tree and the dissimilarity matrix to identify clusters based on dissimilarity among sites. This approach is useful for detecting outliers within clusters.
Value
If tree
is an output from hclu_hierarclust()
, the same
object is returned with updated content (i.e., args
and clusters
). If
tree
is an hclust
object, a data.frame
containing the clusters is
returned.
Note
The find_h
argument is ignored if dynamic_tree_cut = TRUE
,
as cutting heights cannot be determined in this case.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
References
Langfelder P, Zhang B & Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. BIOINFORMATICS 24, 719-720.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: hclu_hierarclust
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
simil <- similarity(comat, metric = "all")
dissimilarity <- similarity_to_dissimilarity(simil)
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissimilarity,
n_clust = 5)
tree2 <- cut_tree(tree1, cut_height = .05)
tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10))
tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25))
tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE)
hclust_tree <- tree2$algorithm$final.tree
clusters_2 <- cut_tree(hclust_tree, n_clust = 10)
cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE,
dissimilarity = dissimilarity)
Compute dissimilarity metrics (beta-diversity) between sites based on species composition
Description
This function generates a data.frame
where each row provides one or
several dissimilarity metrics between pairs of sites, based on a
co-occurrence matrix
with sites as rows and species as columns.
Usage
dissimilarity(comat, metric = "Simpson", formula = NULL, method = "prodmat")
Arguments
comat |
A co-occurrence |
metric |
A |
formula |
A |
method |
A |
Details
With a
the number of species shared by a pair of sites, b
species only
present in the first site and c
species only present in the second site.
Jaccard = (b + c) / (a + b + c)
Jaccardturn = 2min(b, c) / (a + 2min(b, c)) (Baselga, 2012)
Sorensen = (b + c) / (2a + b + c)
Simpson = min(b, c) / (a + min(b, c))
If abundances data are available, Bray-Curtis and its turnover component can also be computed with the following equation:
Bray = (B + C) / (2A + B + C)
Brayturn = min(B, C)/(A + min(B, C)) (Baselga, 2013)
with A
the sum of the lesser values for common species shared by a pair of
sites. B
and C
are the total number of specimens counted at both sites
minus A
.
formula
can be used to compute customized metrics with the terms
a
, b
, c
, A
, B
, and C
. For example
formula = c("pmin(b,c) / (a + pmin(b,c))", "(B + C) / (2*A + B + C)")
will compute the Simpson and Bray-Curtis dissimilarity metrics, respectively.
Note that pmin
is used in the Simpson formula because a
, b
, c
, A
,
B
and C
are numeric
vectors.
Euclidean computes the Euclidean distance between each pair of sites.
Value
A data.frame
with the additional class
bioregion.pairwise.metric
, containing one or several dissimilarity
metrics between pairs of sites. The first two columns represent the pairs of
sites. There is one column per similarity metric provided in metric
and
formula
, except for the abc
and ABC
metrics, which are stored in three
separate columns (one for each letter).
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Baselga, A. (2012) The Relationship between Species Replacement, Dissimilarity Derived from Nestedness, and Nestedness. Global Ecology and Biogeography, 21(12), 1223–1232.
Baselga, A. (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution, 4(6), 552–557.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html.
Associated functions: similarity dissimilarity_to_similarity
Examples
comat <- matrix(sample(0:1000, size = 50, replace = TRUE,
prob = 1 / 1:1001), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
dissim <- dissimilarity(comat,
metric = c("abc", "ABC", "Simpson", "Brayturn"))
dissim <- dissimilarity(comat, metric = "all",
formula = "1 - (b + c) / (a + b + c)")
Convert dissimilarity metrics to similarity metrics
Description
This function converts a data.frame
of dissimilarity metrics
(beta diversity) between sites into similarity metrics.
Usage
dissimilarity_to_similarity(dissimilarity, include_formula = TRUE)
Arguments
dissimilarity |
the output object from |
include_formula |
a |
Value
A data.frame
with the additional class
bioregion.pairwise.metric
, providing similarity metrics for each pair of
sites based on a dissimilarity object.
Note
The behavior of this function changes depending on column names. Columns
Site1
and Site2
are copied identically. If there are columns called
a
, b
, c
, A
, B
, C
they will also be copied identically. If there
are columns based on your own formula (argument formula
in
dissimilarity()
) or not in the original list of dissimilarity metrics
(argument metrics
in dissimilarity()
) and if the argument
include_formula
is set to FALSE
, they will also be copied identically.
Otherwise there are going to be converted like they other columns (default
behavior).
If a column is called Euclidean
, the similarity will be calculated based
on the following formula:
Euclidean similarity = 1 / (1 - Euclidean distance)
Otherwise, all other columns will be transformed into dissimilarity with the following formula:
similarity = 1 - dissimilarity
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html.
Associated functions: similarity dissimilarity_to_similarity
Examples
comat <- matrix(sample(0:1000, size = 50, replace = TRUE,
prob = 1 / 1:1001), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
dissimil <- dissimilarity(comat, metric = "all")
dissimil
similarity <- dissimilarity_to_similarity(dissimil)
similarity
Search for an optimal number of clusters in a list of bioregionalizations
Description
This function aims to optimize one or several criteria on a set of ordered bioregionalizations. It is typically used to find one or more optimal cluster counts on hierarchical trees to cut or ranges of bioregionalizations from k-means or PAM. Users should exercise caution in other cases (e.g., unordered bioregionalizations or unrelated bioregionalizations).
Usage
find_optimal_n(
bioregionalizations,
metrics_to_use = "all",
criterion = "elbow",
step_quantile = 0.99,
step_levels = NULL,
step_round_above = TRUE,
metric_cutoffs = c(0.5, 0.75, 0.9, 0.95, 0.99, 0.999),
n_breakpoints = 1,
plot = TRUE
)
Arguments
bioregionalizations |
A |
metrics_to_use |
A |
criterion |
A |
step_quantile |
For |
step_levels |
For |
step_round_above |
A |
metric_cutoffs |
For |
n_breakpoints |
Specifies the number of breakpoints to find in the curve. Defaults to 1. |
plot |
A |
Details
This function explores evaluation metric ~ cluster relationships, applying criteria to find optimal cluster counts.
Note on criteria: Several criteria can return multiple optimal cluster counts, emphasizing hierarchical or nested bioregionalizations. This approach aligns with modern recommendations for biological datasets, as seen in Ficetola et al. (2017)'s reanalysis of Holt et al. (2013).
Criteria for optimal clusters:
elbow
: Identifies the "elbow" point in the evaluation metric curve, where incremental improvements diminish. Based on a method to find the maximum distance from a straight line linking curve endpoints.increasing_step
ordecreasing_step
: Highlights significant increases or decreases in metrics by analyzing pairwise differences between bioregionalizations. Users specifystep_quantile
orstep_levels
.cutoffs
: Derives clusters from specified metric cutoffs, e.g., as in Holt et al. (2013). Adjust cutoffs based on spatial scale.breakpoints
: Uses segmented regression to find breakpoints. Requires specifyingn_breakpoints
.min
&max
: Selects clusters at minimum or maximum metric values.
Value
A list
of class bioregion.optimal.n
with these elements:
args
: Input arguments.evaluation_df
: The input evaluationdata.frame
, appended withboolean
columns for optimal cluster counts.optimal_nb_clusters
: Alist
with optimal cluster counts for each metric in"metrics_to_use"
, based on the chosencriterion
.plot
: The plot (if requested).
Note
Please note that finding the optimal number of clusters is a procedure which normally requires decisions from the users, and as such can hardly be fully automatized. Users are strongly advised to read the references indicated below to look for guidance on how to choose their optimal number(s) of clusters. Consider the "optimal" numbers of clusters returned by this function as first approximation of the best numbers for your bioregionalization.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
References
Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P, Graham CH, Graves GR, Jønsson Ka, Nogués-Bravo D, Wang Z, Whittaker RJ, Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of the world. Science 339, 74-78.
Ficetola GF, Mazel F & Thuiller W (2017) Global determinants of zoogeographical boundaries. Nature Ecology & Evolution 1, 0089.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html#optimaln.
Associated functions: hclu_hierarclust
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
# User-defined number of clusters
tree <- hclu_hierarclust(dissim,
optimal_tree_method = "best",
n_clust = 5:10)
tree
a <- bioregionalization_metrics(tree,
dissimilarity = dissim,
species_col = "Node2",
site_col = "Node1",
eval_metric = "anosim")
find_optimal_n(a, criterion = 'increasing_step', plot = FALSE)
Spatial distribution of fish in Europe (data.frame)
Description
A dataset containing the abundance of 195 species in 338 sites.
Usage
fishdf
Format
A data.frame
with 2,703 rows and 3 columns:
- Site
Unique site identifier (corresponding to the field ID of fishsf)
- Species
Unique species identifier
- Abundance
Species abundance
Spatial distribution of fish in Europe (co-occurrence matrix)
Description
A dataset containing the abundance of each of the 195 species in each of the 338 sites.
Usage
fishmat
Format
A co-occurrence matrix
with sites as rows and species as
columns. Each element of the matrix
represents the abundance of the species in the site.
Spatial distribution of fish in Europe
Description
A dataset containing the geometry of the 338 sites.
Usage
fishsf
Format
A
- ID
Unique site identifier
- geometry
Geometry of the site
Divisive hierarchical clustering based on dissimilarity or beta-diversity
Description
This function computes a divisive hierarchical clustering from a
dissimilarity (beta-diversity) data.frame
, calculates the cophenetic
correlation coefficient, and can generate clusters from the tree if requested
by the user. The function implements randomization of the dissimilarity matrix
to generate the tree, with a selection method based on the optimal cophenetic
correlation coefficient. Typically, the dissimilarity data.frame
is a
bioregion.pairwise.metric
object obtained by running similarity
or similarity
followed by similarity_to_dissimilarity
.
Usage
hclu_diana(
dissimilarity,
index = names(dissimilarity)[3],
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By default,
the third column name of |
n_clust |
An |
cut_height |
A |
find_h |
A |
h_max |
A |
h_min |
A |
Details
The function is based on diana. Chapter 6 of Kaufman & Rousseeuw (1990) fully details the functioning of the diana algorithm.
To find an optimal number of clusters, see bioregionalization_metrics()
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
describing the characteristics of the clustering process.algorithm: A
list
containing all objects associated with the clustering procedure, such as the original cluster objects.clusters: A
data.frame
containing the clustering results.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. In & Sons. JW (ed.), Finding groups in data: An introduction to cluster analysis.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: cut_tree
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
data("fishmat")
fishdissim <- dissimilarity(fishmat)
fish_diana <- hclu_diana(fishdissim, index = "Simpson")
Hierarchical clustering based on dissimilarity or beta-diversity
Description
This function generates a hierarchical tree from a dissimilarity
(beta-diversity) data.frame
, calculates the cophenetic correlation
coefficient, and optionally retrieves clusters from the tree upon user
request. The function includes a randomization process for the dissimilarity
matrix to generate the tree, with two methods available for constructing the
final tree. Typically, the dissimilarity data.frame
is a
bioregion.pairwise.metric
object obtained by running similarity
,
or by running similarity
followed by similarity_to_dissimilarity
.
Usage
hclu_hierarclust(
dissimilarity,
index = names(dissimilarity)[3],
method = "average",
randomize = TRUE,
n_runs = 100,
keep_trials = FALSE,
optimal_tree_method = "iterative_consensus_tree",
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0,
consensus_p = 0.5,
verbose = TRUE
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
method |
The name of the hierarchical classification method, as in
hclust. Should be one of |
randomize |
A |
n_runs |
The number of trials for randomizing the dissimilarity matrix. |
keep_trials |
A |
optimal_tree_method |
A |
n_clust |
An |
cut_height |
A |
find_h |
A |
h_max |
A |
h_min |
A |
consensus_p |
A |
verbose |
A |
Details
The function is based on hclust.
The default method for the hierarchical tree is average
, i.e.
UPGMA as it has been recommended as the best method to generate a tree
from beta diversity dissimilarity (Kreft & Jetz, 2010).
Clusters can be obtained by two methods:
Specifying a desired number of clusters in
n_clust
Specifying one or several heights of cut in
cut_height
To find an optimal number of clusters, see bioregionalization_metrics()
It is important to pay attention to the fact that the order of rows in the input distance matrix influences the tree topology as explained in Dapporto (2013). To address this, the function generates multiple trees by randomizing the distance matrix.
Two methods are available to obtain the final tree:
optimal_tree_method = "iterative_consensus_tree"
: The Iterative Hierarchical Consensus Tree (IHCT) method reconstructs a consensus tree by iteratively splitting the dataset into two subclusters based on the pairwise dissimilarity of sites acrossn_runs
trees based onn_runs
randomizations of the distance matrix. At each iteration, it identifies the majority membership of sites into two stable groups across all trees, calculates the height based on the selected linkage method (method
), and enforces monotonic constraints on node heights to produce a coherent tree structure. This approach provides a robust, hierarchical representation of site relationships, balancing cluster stability and hierarchical constraints.optimal_tree_method = "best"
: This method selects one tree among with the highest cophenetic correlation coefficient, representing the best fit between the hierarchical structure and the original distance matrix.optimal_tree_method = "consensus"
: This method constructs a consensus tree using phylogenetic methods with the function consensus. When using this option, you must set theconsensus_p
parameter, which indicates the proportion of trees that must contain a region/cluster for it to be included in the final consensus tree. Consensus trees lack an inherent height because they represent a majority structure rather than an actual hierarchical clustering. To assign heights, we use a non-negative least squares method (nnls.tree) based on the initial distance matrix, ensuring that the consensus tree preserves approximate distances among clusters.
We recommend using the "iterative_consensus_tree"
as all the branches of
this tree will always reflect the majority decision among many randomized
versions of the distance matrix. This method is inspired by
Dapporto et al. (2015), which also used the majority decision
among many randomized versions of the distance matrix, but it expands it
to reconstruct the entire topology of the tree iteratively.
We do not recommend using the basic consensus
method because in many
contexts it provides inconsistent results, with a meaningless tree topology
and a very low cophenetic correlation coefficient.
For a fast exploration of the tree, we recommend using the best
method
which will only select the tree with the highest cophenetic correlation
coefficient among all randomized versions of the distance matrix.
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
describing the characteristics of the clustering process.algorithm: A
list
containing all objects associated with the clustering procedure, such as the original cluster objects.clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, users can find the following elements:
trials
: A list containing all randomization trials. Each trial includes the dissimilarity matrix with randomized site order, the associated tree, and the cophenetic correlation coefficient (Spearman) for that tree.final.tree
: Anhclust
object representing the final hierarchical tree to be used.final.tree.coph.cor
: The cophenetic correlation coefficient between the initial dissimilarity matrix and thefinal.tree
.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Kreft H & Jetz W (2010) A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography 37, 2029-2053.
Dapporto L, Ramazzotti M, Fattorini S, Talavera G, Vila R & Dennis, RLH (2013) Recluster: an unbiased clustering procedure for beta-diversity turnover. Ecography 36, 1070–1075.
Dapporto L, Ciolli G, Dennis RLH, Fox R & Shreeve TG (2015) A new procedure for extrapolating turnover regionalization at mid-small spatial scales, tested on British butterflies. Methods in Ecology and Evolution 6 , 1287–1297.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: cut_tree
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "Simpson")
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissim,
n_clust = 5)
tree1
plot(tree1)
str(tree1)
tree1$clusters
# User-defined height cut
# Only one height
tree2 <- hclu_hierarclust(dissim,
cut_height = .05)
tree2
tree2$clusters
# Multiple heights
tree3 <- hclu_hierarclust(dissim,
cut_height = c(.05, .15, .25))
tree3$clusters # Mind the order of height cuts: from deep to shallow cuts
# Info on each partition can be found in table cluster_info
tree3$cluster_info
plot(tree3)
OPTICS hierarchical clustering algorithm
Description
This function performs semi-hierarchical clustering based on dissimilarity using the OPTICS algorithm (Ordering Points To Identify the Clustering Structure).
Usage
hclu_optics(
dissimilarity,
index = names(dissimilarity)[3],
minPts = NULL,
eps = NULL,
xi = 0.05,
minimum = FALSE,
show_hierarchy = FALSE,
algorithm_in_output = TRUE,
...
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
minPts |
A |
eps |
A |
xi |
A |
minimum |
A |
show_hierarchy |
A |
algorithm_in_output |
A |
... |
Additional arguments to be passed to |
Details
The OPTICS (Ordering points to identify the clustering structure) is a
semi-hierarchical clustering algorithm which orders the points in the
dataset such that points which are closest become neighbors, and calculates
a reachability distance for each point. Then, clusters can be extracted in a
hierarchical manner from this reachability distance, by identifying clusters
depending on changes in the relative cluster density. The reachability plot
should be explored to understand the clusters and their hierarchical nature,
by running plot on the output of the function
if algorithm_in_output = TRUE
: plot(object$algorithm)
.
We recommend reading (Hahsler et al., 2019) to grasp the
algorithm, how it works, and what the clusters mean.
To extract the clusters, we use the extractXi function which is based on the steepness of the reachability plot (see optics)
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
describing the characteristics of the clustering process.algorithm: A
list
containing all objects associated with the clustering procedure, such as the original cluster objects.clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find the output of optics.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Hahsler M, Piekenbrock M & Doran D (2019) Dbscan: Fast density-based clustering with R. Journal of Statistical Software 91, 1–30.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: nhclu_dbscan
Examples
dissim <- dissimilarity(fishmat, metric = "all")
clust1 <- hclu_optics(dissim, index = "Simpson")
clust1
# Visualize the optics plot (the hierarchy of clusters is illustrated at the
# bottom)
plot(clust1$algorithm)
# Extract the hierarchy of clusters
clust1 <- hclu_optics(dissim, index = "Simpson", show_hierarchy = TRUE)
clust1
Download, unzip, check permissions, and test the bioregion's binary files
Description
This function downloads and unzips the 'bin' folder required to run certain
functions of the bioregion
package. It also verifies if the files have the
necessary permissions to be executed as programs. Finally, it tests whether
the binary files are running correctly.
Usage
install_binaries(
binpath = "tempdir",
download_only = FALSE,
infomap_version = c("2.1.0", "2.6.0", "2.7.1", "2.8.0")
)
Arguments
binpath |
A |
download_only |
A |
infomap_version |
A |
Details
By default, the binary files are installed in R's temporary
directory (binpath = "tempdir"
). In this case, the bin
folder will be
automatically removed at the end of the R session. Alternatively, the binary
files can be installed in the bioregion
package folder
(binpath = "pkgfolder"
).
A custom folder path can also be specified. In this case, and only in this
case, download_only
can be set to TRUE
, but you must ensure that the
files have the required permissions to be executed as programs.
In all cases, PLEASE MAKE SURE to update the binpath
and check_install
parameters accordingly in netclu_infomap, netclu_louvain, and
netclu_oslom.
Value
No return value.
Note
Currently, only Infomap versions 2.1.0, 2.6.0, 2.7.1, and 2.8.0 are available.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a1_install_binary_files.html.
Create a map of bioregions
Description
This plot function can be used to visualize bioregions based on a
bioregion.clusters
object combined with a geometry (sf
objects).
Usage
map_bioregions(clusters, geometry, write_clusters = FALSE, plot = TRUE, ...)
Arguments
clusters |
An object of class |
geometry |
A spatial object that can be handled by the |
write_clusters |
A |
plot |
A |
... |
Further arguments to be passed to |
Details
The clusters
and geometry
site IDs should correspond. They should
have the same type (i.e., character
if clusters
is a
bioregion.clusters
object) and the sites of clusters
should be
included in the sites of geometry
.
Value
One or several maps of bioregions if plot = TRUE
and the
geometry with additional clusters' attributes if write_clusters = TRUE
.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Examples
data(fishmat)
data(fishsf)
net <- similarity(fishmat, metric = "Simpson")
clu <- netclu_greedy(net)
map <- map_bioregions(clu, fishsf, write_clusters = TRUE, plot = FALSE)
Create a data.frame from a contingency table
Description
This function generates a two- or three-column data.frame
, where
each row represents the interaction between two nodes (e.g., site and species)
and an optional third column indicates the weight of the interaction
(if weight = TRUE
). The input is a contingency table, with rows
representing one set of entities (e.g., site) and columns representing
another set (e.g., species).
Usage
mat_to_net(
mat,
weight = FALSE,
remove_zeroes = TRUE,
include_diag = TRUE,
include_lower = TRUE
)
Arguments
mat |
A contingency table (i.e., a |
weight |
A |
remove_zeroes |
A |
include_diag |
A |
include_lower |
A |
Value
A data.frame
where each row represents the interaction
between two nodes. If weight = TRUE
, the data.frame
includes a third
column representing the weight of each interaction.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a2_matrix_and_network_formats.html.
Associated functions: net_to_mat
Examples
mat <- matrix(sample(1000, 50), 5, 10)
rownames(mat) <- paste0("Site", 1:5)
colnames(mat) <- paste0("Species", 1:10)
net <- mat_to_net(mat, weight = TRUE)
Create a contingency table from a data.frame
Description
This function generates a contingency table from a two- or three-column
data.frame
, where each row represents the interaction between two
nodes (e.g., site and species) and an optional third column indicates
the weight of the interaction (if weight = TRUE
).
Usage
net_to_mat(
net,
weight = FALSE,
squared = FALSE,
symmetrical = FALSE,
missing_value = 0
)
Arguments
net |
A two- or three-column |
weight |
A |
squared |
A |
symmetrical |
A |
missing_value |
The value to assign to pairs of nodes not present
in |
Value
A matrix
with the first nodes (from the first column of net
)
as rows and the second nodes (from the second column of net
) as columns.
If squared = TRUE
, the rows and columns will have the same number of
elements, corresponding to the unique union of objects in the first and
second columns of net
. If squared = TRUE
and symmetrical = TRUE
,
the matrix will be forced to be symmetrical based on the upper triangular
part of the matrix.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a2_matrix_and_network_formats.html.
Associated functions: mat_to_net
Examples
net <- data.frame(
Site = c(rep("A", 2), rep("B", 3), rep("C", 2)),
Species = c("a", "b", "a", "c", "d", "b", "d"),
Weight = c(10, 100, 1, 20, 50, 10, 20)
)
mat <- net_to_mat(net, weight = TRUE)
Community structure detection in weighted bipartite networks via modularity optimization
Description
This function takes a bipartite weighted graph and computes modules by applying Newman’s modularity measure in a bipartite weighted version.
Usage
netclu_beckett(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
seed = NULL,
forceLPA = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
algorithm_in_output = TRUE
)
Arguments
net |
A |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
seed |
The seed for the random number generator ( |
forceLPA |
A |
site_col |
The name or number of the column for site nodes (i.e., primary nodes). |
species_col |
The name or number of the column for species nodes (i.e., feature nodes). |
return_node_type |
A |
algorithm_in_output |
A |
Details
This function is based on the modularity optimization algorithm provided by Stephen Beckett (Beckett, 2016) as implemented in the bipartite package (computeModules).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, users can find the output of
computeModules in the algorithm
slot.
Note
Beckett's algorithm is designed to handle weighted bipartite networks. If
weight = FALSE
, a weight of 1 will be assigned to each pair of nodes.
Ensure that the site_col
and species_col
arguments correctly identify
the respective columns for site nodes (primary nodes) and species nodes
(feature nodes). The type of nodes returned in the output can be selected
using the return_node_type
argument: "both"
to include both node types,
"site"
to return only site nodes, or "species"
to return only species
nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Beckett SJ (2016) Improved community detection in weighted bipartite networks. Royal Society Open Science 3, 140536.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_louvain netclu_oslom
Examples
net <- data.frame(
Site = c(rep("A", 2), rep("B", 3), rep("C", 2)),
Species = c("a", "b", "a", "c", "d", "b", "d"),
Weight = c(10, 100, 1, 20, 50, 10, 20))
com <- netclu_beckett(net)
Community structure detection via greedy optimization of modularity
Description
This function finds communities in a (un)weighted undirected network via greedy optimization of modularity.
Usage
netclu_greedy(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
algorithm_in_output = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
bipartite |
A |
site_col |
The name or number for the column of site nodes (i.e. primary nodes). |
species_col |
The name or number for the column of species nodes (i.e. feature nodes). |
return_node_type |
A |
algorithm_in_output |
A |
Details
This function is based on the fast greedy modularity optimization algorithm (Clauset et al., 2004) as implemented in the igraph package (cluster_fast_greedy).
Value
A list
of class bioregion.clusters
with five slots:
name:
character
containing the name of the algorithmargs:
list
of input arguments as provided by the userinputs:
list
of characteristics of the clustering processalgorithm:
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
)clusters:
data.frame
containing the clustering results
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find the output of
cluster_fast_greedy.
Note
Although this algorithm was not primarily designed to deal with bipartite
network, it is possible to consider the bipartite network as unipartite
network (bipartite = TRUE
).
Do not forget to indicate which of the first two columns is
dedicated to the site nodes (i.e. primary nodes) and species nodes (i.e.
feature nodes) using the arguments site_col
and species_col
.
The type of nodes returned in the output can be chosen with the argument
return_node_type
equal to both
to keep both types of nodes,
sites
to preserve only the sites nodes and species
to
preserve only the species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Clauset A, Newman MEJ & Moore C (2004) Finding community structure in very large networks. Phys. Rev. E 70, 066111.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_louvain netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_greedy(net)
net_bip <- mat_to_net(comat, weight = TRUE)
clust2 <- netclu_greedy(net_bip, bipartite = TRUE)
Infomap community finding
Description
This function finds communities in a (un)weighted (un)directed network based on the Infomap algorithm (https://github.com/mapequation/infomap).
Usage
netclu_infomap(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
seed = NULL,
nbmod = 0,
markovtime = 1,
numtrials = 1,
twolevel = FALSE,
show_hierarchy = FALSE,
directed = FALSE,
bipartite_version = FALSE,
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
version = "2.8.0",
binpath = "tempdir",
check_install = TRUE,
path_temp = "infomap_temp",
delete_temp = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
seed |
The seed for the random number generator ( |
nbmod |
Penalize solutions the more they differ from this number ( |
markovtime |
Scales link flow to change the cost of moving between
modules, higher values result in fewer modules ( |
numtrials |
For the number of trials before picking up the best solution. |
twolevel |
A |
show_hierarchy |
A |
directed |
A |
bipartite_version |
A |
bipartite |
A |
site_col |
The name or number for the column of site nodes (i.e. primary nodes). |
species_col |
The name or number for the column of species nodes (i.e. feature nodes). |
return_node_type |
A |
version |
A |
binpath |
A |
check_install |
A |
path_temp |
A |
delete_temp |
A |
Details
Infomap is a network clustering algorithm based on the Map equation proposed in Rosvall & Bergstrom (2008) that finds communities in (un)weighted and (un)directed networks.
This function is based on the C++ version of Infomap (https://github.com/mapequation/infomap/releases). This function needs binary files to run. They can be installed with install_binaries.
If you changed the default path to the bin
folder
while running install_binaries PLEASE MAKE SURE to set binpath
accordingly.
If you did not use install_binaries to change the permissions and test
the binary files PLEASE MAKE SURE to set check_install
accordingly.
The C++ version of Infomap generates temporary folders and/or files that are
stored in the path_temp
folder ("infomap_temp" with a unique timestamp
located in the bin folder in binpath
by default). This temporary folder is
removed by default (delete_temp = TRUE
).
Several versions of Infomap are available in the package. See install_binaries for more details.
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects.clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, users can find the following elements:
cmd
: The command line used to run Infomap.version
: The Infomap version.web
: Infomap's GitHub repository.
Note
Infomap has been designed to deal with bipartite networks. To use this
functionality, set the bipartite_version
argument to TRUE in order to
approximate a two-step random walker (see
https://www.mapequation.org/infomap/ for more information). Note that
a bipartite network can also be considered as a unipartite network
(bipartite = TRUE
).
In both cases, do not forget to indicate which of the first two columns is
dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e.
feature nodes) using the arguments site_col
and species_col
.
The type of nodes returned in the output can be chosen with the argument
return_node_type
equal to "both"
to keep both types of nodes, "site"
to preserve only the site nodes, and "species"
to preserve only the
species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Rosvall M & Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105, 1118-1123.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_greedy netclu_louvain netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_infomap(net)
Finding communities based on propagating labels
Description
This function finds communities in a (un)weighted undirected network based on propagating labels.
Usage
netclu_labelprop(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
seed = NULL,
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
algorithm_in_output = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
seed |
The seed for the random number generator ( |
bipartite |
A |
site_col |
The name or number for the column of site nodes (i.e. primary nodes). |
species_col |
The name or number for the column of species nodes (i.e. feature nodes). |
return_node_type |
A |
algorithm_in_output |
A |
Details
This function is based on propagating labels (Raghavan et al., 2007) as implemented in the igraph package (cluster_label_prop).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find a "communities" object, output of
cluster_label_prop.
Note
Although this algorithm was not primarily designed to deal with bipartite
networks, it is possible to consider the bipartite network as a unipartite
network (bipartite = TRUE
).
Do not forget to indicate which of the first two columns is
dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e.
feature nodes) using the arguments site_col
and species_col
.
The type of nodes returned in the output can be chosen with the argument
return_node_type
equal to "both"
to keep both types of nodes,
"site"
to preserve only the site nodes, and "species"
to
preserve only the species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Raghavan UN, Albert R & Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76, 036106.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_louvain netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_labelprop(net)
net_bip <- mat_to_net(comat, weight = TRUE)
clust2 <- netclu_labelprop(net_bip, bipartite = TRUE)
Finding communities based on the leading eigenvector of the community matrix
Description
This function finds communities in a (un)weighted undirected network based on the leading eigenvector of the community matrix.
Usage
netclu_leadingeigen(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
algorithm_in_output = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
bipartite |
A |
site_col |
The name or number for the column of site nodes (i.e., primary nodes). |
species_col |
The name or number for the column of species nodes (i.e., feature nodes). |
return_node_type |
A |
algorithm_in_output |
A |
Details
This function is based on the leading eigenvector of the community matrix (Newman, 2006) as implemented in the igraph package (cluster_leading_eigen).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find the output of cluster_leading_eigen.
Note
Although this algorithm was not primarily designed to deal with bipartite
networks, it is possible to consider the bipartite network as a unipartite
network (bipartite = TRUE
).
Do not forget to indicate which of the first two columns is
dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e.
feature nodes) using the arguments site_col
and species_col
.
The type of nodes returned in the output can be chosen with the argument
return_node_type
equal to "both"
to keep both types of nodes,
"site"
to preserve only the site nodes, and "species"
to
preserve only the species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 036104.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_louvain netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_leadingeigen(net)
net_bip <- mat_to_net(comat, weight = TRUE)
clust2 <- netclu_leadingeigen(net_bip, bipartite = TRUE)
Finding communities using the Leiden algorithm
Description
This function finds communities in a (un)weighted undirected network based on the Leiden algorithm of Traag, van Eck & Waltman.
Usage
netclu_leiden(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
seed = NULL,
objective_function = "CPM",
resolution_parameter = 1,
beta = 0.01,
n_iterations = 2,
vertex_weights = NULL,
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
algorithm_in_output = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
seed |
The random number generator seed (NULL for random by default). |
objective_function |
A string indicating the objective function to use, either the Constant Potts Model ("CPM") or "modularity" ("CPM" by default). |
resolution_parameter |
The resolution parameter to use. Higher resolutions lead to smaller communities, while lower resolutions lead to larger communities. |
beta |
A parameter affecting the randomness in the Leiden algorithm. This affects only the refinement step of the algorithm. |
n_iterations |
The number of iterations for the Leiden algorithm. Each iteration may further improve the partition. |
vertex_weights |
The vertex weights used in the Leiden algorithm. If not provided, they will be automatically determined based on the objective_function. Please see the details of this function to understand how to interpret the vertex weights. |
bipartite |
A |
site_col |
The name or number for the column of site nodes (i.e., primary nodes). |
species_col |
The name or number for the column of species nodes (i.e., feature nodes). |
return_node_type |
A |
algorithm_in_output |
A |
Details
This function is based on the Leiden algorithm (Traag et al., 2019) as implemented in the igraph package (cluster_leiden).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find the output of cluster_leiden.
Note
Although this algorithm was not primarily designed to deal with bipartite
networks, it is possible to consider the bipartite network as a unipartite
network (bipartite = TRUE
).
Do not forget to indicate which of the first two columns is
dedicated to the site nodes (i.e., primary nodes) and species nodes (i.e.
feature nodes) using the arguments site_col
and species_col
.
The type of nodes returned in the output can be chosen with the argument
return_node_type
equal to "both"
to keep both types of nodes,
"site"
to preserve only the site nodes, and "species"
to
preserve only the species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Traag VA, Waltman L & Van Eck NJ (2019) From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports 9, 5233.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_louvain netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_leiden(net)
net_bip <- mat_to_net(comat, weight = TRUE)
clust2 <- netclu_leiden(net_bip, bipartite = TRUE)
Louvain community finding
Description
This function finds communities in a (un)weighted undirected network based on the Louvain algorithm.
Usage
netclu_louvain(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
lang = "igraph",
resolution = 1,
seed = NULL,
q = 0,
c = 0.5,
k = 1,
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
binpath = "tempdir",
check_install = TRUE,
path_temp = "louvain_temp",
delete_temp = TRUE,
algorithm_in_output = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
The name or number of the column to use as weight. By default,
the third column name of |
lang |
A string indicating which version of Louvain should be used
( |
resolution |
A resolution parameter to adjust the modularity (1 is chosen by default, see Details). |
seed |
The random number generator seed (only when |
q |
The quality function used to compute the partition of the graph (modularity is chosen by default, see Details). |
c |
The parameter for the Owsinski-Zadrozny quality function (between 0 and 1, 0.5 is chosen by default). |
k |
The kappa_min value for the Shi-Malik quality function (it must be > 0, 1 is chosen by default). |
bipartite |
A |
site_col |
The name or number for the column of site nodes (i.e., primary nodes). |
species_col |
The name or number for the column of species nodes (i.e., feature nodes). |
return_node_type |
A |
binpath |
A |
check_install |
A |
path_temp |
A |
delete_temp |
A |
algorithm_in_output |
A |
Details
Louvain is a network community detection algorithm proposed in
(Blondel et al., 2008). This function offers two
implementations of the Louvain algorithm (controlled by the lang
parameter):
the igraph
implementation (cluster_louvain) and the C++
implementation (https://sourceforge.net/projects/louvain/, version 0.3).
The igraph
implementation allows adjustment of the resolution parameter of
the modularity function (resolution
argument) used internally by the
algorithm. Lower values typically yield fewer, larger clusters. The original
definition of modularity is recovered when the resolution parameter
is set to 1 (by default).
The C++ implementation provides several quality functions:
q = 0
for the classical Newman-Girvan criterion (Modularity),
q = 1
for the Zahn-Condorcet criterion, q = 2
for the Owsinski-Zadrozny
criterion (parameterized by c
), q = 3
for the Goldberg Density criterion,
q = 4
for the A-weighted Condorcet criterion, q = 5
for the Deviation to
Indetermination criterion, q = 6
for the Deviation to Uniformity criterion,
q = 7
for the Profile Difference criterion, q = 8
for the Shi-Malik
criterion (parameterized by k
), and q = 9
for the Balanced Modularity
criterion.
The C++ version is based on version 0.3 (https://sourceforge.net/projects/louvain/). Binary files are required to run it, and can be installed with install_binaries.
If you changed the default path to the bin
folder
while running install_binaries, PLEASE MAKE SURE to set binpath
accordingly.
If you did not use install_binaries to change the permissions or test
the binary files, PLEASE MAKE SURE to set check_install
accordingly.
The C++ version generates temporary folders and/or files in the path_temp
folder ("louvain_temp" with a unique timestamp located in the bin folder in
binpath
by default). This temporary folder is removed by default
(delete_temp = TRUE
).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find the output of cluster_louvain if
lang = "igraph"
and the following element if lang = "cpp"
:
cmd
: The command line used to run Louvain.version
: The Louvain version.web
: The Louvain's website.
Note
Although this algorithm was not primarily designed to deal with bipartite
networks, it is possible to consider the bipartite network as a unipartite
network (bipartite = TRUE
).
Do not forget to indicate which of the first two columns is dedicated to the
site nodes (i.e., primary nodes) and species nodes (i.e., feature nodes) using
the arguments site_col
and species_col
. The type of nodes returned in
the output can be chosen with the argument return_node_type
equal to
"both"
to keep both types of nodes, "site"
to preserve only the site
nodes, and "species"
to preserve only the species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Blondel VD, Guillaume JL, Lambiotte R & Mech ELJS (2008) Fast unfolding of communities in large networks. J. Stat. Mech. 10, P10008.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_greedy netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_louvain(net, lang = "igraph")
OSLOM community finding
Description
This function finds communities in a (un)weighted (un)directed network based on the OSLOM algorithm (http://oslom.org/, version 2.4).
Usage
netclu_oslom(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
seed = NULL,
reassign = "no",
r = 10,
hr = 50,
t = 0.1,
cp = 0.5,
directed = FALSE,
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
binpath = "tempdir",
check_install = TRUE,
path_temp = "oslom_temp",
delete_temp = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
Name or number of the column to use as weight. By default,
the third column name of |
seed |
For the random number generator (NULL for random by default). |
reassign |
A |
r |
The number of runs for the first hierarchical level (10 by default). |
hr |
The number of runs for the higher hierarchical level (50 by default, 0 if you are not interested in hierarchies). |
t |
The p-value, the default value is 0.10. Increase this value if you want more modules. |
cp |
Kind of resolution parameter used to decide between taking some modules or their union (default value is 0.5; a bigger value leads to bigger clusters). |
directed |
A |
bipartite |
A |
site_col |
Name or number for the column of site nodes (i.e. primary nodes). |
species_col |
Name or number for the column of species nodes (i.e. feature nodes). |
return_node_type |
A |
binpath |
A |
check_install |
A |
path_temp |
A |
delete_temp |
A |
Details
OSLOM is a network community detection algorithm proposed in Lancichinetti et al. (2011) that finds statistically significant (overlapping) communities in (un)weighted and (un)directed networks.
This function is based on the 2.4 C++ version of OSLOM (http://www.oslom.org/software.htm). This function needs files to run. They can be installed with install_binaries.
If you changed the default path to the bin
folder
while running install_binaries, PLEASE MAKE SURE to set binpath
accordingly.
If you did not use install_binaries to change the permissions and test
the binary files, PLEASE MAKE SURE to set check_install
accordingly.
The C++ version of OSLOM generates temporary folders and/or files that are
stored in the path_temp
folder (folder "oslom_temp" with a unique timestamp
located in the bin folder in binpath
by default). This temporary folder is
removed by default (delete_temp = TRUE
).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, users can find the following elements:
cmd
: The command line used to run OSLOM.version
: The OSLOM version.web
: The OSLOM's web site.
Note
Although this algorithm was not primarily designed to deal with bipartite
networks, it is possible to consider the bipartite network as unipartite
network (bipartite = TRUE
). Do not forget to indicate which of the
first two columns is dedicated to the site nodes (i.e. primary nodes) and
species nodes (i.e. feature nodes) using the arguments site_col
and
species_col
. The type of nodes returned in the output can be chosen
with the argument return_node_type
equal to both
to keep both
types of nodes, sites
to preserve only the sites nodes, and
species
to preserve only the species nodes.
Since OSLOM potentially returns overlapping communities, we propose two
methods to reassign the 'overlapping' nodes: randomly (reassign = "random"
)
or based on the closest candidate community (reassign = "simil"
) (only for
weighted networks, in this case the closest candidate community is
determined with the average similarity). By default, reassign = "no"
and
all the information will be provided. The number of partitions will depend
on the number of overlapping modules (up to three). The suffix _semel
,
_bis
, and _ter
are added to the column names. The first partition
(_semel
) assigns a module to each node. A value of NA
in the second
(_bis
) and third (_ter
) columns indicates that no overlapping module
was found for this node (i.e. non-overlapping nodes).
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Lancichinetti A, Radicchi F, Ramasco JJ & Fortunato S (2011) Finding statistically significant communities in networks. PLOS ONE 6, e18961.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_greedy netclu_infomap netclu_louvain
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_oslom(net)
Community structure detection via short random walks
Description
This function finds communities in a (un)weighted undirected network via short random walks.
Usage
netclu_walktrap(
net,
weight = TRUE,
cut_weight = 0,
index = names(net)[3],
steps = 4,
bipartite = FALSE,
site_col = 1,
species_col = 2,
return_node_type = "both",
algorithm_in_output = TRUE
)
Arguments
net |
The output object from |
weight |
A |
cut_weight |
A minimal weight value. If |
index |
Name or number of the column to use as weight. By default,
the third column name of |
steps |
The length of the random walks to perform. |
bipartite |
A |
site_col |
Name or number for the column of site nodes (i.e. primary nodes). |
species_col |
Name or number for the column of species nodes (i.e. feature nodes). |
return_node_type |
A |
algorithm_in_output |
A |
Details
This function is based on random walks (Pons & Latapy, 2005) as implemented in the igraph package (cluster_walktrap).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
In the algorithm
slot, if algorithm_in_output = TRUE
, users can
find the output of
cluster_walktrap.
Note
Although this algorithm was not primarily designed to deal with bipartite
networks, it is possible to consider the bipartite network as unipartite
network (bipartite = TRUE
).
Do not forget to indicate which of the first two columns is
dedicated to the site nodes (i.e. primary nodes) and species nodes (i.e.
feature nodes) using the arguments site_col
and species_col
.
The type of nodes returned in the output can be chosen with the argument
return_node_type
equal to both
to keep both types of nodes,
sites
to preserve only the site nodes, and species
to
preserve only the species nodes.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Pons P & Latapy M (2005) Computing Communities in Large Networks Using Random Walks. In Yolum I, Güngör T, Gürgen F, Özturan C (eds.), Computer and Information Sciences - ISCIS 2005, Lecture Notes in Computer Science, 284-293.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_3_network_clustering.html.
Associated functions: netclu_infomap netclu_louvain netclu_oslom
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_walktrap(net)
net_bip <- mat_to_net(comat, weight = TRUE)
clust2 <- netclu_walktrap(net_bip, bipartite = TRUE)
Non-hierarchical clustering: Affinity Propagation
Description
This function performs non-hierarchical clustering using the Affinity Propagation algorithm.
Usage
nhclu_affprop(
similarity,
index = names(similarity)[3],
seed = NULL,
p = NA,
q = NA,
maxits = 1000,
convits = 100,
lam = 0.9,
details = FALSE,
nonoise = FALSE,
K = NULL,
prc = NULL,
bimaxit = NULL,
exact = NULL,
algorithm_in_output = TRUE
)
Arguments
similarity |
The output object from |
index |
The name or number of the similarity column to use. By default,
the third column name of |
seed |
The seed for the random number generator used when
|
p |
Input preference, which can be a vector specifying individual
preferences for each data point. If scalar, the same value is used for all
data points. If |
q |
If |
maxits |
The maximum number of iterations to execute. |
convits |
The algorithm terminates if the exemplars do not change for
|
lam |
The damping factor, a value in the range [0.5, 1). Higher values correspond to heavier damping, which may help prevent oscillations. |
details |
If |
nonoise |
If |
K |
The desired number of clusters. If not |
prc |
A parameter needed when |
bimaxit |
A parameter needed when |
exact |
A flag indicating whether to compute the initial preference range exactly. |
algorithm_in_output |
A |
Details
This function is based on the apcluster package (apcluster).
Value
A list
of class bioregion.clusters
with five slots:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
describing the characteristics of the clustering process.algorithm: A
list
of objects associated with the clustering procedure, such as original cluster objects (ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, the algorithm
slot includes the output of
apcluster.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Frey B & Dueck D (2007) Clustering by Passing Messages Between Data Points. Science 315, 972-976.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_affprop
Examples
comat_1 <- matrix(sample(0:1000, size = 10*12, replace = TRUE,
prob = 1/1:1001), 10, 12)
rownames(comat_1) <- paste0("Site", 1:10)
colnames(comat_1) <- paste0("Species", 1:12)
comat_1 <- cbind(comat_1,
matrix(0, 10, 8,
dimnames = list(paste0("Site", 1:10),
paste0("Species", 13:20))))
comat_2 <- matrix(sample(0:1000,
size = 10*12,
replace = TRUE,
prob = 1/1:1001),
10, 12)
rownames(comat_2) <- paste0("Site", 11:20)
colnames(comat_2) <- paste0("Species", 9:20)
comat_2 <- cbind(matrix(0, 10, 8,
dimnames = list(paste0("Site", 11:20),
paste0("Species", 1:8))),
comat_2)
comat <- rbind(comat_1, comat_2)
dissim <- dissimilarity(comat, metric = "Simpson")
sim <- dissimilarity_to_similarity(dissim)
clust1 <- nhclu_affprop(sim)
clust2 <- nhclu_affprop(sim, q = 1)
# Fixed number of clusters
clust3 <- nhclu_affprop(sim, K = 2, prc = 10, bimaxit = 20, exact = FALSE)
Non-hierarchical clustering: CLARA
Description
This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids, implemented via the Clustering Large Applications (CLARA) algorithm.
Usage
nhclu_clara(
dissimilarity,
index = names(dissimilarity)[3],
seed = NULL,
n_clust = c(1, 2, 3),
maxiter = 0,
initializer = "LAB",
fasttol = 1,
numsamples = 5,
sampling = 0.25,
independent = FALSE,
algorithm_in_output = TRUE
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
seed |
A value for the random number generator (set to |
n_clust |
An |
maxiter |
An |
initializer |
A |
fasttol |
A positive |
numsamples |
A positive |
sampling |
A positive |
independent |
A |
algorithm_in_output |
A |
Details
Based on fastkmedoids package (fastclara).
Value
A list
of class bioregion.clusters
with five components:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, the algorithm
slot includes the output of
fastclara.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Schubert E & Rousseeuw PJ (2019) Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. Similarity Search and Applications 11807, 171-187.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_pam nhclu_affprop
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
#clust <- nhclu_clara(dissim, index = "Simpson", n_clust = 5)
Non-hierarchical clustering: CLARANS
Description
This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids, implemented via the Clustering Large Applications based on RANdomized Search (CLARANS) algorithm.
Usage
nhclu_clarans(
dissimilarity,
index = names(dissimilarity)[3],
seed = NULL,
n_clust = c(1, 2, 3),
numlocal = 2,
maxneighbor = 0.025,
algorithm_in_output = TRUE
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
seed |
A value for the random number generator ( |
n_clust |
An |
numlocal |
An |
maxneighbor |
A positive |
algorithm_in_output |
A |
Details
Based on fastkmedoids package (fastclarans).
Value
A list
of class bioregion.clusters
with five components:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, the algorithm
slot includes the output of
fastclarans.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Schubert E & Rousseeuw PJ (2019) Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. Similarity Search and Applications 11807, 171-187.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara nhclu_dbscan nhclu_kmeans nhclu_pam nhclu_affprop
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
#clust <- nhclu_clarans(dissim, index = "Simpson", n_clust = 5)
Non-hierarchical clustering: DBSCAN
Description
This function performs non-hierarchical clustering based on dissimilarity using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm.
Usage
nhclu_dbscan(
dissimilarity,
index = names(dissimilarity)[3],
minPts = NULL,
eps = NULL,
plot = TRUE,
algorithm_in_output = TRUE,
...
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
minPts |
A |
eps |
A |
plot |
A |
algorithm_in_output |
A |
... |
Additional arguments to be passed to |
Details
The DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
algorithm clusters points based on the density of neighbors around each
data point. It requires two main arguments: minPts
, the minimum number of
points to identify a core, and eps
, the radius used to find neighbors.
Choosing minPts: This determines how many points are necessary to form a cluster. For example, what is the minimum number of sites expected in a bioregion? Choose a value sufficiently large for your dataset and expectations.
Choosing eps: This determines how similar sites should be to form a
cluster. If eps
is too small, most points will be considered too distinct
and marked as noise. If eps
is too large, clusters may merge. The value of
eps
depends on minPts
. It is recommended to choose eps
by identifying
a knee in the k-nearest neighbor distance plot.
By default, the function attempts to find a knee in this curve
automatically, but the result is uncertain. Users should inspect the graph
and modify eps
accordingly. To explore eps
values, run the function
initially without defining eps
, review the recommendations, and adjust
as needed based on clustering results.
Value
A list
of class bioregion.clusters
with five components:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, the algorithm
slot includes the output of
dbscan::dbscan.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Hahsler M, Piekenbrock M & Doran D (2019) Dbscan: Fast density-based clustering with R. Journal of Statistical Software, 91(1), 1–30.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara nhclu_clarans nhclu_kmeans nhclu_pam nhclu_affprop
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
clust1 <- nhclu_dbscan(dissim, index = "Simpson")
clust2 <- nhclu_dbscan(dissim, index = "Simpson", eps = 0.2)
clust3 <- nhclu_dbscan(dissim, index = "Simpson", minPts = c(5, 10, 15, 20),
eps = c(.1, .15, .2, .25, .3))
Non-hierarchical clustering: K-means analysis
Description
This function performs non-hierarchical clustering based on dissimilarity using a k-means analysis.
Usage
nhclu_kmeans(
dissimilarity,
index = names(dissimilarity)[3],
seed = NULL,
n_clust = c(1, 2, 3),
iter_max = 10,
nstart = 10,
algorithm = "Hartigan-Wong",
algorithm_in_output = TRUE
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
seed |
A value for the random number generator ( |
n_clust |
An |
iter_max |
An |
nstart |
An |
algorithm |
A |
algorithm_in_output |
A |
Details
This method partitions data into k groups such that the sum of squares of Euclidean distances from points to the assigned cluster centers is minimized. K-means cannot be applied directly to dissimilarity or beta-diversity metrics because these distances are not Euclidean. Therefore, it first requires transforming the dissimilarity matrix using Principal Coordinate Analysis (PCoA) with pcoa, and then applying k-means to the coordinates of points in the PCoA.
Because this additional transformation alters the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) is preferred.
Value
A list
of class bioregion.clusters
with five components:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, the algorithm
slot includes the output of
kmeans.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara nhclu_clarans nhclu_dbscan nhclu_pam nhclu_affprop
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
clust <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")
Non-hierarchical clustering: Partitioning Around Medoids
Description
This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids (PAM).
Usage
nhclu_pam(
dissimilarity,
index = names(dissimilarity)[3],
seed = NULL,
n_clust = c(1, 2, 3),
variant = "faster",
nstart = 1,
cluster_only = FALSE,
algorithm_in_output = TRUE,
...
)
Arguments
dissimilarity |
The output object from |
index |
The name or number of the dissimilarity column to use. By
default, the third column name of |
seed |
A value for the random number generator ( |
n_clust |
An |
variant |
A |
nstart |
An |
cluster_only |
A |
algorithm_in_output |
A |
... |
Additional arguments to pass to |
Details
This method partitions the data into the chosen number of clusters based on the input dissimilarity matrix. It is more robust than k-means because it minimizes the sum of dissimilarities between cluster centers (medoids) and points assigned to the cluster. In contrast, k-means minimizes the sum of squared Euclidean distances, which makes it unsuitable for dissimilarity matrices that are not based on Euclidean distances.
Value
A list
of class bioregion.clusters
with five components:
name: A
character
string containing the name of the algorithm.args: A
list
of input arguments as provided by the user.inputs: A
list
of characteristics of the clustering process.algorithm: A
list
of all objects associated with the clustering procedure, such as original cluster objects (only ifalgorithm_in_output = TRUE
).clusters: A
data.frame
containing the clustering results.
If algorithm_in_output = TRUE
, the algorithm
slot includes the output of
pam.
Author(s)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. In & Sons. JW (ed.), Finding groups in data: An introduction to cluster analysis.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara nhclu_clarans nhclu_dbscan nhclu_kmeans nhclu_affprop
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
clust <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")
Compute similarity metrics between sites based on species composition
Description
This function generates a data.frame
where each row provides one or
several similarity metrics between pairs of sites, based on a co-occurrence
matrix
with sites as rows and species as columns.
Usage
similarity(comat, metric = "Simpson", formula = NULL, method = "prodmat")
Arguments
comat |
A co-occurrence |
metric |
A |
formula |
A |
method |
A |
Details
With a
the number of species shared by a pair of sites, b
species only present in the first site and c
species only present in
the second site.
Jaccard = 1 - (b + c) / (a + b + c)
Jaccardturn = 1 - 2min(b, c) / (a + 2min(b, c)) (Baselga, 2012)
Sorensen = 1 - (b + c) / (2a + b + c)
Simpson = 1 - min(b, c) / (a + min(b, c))
If abundances data are available, Bray-Curtis and its turnover component can also be computed with the following equation:
Bray = 1 - (B + C) / (2A + B + C)
Brayturn = 1 - min(B, C) / (A + min(B, C)) (Baselga, 2013)
with A
the sum of the lesser values for common species shared by a pair of
sites. B
and C
are the total number of specimens counted at both sites
minus A
.
formula
can be used to compute customized metrics with the terms
a
, b
, c
, A
, B
, and C
. For example
formula = c("1 - pmin(b,c) / (a + pmin(b,c))", "1 - (B + C) / (2*A + B + C)")
will compute the Simpson and Bray-Curtis similarity metrics, respectively.
Note that pmin
is used in the Simpson formula because a
, b
, c
, A
,
B
and C
are numeric
vectors.
Euclidean computes the Euclidean similarity between each pair of site following this equation:
Euclidean = 1 / (1 + d_ij)
Where d_ij is the Euclidean distance between site i and site j in terms of species composition.
Value
A data.frame
with the additional class
bioregion.pairwise.metric
, containing one or several similarity
metrics between pairs of sites. The first two columns represent the pairs of
sites. There is one column per similarity metric provided in metric
and
formula
, except for the abc
and ABC
metrics, which are stored in three
separate columns (one for each letter).
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
References
Baselga A (2012) The Relationship between Species Replacement, Dissimilarity Derived from Nestedness, and Nestedness. Global Ecology and Biogeography 21, 1223–1232.
Baselga A (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution 4, 552–557.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html.
Associated functions: dissimilarity similarity_to_dissimilarity
Examples
comat <- matrix(sample(0:1000, size = 50, replace = TRUE,
prob = 1 / 1:1001), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
sim <- similarity(comat, metric = c("abc", "ABC", "Simpson", "Brayturn"))
sim <- similarity(comat, metric = "all",
formula = "1 - (b + c) / (a + b + c)")
Convert similarity metrics to dissimilarity metrics
Description
This function converts a data.frame
of similarity metrics between sites
into dissimilarity metrics (beta diversity).
Usage
similarity_to_dissimilarity(similarity, include_formula = TRUE)
Arguments
similarity |
The output object from |
include_formula |
A |
Value
A data.frame
with additional class
bioregion.pairwise.metric
, providing dissimilarity
metric(s) between each pair of sites based on a similarity object.
Note
The behavior of this function changes depending on column names. Columns
Site1
and Site2
are copied identically. If there are columns called
a
, b
, c
, A
, B
, C
they will also be copied identically. If there
are columns based on your own formula (argument formula
in similarity()
)
or not in the original list of similarity metrics (argument metrics
in
similarity()
) and if the argument include_formula
is set to FALSE
,
they will also be copied identically. Otherwise there are going to be
converted like they other columns (default behavior).
If a column is called Euclidean
, its distance will be calculated based
on the following formula:
Euclidean distance = (1 - Euclidean similarity) / Euclidean similarity
Otherwise, all other columns will be transformed into dissimilarity with the following formula:
dissimilarity = 1 - similarity
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a3_pairwise_metrics.html.
Associated functions: dissimilarity similarity_to_dissimilarity
Examples
comat <- matrix(sample(0:1000, size = 50, replace = TRUE,
prob = 1 / 1:1001), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
simil <- similarity(comat, metric = "all")
simil
dissimilarity <- similarity_to_dissimilarity(simil)
dissimilarity
Calculate contribution metrics of sites and species
Description
This function calculates metrics to assess the contribution of a given species or site to its bioregion.
Usage
site_species_metrics(
bioregionalization,
comat,
indices = c("rho"),
net = NULL,
site_col = 1,
species_col = 2
)
Arguments
bioregionalization |
A |
comat |
A co-occurrence |
indices |
A |
net |
|
site_col |
A number indicating the position of the column containing
the sites in |
species_col |
A number indicating the position of the column
containing the species in |
Details
The \rho
metric is derived from Lenormand et al. (2019) with the
following formula:
\rho_{ij} = \frac{n_{ij} - \frac{n_i n_j}{n}}{\sqrt{\left(\frac{n - n_j}{
n-1}\right) \left(1-\frac{n_j}{n}\right) \frac{n_i n_j}{n}}}
where n
is the number of sites, n_i
is the number of sites in
which species i
is present, n_j
is the number of sites in
bioregion j
, and n_{ij}
is the number of occurrences of species
i
in sites of bioregion j
.
Affinity A
, fidelity F
, and individual contributions
IndVal
describe how species are linked to their bioregions. These
metrics are described in Bernardo-Madrid et al. (2019):
Affinity of species to their region:
A_i = \frac{R_i}{Z}
, whereR_i
is the occurrence/range size of speciesi
in its associated bioregion, andZ
is the total size (number of sites) of the bioregion. High affinity indicates that the species occupies most sites in its bioregion.Fidelity of species to their region:
F_i = \frac{R_i}{D_i}
, whereR_i
is the occurrence/range size of speciesi
in its bioregion, andD_i
is its total range size. High fidelity indicates that the species is not present in other regions.Indicator Value of species:
IndVal = F_i \cdot A_i
.
Cz
metrics are derived from Guimerà & Amaral (2005):
Participation coefficient:
C_i = 1 - \sum_{s=1}^{N_M}{\left(\frac{k_{is}}{k_i}\right)^2}
, wherek_{is}
is the number of links of nodei
to nodes in bioregions
, andk_i
is the total degree of nodei
. A high value means links are uniformly distributed; a low value means links are within the node's bioregion.Within-bioregion degree z-score:
z_i = \frac{k_i - \overline{k_{si}}}{\sigma_{k_{si}}}
, wherek_i
is the number of links of nodei
to nodes in its bioregions_i
,\overline{k_{si}}
is the average degree of nodes ins_i
, and\sigma_{k_{si}}
is the standard deviation of degrees ins_i
.
Value
A data.frame
with columns Bioregion
, Species
, and the desired summary
statistics, or a list of data.frame
s if Cz
and other indices are
selected.
Author(s)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Bernardo-Madrid R, Calatayud J, González‐Suárez M, Rosvall M, Lucas P, Antonelli A & Revilla E (2019) Human activity is altering the world’s zoogeographical regions. Ecology Letters 22, 1297–1305.
Guimerà R & Amaral LAN (2005) Functional cartography of complex metabolic networks. Nature 433, 895–900.
Lenormand M, Papuga G, Argagnon O, Soubeyrand M, Alleaume S & Luque S (2019) Biogeographical network analysis of plant species distribution in the Mediterranean region. Ecology and Evolution 9, 237–250.
See Also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a5_3_summary_metrics.html.
Associated functions: bioregion_metrics bioregionalization_metrics
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "Simpson")
clust1 <- nhclu_kmeans(dissim, n_clust = 3, index = "Simpson")
net <- similarity(comat, metric = "Simpson")
com <- netclu_greedy(net)
site_species_metrics(bioregionalization = clust1, comat = comat,
indices = "rho")
# Contribution metrics
site_species_metrics(bioregionalization = com, comat = comat,
indices = c("rho", "affinity", "fidelity", "indicator_value"))
# Cz indices
net_bip <- mat_to_net(comat, weight = TRUE)
clust_bip <- netclu_greedy(net_bip, bipartite = TRUE)
site_species_metrics(bioregionalization = clust_bip, comat = comat,
net = net_bip, indices = "Cz")
Extract a subset of sites or species from a bioregion.clusters
object
Description
This function extracts a subset of nodes based on their type ("site"
or
"species"
) from a bioregion.clusters
object, which contains both types of
nodes (sites and species).
Usage
site_species_subset(clusters, node_type = "site")
Arguments
clusters |
An object of class |
node_type |
A |
Value
An object of class bioregion.clusters
containing only the specified
node type (sites or species).
Note
Network clustering functions (prefixed with netclu_
) may return both types
of nodes (sites and species) when applied to bipartite networks (using the
bipartite
argument). In such cases, the type of nodes included in the
output can be specified with the return_node_type
argument. This function
allows you to extract a particular type of nodes (sites or species) from the
output and adjust the return_node_type
attribute accordingly.
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Examples
net <- data.frame(
Site = c(rep("A", 2), rep("B", 3), rep("C", 2)),
Species = c("a", "b", "a", "c", "d", "b", "d"),
Weight = c(10, 100, 1, 20, 50, 10, 20)
)
clusters <- netclu_louvain(net, lang = "igraph", bipartite = TRUE)
clusters_sites <- site_species_subset(clusters, node_type = "site")
Spatial distribution of Mediterranean vegetation (data.frame)
Description
A dataset containing the abundance of 3,697 species in 715 sites.
Usage
vegedf
Format
A data.frame
with 460,878 rows and 3 columns:
- Site
Unique site identifier (corresponding to the field ID of vegesp)
- Species
Unique species identifier
- Abundance
Species abundance
Source
Spatial distribution of Mediterranean vegetation (co-occurrence matrix)
Description
A dataset containing the abundance of each of the 3,697 species in each of the 715 sites.
Usage
vegemat
Format
A co-occurrence matrix
with sites as rows and species as
columns. Each element of the matrix
represents the abundance of the species in the site.
Source
Spatial distribution of Mediterranean vegetation (spatial grid)
Description
A dataset containing the geometry of the 715 sites.
Usage
vegesf
Format
A
- ID
Unique site identifier
- geometry
Geometry of the site