Title: | Generalized Association Plots |
Version: | 0.1.4 |
Description: | Provides a comprehensive framework for visualizing associations and interaction structures in matrix-formatted data using Generalized Association Plots (GAP). The package implements multiple proximity computation methods (e.g., correlation, distance metrics), ordering techniques including hierarchical clustering (HCT) and Rank-2-Ellipse (R2E) seriation, and optional flipping strategies to enhance visual symmetry. It supports a variety of covariate-based color annotations, allows flexible customization of layout and output, and is suitable for analyzing multivariate data across domains such as social sciences, genomics, and medical research. The method is based on Generalized Association Plots introduced by Chen (2002) https://www3.stat.sinica.edu.tw/statistica/J12N1/J12N11/J12N11.html and further extended by Wu, Tien, and Chen (2010) <doi:10.1016/j.csda.2008.09.029>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Imports: | Rcpp, ComplexHeatmap, RColorBrewer, gridExtra, grid, dendextend, circlize, seriation, magick, |
Suggests: | MASS |
LinkingTo: | Rcpp |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-06-06 08:52:10 UTC; R108-4 |
Author: | Shu-Yu Lin [aut, cre], Chiun-How Kao [aut, ctb], Chun-Houh Chen [aut, ctb] |
Maintainer: | Shu-Yu Lin <shuyuuu89@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-06-06 13:30:10 UTC |
Internal function to compute Anti-Robinson or GAR/ RGAR score
Description
This function is not exported. It serves as the core computational engine for evaluating Anti-Robinson (AR), Generalized Anti-Robinson (GAR), and Relative GAR (RGAR) scores.
Usage
.compute_gar_core(mat_sorted, w = NULL, normalize = FALSE)
Arguments
mat_sorted |
A numeric, symmetric sorted distance matrix. |
w |
An integer window size (if NULL, evaluates all triplets globally). |
normalize |
Logical. If TRUE, returns a proportion; otherwise returns the raw count. |
Value
A numeric value indicating the number (or proportion) of Anti-Robinson violations.
Compute the Anti-Robinson (AR) score
Description
Calculates the total number of Anti-Robinson violations over all triplets in the matrix using the specified ordering. This is equivalent to GAR with a full window.
Usage
AR(mat_sorted)
Arguments
mat_sorted |
A numeric, symmetric sorted distance matrix. |
Value
The AR score (the total number of structural violations).
Please refer to GAP
for complete usage examples.
Generalized Association Plots (GAP)
Description
Generates a generalized association plot for the given matrix or data frame, with optional proximity computation, ordering, flipping, coloring, and export options.
Usage
GAP(
data,
isProximityMatrix = FALSE,
XdNum = NULL,
XcNum = NULL,
YdNum = NULL,
YcNum = NULL,
row.name = NULL,
Xd.name = NULL,
Xc.name = NULL,
row.prox = NULL,
col.prox = NULL,
show.row.prox = TRUE,
show.col.prox = TRUE,
row.order = NULL,
col.order = NULL,
row.flip = NULL,
col.flip = NULL,
row.externalOrder = NULL,
col.externalOrder = NULL,
original.color = NULL,
row.color = NULL,
col.color = NULL,
Xd.color = NULL,
Xc.color = NULL,
Yd.color = NULL,
Yc.color = NULL,
row.label.size = NULL,
col.label.size = NULL,
Xd.label.size = NULL,
Xc.label.size = NULL,
Yd.label.size = NULL,
Yc.label.size = NULL,
colorbar.margin = 1.5,
border = FALSE,
border.width = 1,
isContainMissingValue = 0,
MissingValue.color = "gray",
exp.row_order = FALSE,
exp.column_order = FALSE,
exp.row_names = FALSE,
exp.column_names = FALSE,
exp.Xc = FALSE,
exp.Yc = FALSE,
exp.Xd = FALSE,
exp.Yd = FALSE,
exp.Xd_codebook = FALSE,
exp.Yd_codebook = FALSE,
exp.originalmatrix = FALSE,
exp.row_prox = FALSE,
exp.col_prox = FALSE,
PNGfilename = NULL,
PNGwidth = 1800,
PNGheight = 1200,
PNGres = 150,
show.plot = FALSE
)
Arguments
data |
A data frame to be visualized. |
isProximityMatrix |
Logical. Whether the input data is already a proximity matrix. |
XdNum , XcNum , YdNum , YcNum |
Integer vectors specifying discrete/continuous covariates on X and Y axes. |
row.name |
Either a character vector, or an integer vector to be used as row names. |
Xd.name , Xc.name |
Either A string, or a character vector to be used as Xc.name/Xd.name. |
row.prox , col.prox |
A string indicating the method used to compute row/column proximity. |
show.row.prox , show.col.prox |
Logical. Whether to show row/column proximity matrices. |
row.order , col.order |
A string specifying the method used to order rows/columns. |
row.flip , col.flip |
A string specifying the row/column flipping method. |
row.externalOrder , col.externalOrder |
Integer vectors used as external references for flipping. |
original.color |
Color palette for the original data matrix. |
row.color , col.color |
Color palettes for the row/column proximity matrices. |
Xd.color , Xc.color , Yd.color , Yc.color |
Color palettes for covariate matrices. |
row.label.size , col.label.size |
Numeric values controlling the font size of row and column labels. |
Xd.label.size , Xc.label.size , Yd.label.size , Yc.label.size |
Numeric values controlling the font size of covariate labels for X and Y axes. |
colorbar.margin |
Numeric. The margin space between the colorbar and the main plot area. |
border |
Logical. Whether to draw borders around each matrix. |
border.width |
Numeric value specifying border width. |
isContainMissingValue |
Integer. Set to |
MissingValue.color |
Color to represent missing values in the matrix. Default is |
exp.row_order , exp.column_order |
Logical. Whether to export row/column order. |
exp.row_names , exp.column_names |
Logical. Whether to export sorted row/column names. |
exp.Xc , exp.Yc , exp.Xd , exp.Yd |
Logical. Whether to export sorted covariate matrices. |
exp.Xd_codebook , exp.Yd_codebook |
Logical. Whether to export codebooks for discrete covariates. |
exp.originalmatrix |
Logical. Whether to export the reordered original matrix. |
exp.row_prox , exp.col_prox |
Logical. Whether to export computed proximity matrices (after ordering). |
PNGfilename |
A string specifying the output filename for the PNG image. |
PNGwidth , PNGheight |
Width/height of the PNG image in pixels. |
PNGres |
Resolution of the PNG image in DPI. |
show.plot |
Logical. Whether to display the plot in the R graphics window after generation. |
Details
isProximityMatrix
If isProximityMatrix = TRUE
, you may directly provide a proximity matrix as the input data
.
In this case, only row-based settings will be applied, such as row.order
, row.flip
, and row.externalOrder
.
Note that correlation matrices (e.g., "pearson"
) must be converted to distance matrices before being used,
and the selected color scheme must also be one of the supported diverging palettes (e.g., "GAP_Blue_White_Red"
, "BrBG"
, "PiYG"
, "PRGn"
, "PuOr"
, "RdBu"
, "RdGy"
).
XdNum, XcNum, YdNum, YcNum
These parameters are used to specify which columns in data
should be treated as covariates
on the X or Y axes. Provide the column indices (e.g., XdNum = c(3, 5)
) of discrete or continuous variables.
Xd.name, Xc.name
If not provided, the default labels will be a sequence of numbers based on the number of selected variables (e.g., "1", "2", ..., up to the length of XdNum or XcNum).
row.name
This parameter can be:
A character vector providing custom row names.
An integer (column index) indicating a column in
data
to be used as row names.If
row.name = NULL
, the row names will be automatically generated as1:nrow(data)
.
row.prox, col.prox
Available proximity methods for row.prox
and col.prox
include:
-
"euclidean"
-
"pearson"
-
"kendall"
-
"spearman"
-
"atancorr"
(adjusted tangent correlation) -
"city-block"
(Manhattan distance) -
"abs_pearson"
-
"uncenteredcorr"
-
"abs_uncenteredcorr"
-
"maximum"
-
"canberra"
For binary data, the following methods are supported:
-
"hamman"
-
"jaccard"
-
"phi"
-
"rao"
-
"rogers"
-
"simple"
-
"sneath"
-
"yule"
show.row.prox, show.col.prox
If set to TRUE
, the corresponding proximity matrix will be visualized.
If set to FALSE
, the proximity matrix will not be shown, but the associated proximity and ordering methods will still be applied.
In such cases, the dendrogram (tree structure) will appear alongside the original plot, reflecting the proximity-based ordering.
row.order, col.order
The ordering method determines how the rows or columns are reordered. Supported options include:
-
"original"
— Use the original data order. -
"random"
— Randomly permute the order. -
"reverse"
— Reverse the original order. -
"r2e"
— Rank-two ellipse ordering. -
"single"
— Single-linkage hierarchical clustering. -
"complete"
— Complete-linkage hierarchical clustering. -
"average"
— Average-linkage hierarchical clustering (UPGMA). -
any method name from the
seriation
package — such as"TSP"
,"Spectral"
,"ARSA"
, etc.
If the ordering method is "original"
, "random"
, or "reverse"
, then proximity matrices are not required,
and the parameters row.prox
or col.prox
may be left unset.
For all other ordering methods, a proximity matrix must be computed first.
Therefore, row.prox
or col.prox
must be specified accordingly.
Note: it is necessary to explicitly specify one of the valid ordering options; the function does not assume a default.
row.flip, col.flip
Supported flipping methods include:
-
"r2e"
— Flip using the rank-two ellipse (R2E) method. -
"uncle"
— Apply uncle-flipping based on tree structure. -
"grandpa"
— Apply grandpa-flipping based on tree structure.
Usage restrictions:
Flipping is only applicable when a hierarchical clustering tree is generated. Therefore, if
row.order
orcol.order
is set to"original"
,"random"
,"reverse"
,"r2e"
, or a seriation method, tree structures are not built and flipping cannot be applied.When using
"r2e"
as the ordering method, only"r2e"
flipping is allowed."uncle"
or"grandpa"
flipping will be ignored.-
Do not specify both
externalOrder
andflip
at the same time. These options are mutually exclusive. If both are provided, the function will throw an error.
row.externalOrder, col.externalOrder
External orders are used as references when flipping the hierarchical clustering tree. If a tree is available, the external order guides the flipping of the dendrogram’s leaf nodes to better match a predefined sequence.
Important:
Do not use externalOrder
together with flip
; they are mutually exclusive.
Color settings
The function supports a variety of color palette options for visualizing the original matrix, proximity matrices, and covariate matrices.
Supported built-in palettes include:
-
"GAP_Rainbow"
-
"GAP_Blue_White_Red"
-
"GAP_d"
-
"grayscale_palette"
You may also specify any palette name from the RColorBrewer
package.
However, note that some palettes—such as those under the "Qualitative" category—are not suitable for visualizing continuous data like proximity matrices.
All palette names must be passed as character strings (e.g., "GAP_Rainbow"
, "Set1"
).
original.color:
The system will automatically determine the appropriate default color palette based on data type.
If the input data is binary, the default is a grayscale palette; otherwise, it defaults to "GAP_Rainbow"
.
row.color, col.color:
The system chooses a default palette based on the proximity method used.
For distance-based methods (e.g., "euclidean"
, "city-block"
), the default is "GAP_Rainbow"
.
For correlation-based methods (e.g., "pearson"
, "spearman"
), the default is "GAP_Blue_White_Red"
.
Xd.color, Yd.color (discrete covariates):
The default color palette is "GAP_d"
, which supports up to 16 distinct categories.
If there are more than 16 unique levels, a custom palette should be provided by the user.
Label size settings
Font sizes for axis labels and covariate matrices can be customized individually. Default values are:
-
row.label.size
: 2 -
col.label.size
: 8 -
Xd.label.size, Xc.label.size, Yd.label.size, Yc.label.size
,Xc.label.size
: 8
You may increase or decrease these values to improve readability depending on figure size and resolution.
Export-related options (exp.*
)
When any of the exp.*
parameters are set to TRUE
, the corresponding data will be stored in a list and returned by the function.
This allows users to programmatically retrieve the order, reordered matrix, proximity matrices, covariate data, or codebooks after plotting.
PNG output settings
The following parameters control the export of the PNG image:
-
PNGfilename
: The name of the PNG file to be saved.The file extension
.png
must be included manually (e.g.,"myplot.png"
).If no file path is specified, the image will be saved in a system-generated temporary directory (via
tempdir()
) using the default filename"output_plot.png"
.To save the image to a specific location, provide the full path (e.g.,
"C:/.../myplot.png"
).
-
PNGwidth
: Width of the output image in pixels. Default = 1800. -
PNGheight
: Height of the output image in pixels. Default = 1200. -
PNGres
: Resolution (dots per inch, DPI). Default = 150.
Value
A composite plot (e.g., heatmap with annotations) is saved or displayed. Additional information may be exported based on the settings.
If one or more export-related options (exp.*
) are set to TRUE
,
the function returns a list containing the requested components. Each element in the list corresponds to an exportable data object,
Examples
# Example using the crabs dataset from the MASS package
if (requireNamespace("MASS", quietly = TRUE)) {
df_crabs <- MASS::crabs
CRAB_result <- GAP(
data = df_crabs,
YdNum = c(1,2), # First two columns as Y discrete covariates
YcNum = 3, # Third column as Y continuous covariate
row.name = c(1,2,3), # Use First three columns as row names
row.prox = "euclidean",
col.prox = "euclidean",
row.order = "average",
col.order = "average",
row.flip = "r2e",
col.flip = "r2e",
border = TRUE,
border.width = 1,
exp.row_order = TRUE,
exp.column_order = TRUE,
exp.row_names = TRUE,
exp.column_names = TRUE,
exp.Yd_codebook = TRUE,
exp.Yd = TRUE,
exp.Yc = TRUE,
exp.originalmatrix = TRUE,
exp.row_prox = TRUE,
exp.col_prox = TRUE,
PNGfilename = file.path(tempdir(), "output_plot.png"),
show.plot = TRUE
)
# Access exported results:
CRAB_result$row_order # Row order after ordering
CRAB_result$column_order # Column order after ordering
CRAB_result$row_names # Row names after ordering
CRAB_result$column_names # Column names after ordering
CRAB_result$Yd_codebook # Codebook for Y discrete covariates
CRAB_result$Yd # Y discrete covariates after ordering
CRAB_result$Yc # Y continuous covariates after ordering
CRAB_result$originalmatrix # Original matrix (after ordering)
CRAB_result$row_prox # Row proximity matrix (after ordering)
CRAB_result$col_prox # Column proximity matrix (after ordering)
# Evaluate row ordering quality
AR(CRAB_result$row_prox)
GAR(CRAB_result$row_prox, w = 10)
RGAR(CRAB_result$row_prox, w = 10)
}
Color palette: GAP_Blue_White_Red
Description
A diverging color palette from blue to white to red, suitable for visualizing correlations.
Usage
GAP_Blue_White_Red
Format
An object of class character
of length 20.
Color palette: GAP_Rainbow
Description
A color palette with 13 colors in rainbow used for continuous data.
Usage
GAP_Rainbow
Format
An object of class character
of length 13.
Color palette: GAP_d
Description
A 16-category discrete color palette.
Usage
GAP_d
Format
An object of class character
of length 16.
Compute the Generalized Anti-Robinson (GAR) score
Description
Calculates the number of Anti-Robinson violations within a specified window w
,
allowing evaluation of local structural consistency in the reordered matrix.
Usage
GAR(mat_sorted, w = NULL)
Arguments
mat_sorted |
A numeric, symmetric sorted distance matrix. |
w |
Window size (integer). If NULL, uses global comparisons (equivalent to AR). |
Value
The GAR score (the total number of violations).
Please refer to GAP
for complete usage examples.
Compute the Relative Generalized Anti-Robinson (RGAR) score
Description
This function returns the relative GAR score, representing the proportion of Anti-Robinson violations over the total number of evaluated triplets.
Usage
RGAR(mat_sorted, w = NULL)
Arguments
mat_sorted |
A numeric, symmetric sorted distance matrix. |
w |
Window size (integer). If NULL, uses global comparisons (equivalent to AR normalized). |
Value
The RGAR score (between 0 and 1).
Please refer to GAP
for complete usage examples.
Compute Proximity Matrix
Description
This function takes a numeric matrix and computes a square proximity matrix (similarity or distance) based on a specified method.
Usage
computeProximity(data, proxType, side, isContainMissingValue)
Arguments
data |
A numeric matrix with n rows and p columns. Each row typically represents an observation. |
proxType |
An integer specifying the type of proximity measure to use. |
side |
An integer indicating the direction for computing proximity. |
isContainMissingValue |
An integer indicating whether the input data contains missing values. |
Details
proxType
Available proxType options include:
-
0
: Euclidean -
1
: Pearson correlation -
2
: Kendall correlation -
3
: Spearman correlation -
4
: Adjusted tangent correlation (atancorr) -
5
: City-block (Manhattan) distance -
6
: Absolute Pearson correlation -
7
: Uncentered correlation -
8
: Absolute uncentered correlation -
20
: Hamman similarity (binary) -
21
: Jaccard index (binary) -
22
: Phi coefficient (binary) -
23
: Rao coefficient (binary) -
24
: Rogers-Tanimoto similarity (binary) -
25
: Simple matching coefficient (binary) -
26
: Sneath coefficient (binary) -
27
: Yule's Q (binary)
Ensure the data type matches the selected method. For example, binary methods should only be used on binary (0/1) data.
side
Use 0
for row-wise proximity and 1
for column-wise proximity.
isContainMissingValue
Set to 1
if the input data includes missing values; otherwise, use 0
.
Value
A square matrix representing the proximity between rows or columns, depending on the selected side.
Ellipse Sort
Description
This function applies an Rank-2-Ellipse seriation method to reorder a proximity matrix.
Usage
ellipse_sort(data)
Arguments
data |
A square numeric proximity matrix (either n × n or p × p), representing pairwise distances or similarities between items. |
Value
An integer vector representing the reordered indices of the matrix rows.
Color palette: grayscale_palette
Description
A simple two-color grayscale palette from white to black, often used for binary data visualization.
Usage
grayscale_palette
Format
An object of class character
of length 2.
HCTree Sort
Description
This function applies a hierarchical clustering tree (HCT) sorting algorithm to reorder rows of a proximity matrix. It supports external ordering constraints, different linkage-based order types, and optional flipping for optimal layout.
Usage
hctree_sort(distance_matrix, externalOrder = NULL, orderType, flipType)
Arguments
distance_matrix |
A square numeric proximity matrix (either n × n or p × p) representing pairwise distances between items. |
externalOrder |
An integer vector specifying an initial or external ordering of the items (can be empty or NULL if not used). |
orderType |
An integer indicating the type of hierarchical clustering order to apply. |
flipType |
An integer indicating the flipping methods. |
Details
distance_matrix
The input matrix must represent pairwise distances between items.
If you start with a similarity matrix (e.g., a correlation matrix), you must convert it to a dissimilarity matrix before use.
For example, for correlation-based similarities,
use as.matrix(as.dist(1 - cor_matrix))
or other appropriate transformations to convert it to a proper distance matrix.
The matrix should also be symmetric and non-negative.
orderType
Specifies the linkage method used for hierarchical clustering:
-
0
: Single-linkage -
1
: Complete-linkage -
2
: Average-linkage (UPGMA)
flipType
Controls how the branches of the clustering tree are flipped:
-
1
: Flip based onexternalOrder
This option should be used only whenexternalOrder
is provided. -
2
: Uncle-flipping -
3
: Grandpa-flipping
Important:
Do not specify both flipType = 1
and a NULL
or missing externalOrder
.
When using flipType = 1
, externalOrder
must be a valid integer vector.
Value
A list representing a dendrogram tree structure, containing:
left
, right
, and height
for tree construction,
and order
for the optimal leaf order.