Type: | Package |
Title: | High Dimensional Discriminant Analysis with Compositional Data |
Version: | 1.0 |
Date: | 2025-07-08 |
Maintainer: | Michail Tsagris <mtsagris@uoc.gr> |
Depends: | R (≥ 4.0) |
Imports: | Compositional, HDclassif, Rfast, stats |
Suggests: | Rfast2 |
Description: | High dimensional discriminant analysis with compositional data is performed. The compositional data are first transformed using the alpha-transformation of Tsagris M., Preston S. and Wood A.T.A. (2011) <doi:10.48550/arXiv.1106.1451>, and then the High Dimensional Discriminant Analysis (HDDA) algorithm of Bouveyron C., Girard S. and Schmid C. (2007) <doi:10.1080/03610920701271095> is applied. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
NeedsCompilation: | no |
Packaged: | 2025-07-08 20:13:08 UTC; mtsag |
Author: | Michail Tsagris [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2025-07-11 13:10:02 UTC |
High Dimensional Discriminant Analysis with Compositional Data
Description
High dimensional discriminant analysis (HDDA) for compositional data using the alpha-transformation is performed.
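For reference, the alpha-transformation of Tsagris et al. (2011) can be written as follows for a composition x with D components. The symbols u_\alpha, H and 1_D are notation introduced here for illustration and do not appear elsewhere in this manual.

\[
u_\alpha(x) = \left( \frac{x_1^\alpha}{\sum_{j=1}^{D} x_j^\alpha}, \ldots, \frac{x_D^\alpha}{\sum_{j=1}^{D} x_j^\alpha} \right),
\qquad
z_\alpha(x) = \frac{1}{\alpha} \, H \left( D \, u_\alpha(x) - 1_D \right)
\]

Here H is the (D - 1) x D Helmert sub-matrix (the Helmert matrix without its first row) and 1_D is the D-dimensional vector of ones. As \alpha tends to 0, z_\alpha(x) converges to the isometric log-ratio transformation.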
Details
Package: | CompositionalHDDA |
Type: | Package |
Version: | 1.0 |
Date: | 2025-07-08 |
License: | GPL-2 |
Maintainers
Michail Tsagris <mtsagris@uoc.gr>
Author(s)
Michail Tsagris mtsagris@uoc.gr.
References
Bouveyron C., Girard S. and Schmid C. (2007). High Dimensional Discriminant Analysis. Communications in Statistics: Theory and Methods, 36(14): 2607–2623.
Bouveyron C., Celeux G. and Girard S. (2010). Intrinsic dimension estimation by maximum likelihood in probabilistic PCA. Technical Report 440372, Universite Paris 1 Pantheon-Sorbonne.
Berge L., Bouveyron C. and Girard S. (2012). HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data. Journal of Statistical Software, 46(6).
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
HDDA for compositional data using the alpha-transformation
Description
HDDA for compositional data using the alpha-transformation.
Usage
alfa.hdda(xnew, ina, x, a = seq(-1, 1, by = 0.1), d_select = "Cattell", threshold = 0.2)
Arguments
xnew |
A matrix with the new compositional data whose class is to be predicted. |
ina |
A group indicator variable for the compositional data. |
x |
The compositional data. Zero values are allowed. |
a |
Either a single value or a vector of \alpha values. |
d_select |
Either "Cattell", "BIC" or "both". "Cattell": Cattell's scree-test is used to select the intrinsic dimension of each class. If the model has a common dimension (models 7 to 14), the scree-test is applied to the covariance matrix of the whole dataset. "BIC": The intrinsic dimensions are selected with the BIC criterion. See Bouveyron et al. (2010) for a discussion of this topic. For common dimension models, the procedure is applied to the covariance matrix of the whole dataset. |
threshold |
A number strictly between 0 and 1. It is the threshold used in Cattell's scree-test. |
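To illustrate how the threshold enters Cattell's scree-test, here is a minimal base R sketch. It assumes one common reading of the test: the absolute differences between consecutive eigenvalues are scaled by their maximum, and the dimension is the last position where the scaled difference exceeds the threshold. The function name cattell_dim and the eigenvalues are hypothetical, and the exact rule applied inside HDclassif may differ in its details.

```r
# Sketch of a scree-test dimension choice (hypothetical helper, not part of
# this package). 'ev' holds eigenvalues sorted in decreasing order.
cattell_dim <- function(ev, threshold = 0.2) {
  dev <- abs(diff(ev))          # gaps between consecutive eigenvalues
  dev <- dev / max(dev)         # scale so that the largest gap equals 1
  max(which(dev > threshold))   # last gap that is still "large"
}

ev <- c(10, 6, 3, 0.5, 0.45, 0.4, 0.38)   # made-up eigenvalues
cattell_dim(ev, threshold = 0.2)          # returns 3
cattell_dim(ev, threshold = 0.7)          # returns 2
```

A larger threshold demands a larger gap, so under this reading it can only keep the selected dimension the same or make it smaller.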
Details
The compositional data are first transformed using the \alpha-transformation and then the HDDA algorithm is called. The function then computes all the models, reports their BIC and keeps the model with the highest BIC value.
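The transformation step can be sketched in base R as follows. This is an independent illustration of the alpha-transformation of Tsagris et al. (2011), not the code used by this package, which imports the Compositional package for that purpose. The names helmert_sub and alpha_transform are hypothetical.

```r
# Helmert sub-matrix: the (D - 1) x D Helmert matrix without its first row.
helmert_sub <- function(D) {
  H <- matrix(0, D - 1, D)
  for (j in 1:(D - 1)) {
    r <- 1 / sqrt(j * (j + 1))
    H[j, 1:j] <- r          # first j entries are equal
    H[j, j + 1] <- -j * r   # entry j + 1 makes the row sum to zero
  }
  H
}

# alpha-transformation of a compositional matrix x (each row sums to 1).
alpha_transform <- function(x, a) {
  D <- ncol(x)
  H <- helmert_sub(D)
  if (abs(a) < 1e-9) {
    # Limiting case a -> 0: isometric log-ratio transformation.
    # This branch requires strictly positive data.
    lx <- log(x)
    return((lx - rowMeans(lx)) %*% t(H))
  }
  u <- x^a
  u <- u / rowSums(u)        # power-transformed composition
  (D * u - 1) %*% t(H) / a   # centre, scale and project to D - 1 dimensions
}

x <- matrix(rgamma(5 * 4, 5, 1), ncol = 4)
x <- x / rowSums(x)
z <- alpha_transform(x, a = 0.5)   # 5 x 3 matrix of transformed data
```

The transformed data have one column fewer than the compositions. A classifier such as HDDA is then fitted to the transformed data rather than to the raw compositions.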
Value
A list with sub-lists, one for each value of \alpha, where each sub-list includes:
mod |
A list containing the output as returned by the function hdda from the package HDclassif. |
class |
The predicted class of each observation. |
posterior |
The posterior probabilities of each new observation. |
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Bouveyron C., Girard S. and Schmid C. (2007). High Dimensional Discriminant Analysis. Communications in Statistics: Theory and Methods, 36(14): 2607–2623.
Bouveyron C., Celeux G. and Girard S. (2010). Intrinsic dimension estimation by maximum likelihood in probabilistic PCA. Technical Report 440372, Universite Paris 1 Pantheon-Sorbonne.
Berge L., Bouveyron C. and Girard S. (2012). HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data. Journal of Statistical Software, 46(6).
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
Examples
x <- matrix( rgamma(60 * 100, runif(100, 4, 10), 1), ncol = 100, byrow = TRUE )
x <- x / rowSums(x) ## Dirichlet simulated values
xnew <- matrix( rgamma(20 * 100, runif(100, 4, 10), 1), ncol = 100, byrow = TRUE )
xnew <- xnew / rowSums(xnew) ## Dirichlet simulated values
ina <- rbinom(60, 1, 0.5)
alfa.hdda(xnew, ina, x, a = 0.5)
Cross-Validation of the HDDA for compositional data using the alpha-transformation
Description
Cross-Validation of the HDDA for compositional data using the alpha-transformation.
Usage
cv.alfahdda(ina, x, a = seq(-1, 1, by = 0.1), d_select = "both",
threshold = c(0.001, 0.005, 0.05, 1:9 * 0.1), folds = NULL, stratified = TRUE,
nfolds = 10, seed = NULL)
Arguments
ina |
A group indicator variable for the compositional data. |
x |
The compositional data. Zero values are allowed. |
a |
A vector of \alpha values. |
d_select |
Either "Cattell", "BIC" or "both". |
threshold |
A vector of numbers strictly between 0 and 1. Each value corresponds to a threshold used in Cattell's scree-test. |
folds |
A list with the folds, if you already have one. If left NULL, the folds are created internally. |
stratified |
Should the folds be created in a stratified way? The default value is TRUE. |
nfolds |
The number of folds in the cross validation. |
seed |
You can specify your own seed number here or leave it NULL. |
Details
K-fold cross-validation for the high dimensional discriminant analysis with compositional data using the \alpha-transformation is performed.
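As an illustration of what stratified folds look like, here is a minimal base R sketch. It is not the package's own fold construction, which this manual does not show. The function name make_stratified_folds is hypothetical, and the sketch assumes that stratification means spreading the observations of each class as evenly as possible across the folds.

```r
# Sketch: build stratified folds for K-fold cross-validation
# (hypothetical helper, not part of this package).
make_stratified_folds <- function(ina, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ina <- as.factor(ina)
  fold_id <- integer(length(ina))
  for (g in levels(ina)) {
    idx <- which(ina == g)
    idx <- idx[sample.int(length(idx))]              # shuffle within the class
    fold_id[idx] <- rep_len(1:nfolds, length(idx))   # deal out round-robin
  }
  # One vector of observation indices per fold.
  split(seq_along(ina), factor(fold_id, levels = 1:nfolds))
}

ina <- rbinom(100, 1, 0.5)
folds <- make_stratified_folds(ina, nfolds = 5, seed = 1)
table(ina[folds[[1]]])   # class counts in the first fold
```

A list built this way is the kind of object the folds argument is described as accepting, although the exact structure that cv.alfahdda expects is an assumption here.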
Value
A list including:
kl |
A matrix with the configurations of hyper-parameters tested and the estimated Kullback-Leibler divergence, for each configuration. |
js |
A matrix with the configurations of hyper-parameters tested and the estimated Jensen-Shannon divergence, for each configuration. |
Author(s)
Michail Tsagris.
R implementation and documentation: Michail Tsagris mtsagris@uoc.gr.
References
Bouveyron C., Girard S. and Schmid C. (2007). High Dimensional Discriminant Analysis. Communications in Statistics: Theory and Methods, 36(14): 2607–2623.
Bouveyron C., Celeux G. and Girard S. (2010). Intrinsic dimension estimation by maximum likelihood in probabilistic PCA. Technical Report 440372, Universite Paris 1 Pantheon-Sorbonne.
Berge L., Bouveyron C. and Girard S. (2012). HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data. Journal of Statistical Software, 46(6).
Tsagris M.T., Preston S. and Wood A.T.A. (2011). A data-based power transformation for compositional data. In Proceedings of the 4th Compositional Data Analysis Workshop, Girona, Spain. https://arxiv.org/pdf/1106.1451.pdf
Examples
x <- matrix( rgamma(100 * 200, runif(200, 4, 10), 1), ncol = 200, byrow = TRUE )
x <- x / rowSums(x) ## Dirichlet simulated values
ina <- rbinom(100, 1, 0.5)
mod <- cv.alfahdda(ina, x, a = c(0.1, 0.5, 1), d_select = "both",
threshold = seq(0.1, 0.5, by = 0.1), nfolds = 5)