Type: | Package |
Title: | K-Medians |
Version: | 2.2.0 |
Description: | Online, semi-online, and offline K-medians algorithms are provided. The algorithms can be initialized randomly or with a robust hierarchical clustering, and the number of clusters can be selected with a penalized criterion. The package provides functions for robust clustering. The function gen_K() generates a sample from a contaminated Gaussian mixture. The functions Kmedians() and Kmeans() implement K-medians and K-means algorithms, and Kplot() produces graphs for both methods. Cardot, H., Cenac, P. and Zitt, P-A. (2013). "Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm". Bernoulli, 19, 18-43. <doi:10.3150/11-BEJ390>. Cardot, H. and Godichon-Baggioni, A. (2017). "Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis". Test, 26(3), 461-480. <doi:10.1007/s11749-016-0519-x>. Godichon-Baggioni, A. and Surendran, S. "A penalized criterion for selecting the number of clusters for K-medians". <doi:10.48550/arXiv.2209.03597>. Vardi, Y. and Zhang, C.-H. (2000). "The multivariate L1-median and associated data depth". Proc. Natl. Acad. Sci. USA, 97(4):1423-1426. <doi:10.1073/pnas.97.4.1423>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
Imports: | foreach, doParallel, parallel, genieclust, Gmedian, mvtnorm, capushe, ggplot2, reshape2 |
RoxygenNote: | 7.1.2 |
NeedsCompilation: | no |
Packaged: | 2023-12-18 13:26:49 UTC; Godichon-Baggioni |
Author: | Antoine Godichon-Baggioni [aut, cre, cph], Sobihan Surendran [aut] |
Maintainer: | Antoine Godichon-Baggioni <antoine.godichon_baggioni@upmc.fr> |
Repository: | CRAN |
Date/Publication: | 2023-12-18 13:40:05 UTC |
K-Medians
Description
The package provides functions for robust clustering. The function gen_K generates a sample from a contaminated Gaussian mixture. The functions Kmedians and Kmeans implement K-medians and K-means algorithms, and Kplot produces graphs for both methods.
Author(s)
Antoine Godichon-Baggioni [aut, cre, cph], Sobihan Surendran [aut]
Maintainer: Antoine Godichon-Baggioni <antoine.godichon_baggioni@upmc.fr>
References
Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.
Cardot, H. and Godichon-Baggioni, A. (2017). Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Test, 26(3), 461-480.
Godichon-Baggioni, A. and Surendran, S. A penalized criterion for selecting the number of clusters for K-medians. arxiv.org/abs/2209.03597
Vardi, Y. and Zhang, C.-H. (2000). The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. USA, 97(4):1423-1426.
Kmeans
Description
A K-means algorithm.
Usage
Kmeans(X,nclust=1:15,ninit=1,niter=20,par=TRUE)
Arguments
X |
A numerical matrix giving the data. |
nclust |
A vector of positive integers giving the possible numbers of clusters. Default is 1:15. |
ninit |
A non-negative integer giving the number of random initializations. Default is 1. |
niter |
A positive integer giving the number of iterations for the EM algorithms. Default is 20. |
par |
A logical argument indicating whether parallelization of the algorithm is allowed. Default is TRUE. |
Value
A list with:
bestresults |
A list giving all the results for the clustering selected by |
allresults |
A list containing all the results. |
SE |
A vector giving the Sum of Errors for each considered number of clusters. |
cap |
The results given by the function |
Ksel |
An integer giving the number of clusters selected by |
data |
A numerical matrix giving the data. |
nclust |
A vector of positive integers giving the considered numbers of clusters. |
For the lists bestresults and allresults:
cluster |
A vector of positive integers giving the clustering. |
centers |
A numerical matrix giving the centers of the clusters. |
SE |
An integer giving the Sum of Errors. |
See Also
See also Kmedians, Kplot and gen_K.
Examples
## Not run:
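n <- 500       # number of data per cluster
K <- 3         # number of clusters
pcont <- 0.2   # contamination parameter of gen_K
ech <- gen_K(n = n, K = K, pcont = pcont)
X <- ech$X
res <- Kmeans(X, par = FALSE)   # run without parallelization
Kplot(res)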
## End(Not run)
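The SE component described above holds one value per candidate number of clusters. As a rough illustration of how such a curve behaves, the following base-R sketch uses stats::kmeans rather than this package's Kmeans, and it assumes that the sum of errors is the total within-cluster sum of squared distances; the manual does not state the exact definition, so treat this as an illustration only.

```r
# Base-R sketch: sum of errors (SE) for each candidate number of clusters.
# Assumption: SE is the total within-cluster sum of squared distances.
set.seed(1)

# Toy data: three well-separated Gaussian groups in two dimensions.
centers <- rbind(c(0, 0), c(6, 0), c(0, 6))
X <- do.call(rbind, lapply(1:3, function(k) {
  # 100 standard Gaussian points shifted to the k-th center
  sweep(matrix(rnorm(200), ncol = 2), 2, centers[k, ], "+")
}))

nclust <- 1:6
SE <- sapply(nclust, function(k) {
  fit <- stats::kmeans(X, centers = k, nstart = 5, iter.max = 20)
  fit$tot.withinss
})

# One SE value per candidate number of clusters.
print(round(SE, 1))
```

A penalized criterion for choosing the number of clusters typically takes such a curve as input and trades the decrease in SE against the number of clusters.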
Kmedians
Description
K-medians algorithms.
Usage
Kmedians(X,nclust=1:15,ninit=0,niter=20,
method='Offline', init=TRUE,par=TRUE)
Arguments
X |
A numerical matrix giving the data. |
nclust |
A vector of positive integers giving the possible numbers of clusters. Default is 1:15. |
ninit |
A non-negative integer giving the number of random initializations. Default is 0. |
niter |
A positive integer giving the number of iterations for the EM algorithms. Default is 20. |
method |
The selected method for the K-medians algorithm. Can be |
init |
A logical argument telling if the function |
par |
A logical argument indicating whether parallelization of the algorithm is allowed. Default is TRUE. |
Value
A list with:
bestresults |
A list giving all the results for the clustering selected by |
allresults |
A list containing all the results. |
SE |
A vector giving the Sum of Errors for each considered number of clusters. |
cap |
The results given by the function |
Ksel |
An integer giving the number of clusters selected by |
data |
A numerical matrix giving the data. |
nclust |
A vector of positive integers giving the considered numbers of clusters. |
For the lists bestresults and allresults:
cluster |
A vector of positive integers giving the clustering. |
centers |
A numerical matrix giving the centers of the clusters. |
SE |
An integer giving the Sum of Errors. |
References
Godichon-Baggioni, A. and Surendran, S. A penalized criterion for selecting the number of clusters for K-medians. arxiv.org/abs/2209.03597
See Also
See also Kmeans, Kplot and gen_K.
Examples
## Not run:
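n <- 500       # number of data per cluster
K <- 3         # number of clusters
pcont <- 0.2   # contamination parameter of gen_K
ech <- gen_K(n = n, K = K, pcont = pcont)
X <- ech$X
res <- Kmedians(X, par = FALSE)   # run without parallelization
Kplot(res)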
## End(Not run)
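K-medians replaces each cluster mean with a geometric (L1) median, which is what makes the clustering robust to contamination. The sketch below illustrates that idea with the classical Weiszfeld iteration in base R. It is not the implementation used by Kmedians (the package imports Gmedian for median estimation), and the helper name geometric_median is hypothetical.

```r
# Weiszfeld iterations for the geometric (L1) median of the rows of X.
# Sketch only: the single safeguard is a small floor on the distances.
geometric_median <- function(X, niter = 100, tol = 1e-8) {
  m <- colMeans(X)                                 # start from the mean
  for (it in seq_len(niter)) {
    diffs <- sweep(X, 2, m, "-")
    dists <- pmax(sqrt(rowSums(diffs^2)), 1e-12)   # avoid division by zero
    w <- 1 / dists
    m_new <- colSums(X * w) / sum(w)               # distance-weighted average
    converged <- sqrt(sum((m_new - m)^2)) < tol
    m <- m_new
    if (converged) break
  }
  m
}

set.seed(1)
X <- rbind(matrix(rnorm(200), ncol = 2),           # 100 points around the origin
           matrix(1000, nrow = 5, ncol = 2))       # 5 gross outliers
med <- geometric_median(X)
avg <- colMeans(X)
print(med)   # stays close to the bulk of the data
print(avg)   # pulled towards the outliers
```

The comparison between med and avg shows why a median-based center is less sensitive to a small fraction of contaminated points than a mean-based one.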
Kplot
Description
A plot function for K-medians and K-means.
Usage
Kplot(a,propplot=0.95,graph=c('Two_Dim','Capushe','Profiles','SE','Criterion'),
bestresult=TRUE,Ksel=FALSE,bycluster=TRUE)
Arguments
a |
|
propplot |
A scalar between |
graph |
A string specifying the type of graph requested.
Default is |
bestresult |
A logical indicating whether the graphs must be drawn for the result chosen by the selected criterion. Default is TRUE. |
Ksel |
A logical or positive integer giving the chosen number of clusters for which the graphs should be drawn. |
bycluster |
A logical indicating if the data selected for |
Value
No return value.
See Also
Examples
## Not run:
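n <- 500       # number of data per cluster
K <- 3         # number of clusters
pcont <- 0.2   # contamination parameter of gen_K
ech <- gen_K(n = n, K = K, pcont = pcont)
X <- ech$X
res <- Kmedians(X, par = FALSE)   # run without parallelization
Kplot(res)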
## End(Not run)
gen_K
Description
Generate a sample from a Gaussian mixture model whose centers are generated randomly on a sphere of radius radius.
Usage
gen_K(n=500,d=5,K=3,pcont=0,df=1,
cont="Student",min=-5,max=5,radius=5)
Arguments
n |
A positive integer giving the number of data per cluster. Default is 500. |
d |
A positive integer giving the dimension. Default is 5. |
K |
A positive integer giving the number of clusters. Default is 3. |
pcont |
A scalar between |
df |
A positive integer giving the degrees of freedom of the law of the contaminated data if |
cont |
The law of the contaminated data. Can be |
min |
A scalar giving the lower bound of the uniform law if |
max |
A scalar giving the upper bound of the uniform law if |
radius |
The radius of the sphere on which the centers of the classes are generated. Default is 5. |
Value
A list with:
X |
A numerical matrix giving the generated data. |
cluster |
A character vector specifying the true classification. |
See Also
Examples
n <- 500       # number of data per cluster
K <- 3         # number of clusters
pcont <- 0.2   # contamination parameter of gen_K
ech <- gen_K(n = n, K = K, pcont = pcont)
X <- ech$X
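To make the data-generating idea concrete, here is a minimal base-R sketch of a contaminated Gaussian mixture with centers drawn on a sphere. It is not the code of gen_K. The helper name sketch_mixture is hypothetical, and the contamination mechanism (replacing the Gaussian noise of a fraction pcont of each cluster with Student noise around the same center) is an assumption, since the manual does not describe how contaminated points are drawn.

```r
# Hypothetical helper, not the package's gen_K(): contaminated Gaussian mixture.
sketch_mixture <- function(n = 500, d = 5, K = 3, pcont = 0, df = 1, radius = 5) {
  # Centers uniform on the sphere: normalize Gaussian vectors, then rescale.
  Z <- matrix(rnorm(K * d), nrow = K)
  centers <- radius * Z / sqrt(rowSums(Z^2))

  X <- matrix(0, nrow = n * K, ncol = d)
  cluster <- rep(seq_len(K), each = n)
  for (k in seq_len(K)) {
    noise <- matrix(rnorm(n * d), nrow = n)   # standard Gaussian noise
    n_cont <- floor(pcont * n)                # contaminated rows (assumed mechanism)
    if (n_cont > 0) {
      # Assumption: contaminated rows get heavy-tailed Student noise instead.
      noise[seq_len(n_cont), ] <- matrix(rt(n_cont * d, df = df), nrow = n_cont)
    }
    X[cluster == k, ] <- sweep(noise, 2, centers[k, ], "+")
  }
  list(X = X, cluster = cluster, centers = centers)
}

set.seed(1)
ech_sketch <- sketch_mixture(n = 50, d = 4, K = 3, pcont = 0.2)
dim(ech_sketch$X)
```

Unlike gen_K, this sketch returns integer cluster labels and also exposes the centers, which is convenient for checking that they lie on the sphere of the requested radius.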