Version: 2.16.0
Title: Finite Mixture Modeling, Clustering & Classification
Description: Random univariate and multivariate finite mixture model generation, estimation, clustering, latent class analysis and classification. Variables can be continuous, discrete, independent or dependent and may follow normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or circular von Mises parametric families.
Depends: R (≥ 3.1.0)
Imports: methods, stats, utils, graphics, grDevices
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Author: Marko Nagode
Maintainer: Marko Nagode <marko.nagode@fs.uni-lj.si>
NeedsCompilation: yes
Packaged: 2024-07-10 12:35:02 UTC; PCNagodeM
Repository: CRAN
Date/Publication: 2024-07-10 14:00:02 UTC
Akaike Information Criterion
Description
Returns the Akaike information criterion at pos.
Usage
## S4 method for signature 'REBMIX'
AIC(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
AIC3(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
AIC4(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
AICc(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
CAIC(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, 1974.
A. F. M. Smith and D. J. Spiegelhalter. Bayes factors and choice criteria for linear
models. Journal of the Royal Statistical Society. Series B, 42(2):213-220, 1980. https://www.jstor.org/stable/2984964.
H. Bozdogan. Model selection and Akaike's information criterion (AIC): The general theory and its
analytical extensions. Psychometrika, 52(3):345-370, 1987. doi:10.1007/BF02294361.
C. M. Hurvich and C.-L. Tsai. Regression and time series model selection in small samples. Biometrika,
76(2):297-307, 1989. https://www.jstor.org/stable/2336663.
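As a rough illustration of how these criteria differ, the sketch below computes them in base R from a maximized log-likelihood logL, a number of free parameters M and a sample size n, using the textbook forms found in the references above. The helper aic_family, its arguments and the input values are hypothetical and are not part of rebmix; the values the package returns for a fitted object at row pos may differ in detail.

```r
# Textbook forms of the AIC family (an assumption; not taken from rebmix sources).
# logL: maximized log-likelihood, M: number of free parameters, n: sample size.
aic_family <- function(logL, M, n) {
  c(AIC  = -2 * logL + 2 * M,
    AIC3 = -2 * logL + 3 * M,
    AIC4 = -2 * logL + 4 * M,
    AICc = -2 * logL + 2 * M + 2 * M * (M + 1) / (n - M - 1),
    CAIC = -2 * logL + M * (log(n) + 1))
}

# Hypothetical inputs: a two-component univariate normal mixture has M = 5
# free parameters (one weight, two means, two standard deviations).
crit <- aic_family(logL = -1130.5, M = 5, n = 272)
print(round(crit, 2))
```

All five criteria share the same goodness-of-fit term and differ only in how strongly they penalize M, so a smaller value indicates a preferred model under each of them.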
Approximate Weight of Evidence Criterion
Description
Returns the approximate weight of evidence criterion at pos.
Usage
## S4 method for signature 'REBMIX'
AWE(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
J. D. Banfield and A. E. Raftery. Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3):803-821, 1993. doi:10.2307/2532201.
Predicts Class Membership Based Upon the Best First Search Algorithm
Description
Returns as default the optimized RCLSMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RCLSMVNORM", the optimized output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
Usage
## S4 method for signature 'RCLSMIX'
BFSMIX(model = "RCLSMIX", x = list(), Dataset = data.frame(),
Zt = factor(), ...)
## ... and for other signatures
Arguments
model | see Methods section below.
x | a list of objects of class
Dataset | a data frame containing test dataset
Zt | a factor of true class membership
... | currently not used.
Value
Returns an optimized object of class RCLSMIX or RCLSMVNORM.
Methods
signature(model = "RCLSMIX")
a character giving the default class name "RCLSMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLSMVNORM")
a character giving the class name "RCLSMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Author(s)
Marko Nagode
References
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997. doi:10.1016/S0004-3702(97)00043-X.
Bayesian Information Criterion
Description
Returns the Bayesian information criterion at pos.
Usage
## S4 method for signature 'REBMIX'
BIC(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
G. Schwarz. Estimating the dimension of the model. The Annals of Statistics, 6(2):461-464, 1978.
Classification Likelihood Criterion
Description
Returns the classification likelihood criterion at pos.
Usage
## S4 method for signature 'REBMIX'
CLC(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
C. Biernacki and G. Govaert. Using the classification likelihood to choose the number of clusters. In E. J. Wegman and S. P. Azen, editors, Computing Science and Statistics, 1997.
Class "EM.Control"
Description
Object of class EM.Control.
Objects from the Class
Objects can be created by calls of the form new("EM.Control", ...). Accessor methods for the slots are a.strategy(x = NULL), a.variant(x = NULL), a.acceleration(x = NULL), a.tolerance(x = NULL), a.acceleration.multiplier(x = NULL), a.maximum.iterations(x = NULL), a.K(x = NULL) and a.eliminate.zero.components(x = NULL), where x stands for an object of class EM.Control. Setter methods a.strategy(x = NULL), a.variant(x = NULL), a.acceleration(x = NULL), a.tolerance(x = NULL), a.acceleration.multiplier(x = NULL), a.maximum.iterations(x = NULL), a.K(x = NULL) and a.eliminate.zero.components(x = NULL) are provided to write to the strategy, variant, acceleration, tolerance, acceleration.multiplier, maximum.iterations, K and eliminate.zero.components slots, respectively.
Slots
strategy:
a character containing the EM and REBMIX strategy. One of "none", "exhaustive", "best" and "single". The default value is "none".
variant:
a character containing the type of the EM algorithm to be used. One of "EM" or "ECM". The default value is "EM".
acceleration:
a character containing the type of acceleration of the EM iteration increment. One of "fixed", "line" or "golden". The default value is "fixed".
tolerance:
tolerance value for the EM convergence criteria. The default value is 1.0E-4.
acceleration.multiplier:
acceleration multiplier $a_{\mathrm{EM}}$, $1.0 \leq a_{\mathrm{EM}} \leq 2.0$, for the EM step increment. The default value is 1.0.
maximum.iterations:
a positive integer containing the maximum allowed number of iterations of the EM algorithm. The default value is 1000.
K:
an integer containing the number of bins for the histogram based EM algorithm. This option can reduce computational time drastically if the datasets contain a large number of observations $n$ and K is set to a value $\ll n$. The default value of 0 means that the EM algorithm runs over all $n$ observations.
eliminate.zero.components:
a logical indicating if the components with $w_{l} = 0$ should be eliminated from output. Only used with EMMIX-methods.
Author(s)
Branislav Panic
References
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation.
Mathematics, 8(3):373, 2020.
doi:10.3390/math8030373.
A. P. Dempster et al. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B, 39(1):1-38, 1977.
https://www.jstor.org/stable/2984875.
G. Celeux and G. Govaert. A classification EM algorithm for clustering and two stochastic versions. Computational Statistics & Data Analysis, 14(3):315-332, 1992.
doi:10.1016/0167-9473(92)90042-E.
Examples
# Inline creation by new call.
EM <- new("EM.Control", strategy = "exhaustive",
variant = "EM", acceleration = "fixed",
tolerance = 1e-4, acceleration.multiplier = 1.0,
maximum.iterations = 1000, K = 0)
EM
# Creation of EM object with setter method.
EM <- new("EM.Control")
a.strategy(EM) <- "exhaustive"
a.variant(EM) <- "EM"
a.acceleration(EM) <- "fixed"
a.tolerance(EM) <- 1e-4
a.acceleration.multiplier(EM) <- 1.0
a.maximum.iterations(EM) <- 1000
a.K(EM) <- 256
EM
EM Algorithm for Univariate or Multivariate Finite Mixture Estimation
Description
Returns as default the EM algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac or von Mises component densities. If model equals "REBMVNORM", output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
Usage
## S4 method for signature 'REBMIX'
EMMIX(model = "REBMIX", Dataset = list(),
Theta = NULL, EMcontrol = NULL, ...)
## ... and for other signatures
Arguments
model | see Methods section below.
Dataset | a list of length
Theta | an object of class
EMcontrol | an object of class
... | currently not used.
Value
Returns an object of class REBMIX or REBMVNORM.
Methods
signature(model = "REBMIX")
a character giving the default class name "REBMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac or von Mises component densities.
signature(model = "REBMVNORM")
a character giving the class name "REBMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Author(s)
Branislav Panic
References
B. Panic, J. Klemenc, M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
Examples
## Not run:
devAskNewPage(ask = TRUE)
# Load faithful dataset.
data(faithful)
# Plot faithful dataset.
plot(faithful)
# Number of dimensions.
d <- ncol(faithful)
# Obtain 2 component solution with Gaussian mixtures.
c <- 2
# Create EMMVNORM.Theta object with new call.
Theta <- new("EMMVNORM.Theta", d = d, c = c)
# Set parameters of Theta.
# Weights.
a.w(Theta) <- c(0.5, 0.5)
# Means.
a.theta1.all(Theta) <- c(2.0, 55.0, 4.5, 80.0)
# Covariances.
a.theta2.all(Theta) <- c(1, 0, 0, 1, 1, 0, 0, 1)
# Run EMMIX method.
model <- EMMIX(model = "REBMVNORM", Dataset = list(faithful), Theta = Theta)
# show.
model
# summary.
summary(model)
# plot.
plot(model, nrow = 3, ncol = 2, what = c("pdf", "marginal pdf", "marginal cdf"))
# Create EMMIX.Theta object with new call.
Theta <- new("EMMIX.Theta", c = c, pdf = c("normal", "normal"))
# Set parameters of Theta.
# Weights.
a.w(Theta) <- c(0.5, 0.5)
# Means.
a.theta1.all(Theta) <- c(2.0, 55.0, 4.5, 80.0)
# Standard deviations.
a.theta2.all(Theta) <- c(1, 1, 1, 1)
# Run EMMIX method.
model <- EMMIX(Dataset = list(faithful), Theta = Theta)
# show.
model
# summary.
summary(model)
# plot.
plot(model, nrow = 3, ncol = 2, what = c("pdf", "marginal pdf", "marginal cdf"))
## End(Not run)
Class "EMMIX.Theta"
Description
Object of class EMMIX.Theta.
Objects from the Class
Objects can be created by calls of the form new("EMMIX.Theta", ...). Accessor methods for the slots are a.c(x = NULL), a.d(x = NULL), a.pdf(x = NULL) and a.Theta(x = NULL), where x stands for an object of class EMMIX.Theta. Setter methods a.theta1(x = NULL, l = numeric()), a.theta2(x = NULL, l = numeric()), a.theta3(x = NULL, l = numeric()), a.theta1.all(x = NULL), a.theta2.all(x = NULL), a.theta3.all(x = NULL) and a.w(x = NULL) are provided to write to the Theta slot, where $l = 1, \ldots, c$.
Slots
c:
number of components $c > 0$. The default value is 1.
d:
number of dimensions.
pdf:
a character vector of length $d$ containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac" or "vonMises".
Theta:
a list containing $c$ parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac" or circular "vonMises" defined for $0 \leq y_{i} \leq 2 \pi$. Component parameters theta1.l follow the parametric family types. One of $\mu_{il}$ for normal, lognormal, Gumbel and von Mises distributions and $\theta_{il}$ for Weibull, gamma, binomial, Poisson and Dirac distributions. Component parameters theta2.l follow theta1.l. One of $\sigma_{il}$ for normal, lognormal and Gumbel distributions, $\beta_{il}$ for Weibull and gamma distributions, $p_{il}$ for binomial distribution, $\kappa_{il}$ for von Mises distribution. Component parameters theta3.l follow theta2.l. One of $\xi_{il} \in \{-1, 1\}$ for Gumbel distribution.
w:
a vector of length $c$ containing component weights $w_{l}$ summing to 1.
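To make the roles of w, theta1.l and theta2.l concrete, the following base R sketch evaluates the density of a univariate two-component normal mixture, where theta1 holds the means and theta2 the standard deviations. The helper mixture_pdf and the parameter values are hypothetical and not part of rebmix; the sketch only illustrates how the weights and component parameters combine.

```r
# Hypothetical helper: density of a univariate normal mixture,
# f(y) = sum_l w_l * dnorm(y, mu_l, sigma_l).
mixture_pdf <- function(y, w, mu, sigma) {
  stopifnot(abs(sum(w) - 1) < 1e-8, all(sigma > 0))
  # One weighted density column per component, then summed row-wise.
  dens <- mapply(function(wl, m, s) wl * dnorm(y, mean = m, sd = s),
                 w, mu, sigma)
  rowSums(matrix(dens, nrow = length(y)))
}

w <- c(0.4, 0.6)     # component weights w_l, summing to 1
mu <- c(2, 20)       # theta1 for the normal family
sigma <- c(0.5, 3)   # theta2 for the normal family

y <- seq(0, 30, by = 0.5)
f <- mixture_pdf(y, w, mu, sigma)
plot(y, f, type = "l", ylab = "mixture density")
```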
Author(s)
Branislav Panic
Examples
Theta <- new("EMMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.w(Theta) <- c(0.4, 0.6)
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(0.5, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(20, 50)
a.theta2(Theta, l = 2) <- c(3, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
Theta
Theta <- new("EMMIX.Theta", c = 2, pdf = c("normal", "Gumbel", "Poisson"))
a.w(Theta) <- c(0.4, 0.6)
a.theta1.all(Theta) <- c(2, 10, 30, 20, 50, 60)
a.theta2.all(Theta) <- c(0.5, 2.3, NA, 3, 4.2, NA)
a.theta3.all(Theta) <- c(NA, 1.0, NA, NA, -1.0, NA)
Theta
Theta <- new("EMMVNORM.Theta", c = 2, d = 3)
a.w(Theta) <- c(0.4, 0.6)
a.theta1(Theta, l = 1) <- c(2, 10, -20)
a.theta2(Theta, l = 1) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1)
a.theta1(Theta, l = 2) <- c(-2.4, -15.1, 30)
a.theta2(Theta, l = 2) <- c(4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
Theta
Theta <- new("EMMVNORM.Theta", c = 2, d = 3)
a.w(Theta) <- c(0.4, 0.6)
a.theta1.all(Theta) <- c(2, 10, -20, -2.4, -15.1, 30)
a.theta2.all(Theta) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1,
4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
Theta
Hannan-Quinn Information Criterion
Description
Returns the Hannan-Quinn information criterion at pos.
Usage
## S4 method for signature 'REBMIX'
HQC(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
E. J. Hannan and B. G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society. Series B, 41(2):190-195, 1979. https://www.jstor.org/stable/2985032.
Class "Histogram"
Description
Object of class Histogram.
Objects from the Class
Objects can be created by calls of the form new("Histogram", ...). Accessor methods for the slots are a.Y(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.y0(x = NULL), a.h(x = NULL), a.n(x = NULL) and a.ns(x = NULL).
Slots
Y:
a data frame of size $v \times (d + 1)$ containing a $d$-dimensional histogram. Each of the first $d$ columns represents one random variable and contains bin means $\bar{\bm{y}}_{1}, \ldots, \bar{\bm{y}}_{v}$. Column $d + 1$ contains frequencies $k_{1}, \ldots, k_{v}$.
K:
an integer or a vector of length $d$ containing numbers of bins $v$.
ymin:
a vector of length $d$ containing minimum observations.
ymax:
a vector of length $d$ containing maximum observations.
y0:
a vector of length $d$ containing origins.
h:
a vector of length $d$ containing bin widths.
n:
an integer containing total number $n$ of observations.
ns:
an integer containing number $n_{\mathrm{s}}$ of samples.
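The layout of the Y slot can be pictured with a small base R sketch for $d = 1$: the first column holds one value per bin and column $d + 1$ holds the frequencies. The sketch assumes equal-width bins between the minimum and maximum observation and uses bin midpoints as a stand-in for the bin means; both choices, and all variable names, are assumptions for illustration rather than a description of how rebmix builds its histograms.

```r
# Sketch of the Y layout for d = 1 (assumed equal-width bins).
set.seed(1)
y <- rnorm(1000, mean = 5, sd = 2)   # hypothetical observations
K <- 8                               # number of bins
ymin <- min(y)
ymax <- max(y)
h <- (ymax - ymin) / K               # bin width

breaks <- ymin + h * (0:K)
breaks[K + 1] <- ymax                # guard against rounding at the upper edge

# Frequencies k_1, ..., k_K, including empty bins.
k <- as.vector(table(cut(y, breaks = breaks, include.lowest = TRUE)))

# Column 1: bin midpoints; column d + 1: frequencies.
Y <- data.frame(y1 = ymin + h * (seq_len(K) - 0.5), k = k)
Y
```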
Author(s)
Marko Nagode
Examples
Y <- as.data.frame(matrix(1.0, nrow = 8, ncol = 3))
hist <- new("Histogram", Y = Y, K = c(4, 2), ymin = c(2, 1), ymax = c(10, 8))
a.Y(hist)
a.K(hist)
a.ymin(hist)
a.ymax(hist)
a.y0(hist)
a.h(hist)
a.n(hist)
a.ns(hist)
# Multiply Y[ , d + 1] by 0.1.
a.Y(hist) <- 0.1
Integrated Classification Likelihood Criterion
Description
Returns the integrated classification likelihood criterion at pos.
Usage
## S4 method for signature 'REBMIX'
ICL(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
C. Biernacki, G. Celeux and G. Govaert. Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report 3521, INRIA, Rhone-Alpes, 1998.
Approximate Integrated Classification Likelihood Criterion
Description
Returns the approximate integrated classification likelihood criterion at pos.
Usage
## S4 method for signature 'REBMIX'
ICLBIC(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
C. Biernacki, G. Celeux and G. Govaert. Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report 3521, INRIA, Rhone-Alpes, 1998.
Minimum Description Length
Description
Returns the minimum description length at pos.
Usage
## S4 method for signature 'REBMIX'
MDL2(x = NULL, pos = 1, ...)
## S4 method for signature 'REBMIX'
MDL5(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
M. H. Hansen and B. Yu. Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96(454):746-774, 2001. https://www.jstor.org/stable/2670311.
Partition Coefficient
Description
Returns the partition coefficient of Bezdek at pos.
Usage
## S4 method for signature 'REBMIX'
PC(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
Total of Positive Relative Deviations
Description
Returns the total of positive relative deviations D at pos.
Usage
## S4 method for signature 'REBMIX'
PRD(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x | see Methods section below.
pos | a desired row number in
... | currently not used.
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
Class "RCLRMIX"
Description
Object of class RCLRMIX.
Objects from the Class
Objects can be created by calls of the form new("RCLRMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL), a.pos(x = NULL), a.Zt(x = NULL), a.Zp(x = NULL, s = expression(c)), a.c(x = NULL), a.p(x = NULL, s = expression(c)), a.pi(x = NULL, s = expression(c)), a.P(x = NULL, s = expression(c)), a.tau(x = NULL, s = expression(c)), a.prob(x = NULL), a.Rule(x = NULL), a.from(x = NULL), a.to(x = NULL), a.EN(x = NULL) and a.ED(x = NULL), where x stands for an object of class RCLRMIX and s for a desired number of clusters for which the slot is calculated.
Slots
x:
an object of class REBMIX.
Dataset:
a data frame or an object of class Histogram to be clustered.
pos:
a desired row number in x@summary for which the clustering is performed. The default value is 1.
Zt:
a factor of true cluster membership.
Zp:
a factor of predictive cluster membership.
c:
number of nonempty clusters.
p:
a vector of length $c$ containing prior probabilities of cluster memberships $p_{l}$ summing to 1. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is numeric().
pi:
a list of length $d$ of matrices of size $c \times K_{i}$ containing cluster conditional probabilities $\pi_{ilk}$. Let $\pi_{ilk}$ denote the cluster conditional probability that an observation in cluster $l = 1, \ldots, c$ produces the $k$th outcome on the $i$th variable. Suppose we observe $i = 1, \ldots, d$ polytomous categorical variables (the manifest variables), each of which contains $K_{i}$ possible outcomes for observations $j = 1, \ldots, n$. A manifest variable is a variable that can be measured or observed directly. It must be coded as a whole number starting at zero for the first outcome and increasing to the possible number of outcomes minus one. It is presumed here that all variables are statistically independent within clusters and that $\bm{y}_{1}, \ldots, \bm{y}_{n}$ stands for an observed $d$-dimensional dataset of size $n$ of vector observations $\bm{y}_{j} = (y_{1j}, \ldots, y_{ij}, \ldots, y_{dj})^\top$. The value is returned only if all variables in slot x follow either binomial or Dirac parametric families. The default value is list().
P:
a data frame containing true $N_{\mathrm{t}}(\bm{y}_{\tilde{\jmath}})$ and predictive $N_{\mathrm{p}}(\bm{y}_{\tilde{\jmath}})$ frequencies calculated for unique $\bm{y}_{\tilde{\jmath}} \in \{ \bm{y}_{1}, \ldots, \bm{y}_{n} \}$, where $\tilde{\jmath} = 1, \ldots, \tilde{n}$ and $\tilde{n} \leq n$.
tau:
a matrix of size $n \times c$ containing conditional probabilities $\tau_{jl}$ that observations $\bm{y}_{1}, \ldots, \bm{y}_{n}$ arise from clusters $1, \ldots, c$.
prob:
a vector of length $c$ containing probabilities of correct clustering for $s = 1, \ldots, c$.
Rule:
a character containing the merging rule. One of "Entropy" and "Demp". The default value is "Entropy".
from:
a vector of length $c - 1$ containing clusters merged to the to clusters.
to:
a vector of length $c - 1$ containing clusters originating from the from clusters.
EN:
a vector of length $c - 1$ containing entropies for combined clusters.
ED:
a vector of length $c - 1$ containing decrease of entropies for combined clusters.
A:
an adjacency matrix of size $c_{\mathrm{max}} \times c_{\mathrm{max}}$, where $c_{\mathrm{max}} \geq c$.
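The relationship between tau, Zp, EN and ED can be sketched in base R. The example below assumes the entropy of a soft clustering in the sense of Baudry et al., $-\sum_{j} \sum_{l} \tau_{jl} \log \tau_{jl}$, and merges two clusters by adding their columns of tau. The matrix values and the helper entropy are hypothetical, and the sketch does not reproduce how RCLRMIX selects which clusters to merge.

```r
# Entropy of a soft clustering: -sum_j sum_l tau_jl * log(tau_jl),
# with 0 * log(0) taken as 0.
entropy <- function(tau) {
  -sum(ifelse(tau > 0, tau * log(tau), 0))
}

# Hypothetical posterior matrix tau for n = 4 observations and c = 3 clusters.
tau <- rbind(c(0.90, 0.05, 0.05),
             c(0.10, 0.45, 0.45),
             c(0.05, 0.50, 0.45),
             c(0.80, 0.10, 0.10))

# Hard assignment (the analogue of Zp): most probable cluster per observation.
Zp <- factor(max.col(tau, ties.method = "first"))

# Merge clusters 2 and 3 by adding their columns, then compare entropies.
merged <- cbind(tau[, 1], tau[, 2] + tau[, 3])
EN <- entropy(merged)        # entropy after merging
ED <- entropy(tau) - EN      # decrease of entropy caused by the merge
c(EN = EN, ED = ED)
```

Merging two clusters never increases this entropy, which is why a large decrease points to a pair of strongly overlapping clusters.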
Author(s)
Marko Nagode, Branislav Panic
References
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering.
Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111
S. Kyoya and K. Yamanishi. Summarizing finite mixture model with overlapping quantification. Entropy, 23(11):1503, 2021. doi:10.3390/e23111503
Examples
devAskNewPage(ask = TRUE)
# Generate normal dataset.
n <- c(500, 200, 400)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)
normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = "normal_1", n = n, Theta = a.Theta(Theta))
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(model = "REBMVNORM",
Dataset = a.Dataset(normal),
Preprocessing = "histogram",
cmax = 6,
Criterion = "BIC")
summary(normalest)
# Plot finite mixture.
plot(normalest)
# Cluster dataset.
normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest, Zt = a.Zt(normal))
# Plot clusters.
plot(normalclu)
summary(normalclu)
Predicts Cluster Membership Based Upon a Model Trained by REBMIX
Description
Returns as default the RCLRMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities, following the methodology proposed in the article cited in the references. If model equals "RCLRMVNORM", output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
Usage
## S4 method for signature 'RCLRMIX'
RCLRMIX(model = "RCLRMIX", x = NULL, Dataset = NULL,
pos = 1, Zt = factor(), Rule = character(), ...)
## ... and for other signatures
## S4 method for signature 'RCLRMIX'
summary(object, ...)
## ... and for other signatures
Arguments
model | see Methods section below.
x | an object of class
Dataset | a data frame or an object of class
pos | a desired row number in
Zt | a factor of true cluster membership. The default value is
Rule | a character containing the merging rule. One of
object | see Methods section below.
... | currently not used.
Value
Returns an object of class RCLRMIX or RCLRMVNORM.
Methods
signature(model = "RCLRMIX")
a character giving the default class name "RCLRMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLRMVNORM")
a character giving the class name "RCLRMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "RCLRMIX")
an object of class RCLRMIX.
signature(object = "RCLRMVNORM")
an object of class RCLRMVNORM.
Author(s)
Marko Nagode
References
J. P. Baudry, A. E. Raftery, G. Celeux, K. Lo and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2):332-353, 2010. doi:10.1198/jcgs.2010.08111
Examples
devAskNewPage(ask = TRUE)
# Generate Poisson dataset.
n <- c(500, 200, 400)
Theta <- new("RNGMIX.Theta", c = 3, pdf = "Poisson")
a.theta1(Theta) <- c(3, 12, 36)
poisson <- RNGMIX(Dataset.name = "Poisson_1", n = n, Theta = a.Theta(Theta))
# Estimate number of components, component weights and component parameters.
EM <- new("EM.Control", strategy = "exhaustive")
poissonest <- REBMIX(Dataset = a.Dataset(poisson),
Preprocessing = "histogram",
cmax = 6,
Criterion = "BIC",
pdf = rep("Poisson", 1),
EMcontrol = EM)
summary(poissonest)
# Plot finite mixture.
plot(poissonest)
# Cluster dataset.
poissonclu <- RCLRMIX(x = poissonest, Zt = a.Zt(poisson))
summary(poissonclu)
# Plot clusters.
plot(poissonclu)
# Create new dataset.
Dataset <- sample.int(n = 50, size = 10, replace = TRUE)
Dataset <- as.data.frame(Dataset)
# Cluster the dataset.
poissonclu <- RCLRMIX(x = poissonest, Dataset = Dataset, Rule = "Demp")
a.Dataset(poissonclu)
Class "RCLS.chunk"
Description
Object of class RCLS.chunk.
Objects from the Class
Objects can be created by calls of the form new("RCLS.chunk", ...). Accessor methods for the slots are a.s(x = NULL), a.levels(x = NULL), a.ntrain(x = NULL), a.train(x = NULL), a.Zr(x = NULL), a.ntest(x = NULL), a.test(x = NULL) and a.Zt(x = NULL), where x stands for an object of class RCLS.chunk.
Slots
s:
finite set of size $s$ of classes $\bm{\Omega} = \{\bm{\Omega}_{g}; \ g = 1, \ldots, s\}$.
levels:
a character vector of length $s$ containing class names $\bm{\Omega}_{g}$.
ntrain:
a vector of length $s$ containing numbers of observations in train datasets $Y_{\mathrm{train}g}$.
train:
a list of length $n_{\mathrm{D}}$ of data frames containing train datasets $Y_{\mathrm{train}g}$ of length $n_{\mathrm{train}g}$.
Zr:
a list of factors of true class membership $\bm{\Omega}_{g}$ for the train datasets.
ntest:
number of observations in test dataset $Y_{\mathrm{test}}$.
test:
a data frame containing test dataset $Y_{\mathrm{test}}$ of length $n_{\mathrm{test}}$.
Zt:
a factor of true class membership $\bm{\Omega}_{g}$ for the test dataset.
Author(s)
Marko Nagode
References
D. M. Dziuda. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons, New York, 2010.
Class "RCLSMIX"
Description
Object of class RCLSMIX.
Objects from the Class
Objects can be created by calls of the form new("RCLSMIX", ...). Accessor methods for the slots are a.o(x = NULL), a.Dataset(x = NULL), a.s(x = NULL), a.ntrain(x = NULL), a.P(x = NULL), a.ntest(x = NULL), a.Zt(x = NULL), a.Zp(x = NULL), a.CM(x = NULL), a.Accuracy(x = NULL), a.Error(x = NULL), a.Precision(x = NULL), a.Sensitivity(x = NULL), a.Specificity(x = NULL) and a.Chunks(x = NULL), where x stands for an object of class RCLSMIX.
Slots
x:
a list of objects of class REBMIX of length $o$ obtained by running REBMIX on $g = 1, \ldots, s$ train datasets $Y_{\mathrm{train}g}$ all of length $n_{\mathrm{train}g}$. For the train datasets the corresponding class membership $\bm{\Omega}_{g}$ is known. This yields $n_{\mathrm{train}} = \sum_{g = 1}^{s} n_{\mathrm{train}g}$, while $Y_{\mathrm{train}q} \cap Y_{\mathrm{train}g} = \emptyset$ for all $q \neq g$. Each object in the list corresponds to one chunk, e.g., $(y_{1j}, y_{3j})^{\top}$.
o:
number of chunks $o$. $Y = \{\bm{y}_{j}; \ j = 1, \ldots, n\}$ is an observed $d$-dimensional dataset of size $n$ of vector observations $\bm{y}_{j} = (y_{1j}, \ldots, y_{dj})^{\top}$ and is partitioned into train and test datasets. Vector observations $\bm{y}_{j}$ may further be split into $o$ chunks when running REBMIX, e.g., for $d = 6$ and $o = 3$ the set of chunks substituting $\bm{y}_{j}$ may be as follows: $(y_{1j}, y_{3j})^{\top}$, $(y_{2j}, y_{4j}, y_{6j})^{\top}$ and $y_{5j}$.
Dataset:
a data frame containing test dataset $Y_{\mathrm{test}}$ of length $n_{\mathrm{test}}$. For the test dataset the corresponding class membership $\bm{\Omega}_{g}$ is not known.
s:
finite set of size $s$ of classes $\bm{\Omega} = \{\bm{\Omega}_{g}; \ g = 1, \ldots, s\}$.
ntrain:
a vector of length $s$ containing numbers of observations in train datasets $Y_{\mathrm{train}g}$.
P:
a vector of length $s$ containing prior probabilities $P(\bm{\Omega}_{g}) = \frac{n_{\mathrm{train}g}}{n_{\mathrm{train}}}$.
ntest:
number of observations in test dataset $Y_{\mathrm{test}}$.
Zt:
a factor of true class membership $\bm{\Omega}_{g}$ for the test dataset.
Zp:
a factor of predictive class membership $\bm{\Omega}_{g}$ for the test dataset.
CM:
a table containing the confusion matrix for the multiclass classifier. It contains the number $x_{qg}$ of test observations with the true class $q$ that are classified into the class $g$, where $q, g = 1, \ldots, s$.
Accuracy:
proportion of all test observations that are classified correctly. $\mathrm{Accuracy} = \frac{\sum_{g = 1}^{s} x_{gg}}{n_{\mathrm{test}}}$.
Error:
proportion of all test observations that are classified wrongly. $\mathrm{Error} = 1 - \mathrm{Accuracy}$.
Precision:
a vector containing proportions of predictive observations in class $g$ that are classified correctly into class $g$. $\mathrm{Precision}(g) = \frac{x_{gg}}{\sum_{q = 1}^{s} x_{qg}}$.
Sensitivity:
a vector containing proportions of test observations in class $g$ that are classified correctly into class $g$. $\mathrm{Sensitivity}(g) = \frac{x_{gg}}{\sum_{q = 1}^{s} x_{gq}}$.
Specificity:
a vector containing proportions of test observations that are not in class $g$ and are classified into the non-$g$ class. $\mathrm{Specificity}(g) = \frac{n_{\mathrm{test}} - \sum_{q = 1}^{s} x_{qg}}{n_{\mathrm{test}} - \sum_{q = 1}^{s} x_{gq}}$.
Chunks:
a vector containing selected chunks.
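The measures defined in the slots above can be reproduced from any confusion matrix with a few lines of base R. The sketch below uses a hypothetical three-class matrix in which rows are true classes $q$ and columns are predicted classes $g$, so that CM[q, g] plays the role of $x_{qg}$; the numbers and class labels are invented for illustration and do not come from a rebmix run.

```r
# Hypothetical confusion matrix: rows are true classes q, columns are
# predicted classes g, so CM[q, g] corresponds to x_qg.
CM <- matrix(c(50,  3,  2,
                4, 40,  6,
                1,  5, 39),
             nrow = 3, byrow = TRUE,
             dimnames = list(true = c("A", "B", "C"),
                             predicted = c("A", "B", "C")))

ntest <- sum(CM)
Accuracy <- sum(diag(CM)) / ntest
Error <- 1 - Accuracy
Precision <- diag(CM) / colSums(CM)     # x_gg / sum_q x_qg
Sensitivity <- diag(CM) / rowSums(CM)   # x_gg / sum_q x_gq
# Specificity computed exactly as the formula in the slot description states.
Specificity <- (ntest - colSums(CM)) / (ntest - rowSums(CM))

round(rbind(Precision, Sensitivity, Specificity), 3)
```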
Author(s)
Marko Nagode
References
D. M. Dziuda. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data. John Wiley & Sons, New York, 2010.
Predicts Class Membership Based Upon a Model Trained by REBMIX
Description
By default, returns the RCLSMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RCLSMVNORM", output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
Usage
## S4 method for signature 'RCLSMIX'
RCLSMIX(model = "RCLSMIX", x = list(), Dataset = data.frame(),
Zt = factor(), ...)
## ... and for other signatures
## S4 method for signature 'RCLSMIX'
summary(object, ...)
## ... and for other signatures
Arguments
model |
see Methods section below. |
x |
a list of objects of class |
Dataset |
a data frame containing test dataset |
Zt |
a factor of true class membership |
object |
see Methods section below. |
... |
currently not used. |
Value
Returns an object of class RCLSMIX or RCLSMVNORM.
Methods
signature(model = "RCLSMIX")
a character giving the default class name "RCLSMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RCLSMVNORM")
a character giving the class name "RCLSMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "RCLSMIX")
an object of class RCLSMIX.
signature(object = "RCLSMVNORM")
an object of class RCLSMVNORM.
Author(s)
Marko Nagode
References
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.
Examples
## Not run:
devAskNewPage(ask = TRUE)
data(adult)
# Find complete cases.
adult <- adult[complete.cases(adult),]
# Replace levels with numbers.
adult <- as.data.frame(data.matrix(adult))
# Find numbers of levels.
cmax <- unlist(lapply(apply(adult[, c(-1, -16)], 2, unique), length))
cmax
# Split adult dataset into train and test subsets for two Incomes
# and remove Type and Income columns.
Adult <- split(p = list(type = 1, train = 2, test = 1),
Dataset = adult, class = 16)
# Estimate number of components, component weights and component parameters
# for the set of chunks 1:14.
adultest <- list()
for (i in 1:14) {
adultest[[i]] <- REBMIX(Dataset = a.train(chunk(Adult, i)),
Preprocessing = "histogram",
cmax = min(120, cmax[i]),
Criterion = "BIC",
pdf = "Dirac",
K = 1)
}
# Class membership prediction based upon the best first search algorithm.
adultcla <- BFSMIX(x = adultest,
Dataset = a.test(Adult),
Zt = a.Zt(Adult))
adultcla
summary(adultcla)
# Plot selected chunks.
plot(adultcla, nrow = 5, ncol = 2)
## End(Not run)
Class "REBMIX"
Description
Object of class REBMIX.
Objects from the Class
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL), a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.theta3(x = NULL), a.K(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.Mode(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, col.name = character(), pos = 0), a.summary.EM(x = NULL, col.name = character(), pos = 0), a.pos(x = NULL), a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.Dmin(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL), a.theta1.all(x = NULL, pos = 1), a.theta2.all(x = NULL, pos = 1) and a.theta3.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
Slots
Dataset
:-
a list of length n_{\mathrm{D}} of data frames or objects of class Histogram. Data frames should have size n \times d containing d-dimensional datasets. Each of the d columns represents one random variable. Numbers of observations n equal the number of rows in the datasets.
Preprocessing
:-
a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax
:-
maximum number of components c_{\mathrm{max}} > 0. The default value is 15.
cmin
:-
minimum number of components c_{\mathrm{min}} > 0. The default value is 1.
Criterion
:-
a character giving the information criterion type. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables
:-
a character vector of length d containing types of variables. One of "continuous" or "discrete".
pdf
:-
a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
theta1
:-
a vector of length d containing initial component parameters. One of n_{il} = \textrm{number of categories} - 1 for "binomial" distribution.
theta2
:-
a vector of length d containing initial component parameters. Currently not used.
theta3
:-
a vector of length d containing initial component parameters. One of \xi_{il} \in \{-1, \textrm{NA}, 1\} for "Gumbel" distribution.
K
:-
a character or a vector or a list of vectors containing numbers of bins v for the histogram and the kernel density estimation or numbers of nearest neighbours k for the k-nearest neighbour. There is no genuine rule to identify v or k. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule v = 1 + \mathrm{log_{2}}(n), \mathrm{Log}_{10} rule v = 10 \mathrm{log_{10}}(n) or RootN rule v = 2 \sqrt{n} can be applied to estimate the limiting numbers of bins or the rule of thumb k = \sqrt{n} to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC coincides with, e.g., 40, brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for sequence of bins or nearest neighbours generation. The default value is "auto".
ymin
:-
a vector of length d containing minimum observations. The default value is numeric().
ymax
:-
a vector of length d containing maximum observations. The default value is numeric().
ar
:-
acceleration rate 0 < a_{\mathrm{r}} \leq 1. The default value is 0.1 and in most cases does not have to be altered.
Restraints
:-
a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.
Mode
:-
a character giving the mode type. One of "all", "outliers" or default "outliersplus". The modes are determined in decreasing order of magnitude from all observations if Mode = "all". If Mode = "outliers", the modes are determined in decreasing order of magnitude from outliers only. In the meantime, some outliers are reclassified as inliers. Finally, when all observations are inliers, the procedure is completed. If Mode = "outliersplus", the modes are determined in decreasing order of magnitude from the outliers only. In the meantime, some outliers are reclassified as inliers. Finally, if all observations are inliers, they are converted to outliers and the mode determination procedure is continued.
w
:-
a list of vectors of length c containing component weights w_{l} summing to 1.
Theta
:-
a list of lists each containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for 0 \leq y_{i} \leq 2 \pi. Component parameters theta1.l follow the parametric family types. One of \mu_{il} for normal, lognormal, Gumbel and von Mises distributions, \theta_{il} for Weibull, gamma, binomial, Poisson and Dirac distributions and a for uniform distribution. Component parameters theta2.l follow theta1.l. One of \sigma_{il} for normal, lognormal and Gumbel distributions, \beta_{il} for Weibull and gamma distributions, p_{il} for binomial distribution, \kappa_{il} for von Mises distribution and b for uniform distribution. Component parameters theta3.l follow theta2.l. One of \xi_{il} for Gumbel distribution.
summary
:-
a data frame with additional information about dataset, preprocessing, c_{\mathrm{max}}, c_{\mathrm{min}}, information criterion type, a_{\mathrm{r}}, restraints type, mode type, optimal c, optimal v or k, K, y_{i0}, y_{i\mathrm{min}}, y_{i\mathrm{max}}, optimal h_{i}, information criterion \mathrm{IC}, log likelihood \mathrm{log}\, L and degrees of freedom M.
summary.EM
:-
a data frame with additional information about dataset, strategy for the EM algorithm strategy, variant of the EM algorithm variant, acceleration type acceleration, tolerance tolerance, acceleration multiplier acceleration.multiplier, maximum allowed number of iterations maximum.iterations, number of iterations used for obtaining optimal solution opt.iterations.nbr and total number of iterations of the EM algorithm total.iterations.nbr.
pos
:-
position in the summary data frame at which log likelihood \mathrm{log}\, L attains its maximum.
opt.c
:-
a list of vectors containing numbers of components for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.IC
:-
a list of vectors containing information criteria for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.logL
:-
a list of vectors containing log likelihoods for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.Dmin
:-
a list of vectors containing D_{\mathrm{min}} values for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.D
:-
a list of vectors containing totals of positive relative deviations for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
all.K
:-
a list of vectors containing all processed numbers of bins v for the histogram and the kernel density estimation or all processed numbers of nearest neighbours k for the k-nearest neighbour.
all.IC
:-
a list of vectors containing information criteria for all processed numbers of bins v for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours k for the k-nearest neighbour.
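The rules quoted for slot K are easy to evaluate directly. The sketch below, in base R only, computes the Sturges, Log10 and RootN bin counts and the rule-of-thumb number of nearest neighbours for a made-up sample size n, and then builds one possible candidate set of bin counts between the limiting values. The way the candidate set is spaced is an assumption chosen for illustration; it is not claimed to match what kseq or the "auto" setting produce.

```r
n <- 1000  # hypothetical number of observations

v_sturges <- 1 + log2(n)     # Sturges rule
v_log10 <- 10 * log10(n)     # Log10 rule
v_rootn <- 2 * sqrt(n)       # RootN rule
k_thumb <- sqrt(n)           # rule of thumb for nearest neighbours

# One possible candidate set K: five rounded values between the
# Sturges and RootN limits (illustrative spacing only).
K <- unique(round(seq(v_sturges, v_rootn, length.out = 5)))

print(round(c(Sturges = v_sturges, Log10 = v_log10, RootN = v_rootn, k = k_thumb), 2))
print(K)
```

A vector such as K could then be passed as the K argument, leaving the choice among the candidates to the information criterion.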
Author(s)
Marko Nagode
REBMIX Algorithm for Univariate or Multivariate Finite Mixture Estimation
Description
By default, returns the REBMIX algorithm output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "REBMVNORM", output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
Usage
## S4 method for signature 'REBMIX'
REBMIX(model = "REBMIX", Dataset = list(), Preprocessing = character(),
cmax = 15, cmin = 1, Criterion = "AIC", pdf = character(),
theta1 = numeric(), theta2 = numeric(), theta3 = numeric(), K = "auto",
ymin = numeric(), ymax = numeric(), ar = 0.1,
Restraints = "loose", Mode = "outliersplus", EMcontrol = NULL, ...)
## ... and for other signatures
## S4 method for signature 'REBMIX'
summary(object, ...)
## ... and for other signatures
Arguments
model |
see Methods section below. |
Dataset |
a list of length |
Preprocessing |
a character giving the preprocessing type. One of |
cmax |
maximum number of components |
cmin |
minimum number of components |
Criterion |
a character giving the information criterion type. One of default Akaike |
pdf |
a character vector of length |
theta1 |
a vector of length |
theta2 |
a vector of length |
theta3 |
a vector of length |
K |
a character or a vector or a matrix of size |
ymin |
a vector of length |
ymax |
a vector of length |
ar |
acceleration rate |
Restraints |
a character giving the restraints type. One of |
Mode |
a character giving the mode type. One of |
EMcontrol |
an object of class |
object |
see Methods section below. |
... |
currently not used. |
Value
Returns an object of class REBMIX or REBMVNORM.
Methods
signature(model = "REBMIX")
a character giving the default class name "REBMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "REBMVNORM")
a character giving the class name "REBMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "REBMIX")
an object of class REBMIX.
signature(object = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
H. A. Sturges. The choice of a class interval. Journal of American Statistical Association, 21(153):65-66, 1926. https://www.jstor.org/stable/2965501.
P. F. Velleman. Interactive computing for exploratory data analysis I: display algorithms. Proceedings of the Statistical Computing Section, American Statistical Association, 1976.
W. J. Dixon and R. A. Kronmal. The choice of origin and scale for graphs. Journal of the ACM, 12(2):259-261, 1965. doi:10.1145/321264.321277.
M. Nagode and M. Fajdiga. A general multi-modal probability density function suitable for the rainflow ranges of stationary random processes. International Journal of Fatigue, 20(3):211-223, 1998. doi:10.1016/S0142-1123(97)00106-0.
M. Nagode and M. Fajdiga. An improved algorithm for parameter estimation suitable for mixed Weibull distributions. International Journal of Fatigue, 22(1):75-80, 2000. doi:10.1016/S0142-1123(99)00112-7.
M. Nagode, J. Klemenc and M. Fajdiga. Parametric modelling and scatter prediction of rainflow matrices. International Journal of Fatigue, 23(6):525-532, 2001. doi:10.1016/S0142-1123(01)00007-X.
M. Nagode and M. Fajdiga. An alternative perspective on the mixture estimation problem. Reliability Engineering & System Safety, 91(4):388-397, 2006. doi:10.1016/j.ress.2005.02.005.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the univariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The REBMIX algorithm for the multivariate finite mixture estimation. Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
B. Panic, J. Klemenc and M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation. Mathematics, 8(3):373, 2020. doi:10.3390/math8030373.
Examples
# Generate and plot univariate normal dataset.
n <- c(998, 263, 1086, 487)
Theta <- new("RNGMIX.Theta", c = 4, pdf = "normal")
a.theta1(Theta) <- c(688, 265, 30, 934)
a.theta2(Theta) <- c(72, 54, 34, 28)
normal <- RNGMIX(Dataset.name = "complex1",
rseed = -1,
n = n,
Theta = a.Theta(Theta))
normal
a.Dataset(normal, 1)[1:20,]
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(Dataset = a.Dataset(normal),
Preprocessing = "h",
cmax = 8,
Criterion = "BIC",
pdf = "n")
normalest
BIC(normalest)
logL(normalest)
# Plot finite mixture.
plot(normalest, nrow = 2, what = c("pdf", "marginal cdf"), npts = 1000)
# EM algorithm utilization
# Load iris data.
data(iris)
Dataset <- list(data.frame(iris[, c(1:4)]))
# Create EM.Control object.
EM <- new("EM.Control",
strategy = "exhaustive",
variant = "EM",
acceleration = "fixed",
tolerance = 1e-4,
acceleration.multiplier = 1.0,
maximum.iterations = 1000)
# Mixture parameter estimation using REBMIX and EM algorithm.
irisest <- REBMIX(model = "REBMVNORM",
Dataset = Dataset,
Preprocessing = "histogram",
cmax = 10,
Criterion = "BIC",
EMcontrol = EM)
irisest
# Print total number of EM iterations used in the exhaustive strategy from summary.EM slot.
a.summary.EM(irisest, col.name = "total.iterations.nbr", pos = 1)
Class "REBMIX.boot"
Description
Object of class REBMIX.boot.
Objects from the Class
Objects can be created by calls of the form new("REBMIX.boot", ...). Accessor methods for the slots are a.rseed(x = NULL), a.pos(x = NULL), a.Bootstrap(x = NULL), a.B(x = NULL), a.n(x = NULL), a.replace(x = NULL), a.prob(x = NULL), a.c(x = NULL), a.c.se(x = NULL), a.c.cv(x = NULL), a.c.mode(x = NULL), a.c.prob(x = NULL), a.w(x = NULL), a.w.se(x = NULL), a.w.cv(x = NULL), a.Theta(x = NULL), a.Theta.se(x = NULL) and a.Theta.cv(x = NULL), where x stands for an object of class REBMIX.boot.
Slots
x
:-
an object of class REBMIX.
rseed
:-
set the random seed to any negative integer value to initialize the sequence. The first bootstrap dataset corresponds to it. For each next bootstrap dataset the random seed is decremented r_{\mathrm{seed}} = r_{\mathrm{seed}} - 1. The default value is -1.
pos
:-
a desired row number in x@summary to be bootstrapped. The default value is 1.
Bootstrap
:-
a character giving the bootstrap type. One of default "parametric" or "nonparametric".
B
:-
number of bootstrap datasets. The default value is 100.
n
:-
number of observations. The default value is numeric().
replace
:-
logical. The sampling is with replacement if TRUE, see also sample. The default value is TRUE.
prob
:-
a vector of length n containing probability weights, see also sample. The default value is numeric().
c
:-
a vector containing numbers of components for B bootstrap datasets.
c.se
:-
standard error of numbers of components c.
c.cv
:-
coefficient of variation of numbers of components c.
c.mode
:-
mode of numbers of components c.
c.prob
:-
probability of mode c.mode.
w
:-
a matrix containing component weights for \leq B bootstrap datasets.
w.se
:-
a vector containing standard errors of component weights w.
w.cv
:-
a vector containing coefficients of variation of component weights w.
Theta
:-
a list of matrices containing component parameters theta1.l, theta2.l and theta3.l for \leq B bootstrap datasets.
Theta.se
:-
a list of vectors containing standard errors of component parameters theta1.l, theta2.l and theta3.l.
Theta.cv
:-
a list of vectors containing coefficients of variation of component parameters theta1.l, theta2.l and theta3.l.
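To make the bootstrap summaries concrete, the following base R sketch computes a standard error, a coefficient of variation, a mode and the probability of the mode from simulated bootstrap replicates. It rests on assumptions that the slot descriptions do not spell out: the standard error is taken as the sample standard deviation of the replicates and the coefficient of variation as that standard error divided by the mean. The replicate vectors c_boot and w_boot are made up and do not come from a REBMIX fit.

```r
set.seed(1)
B <- 100  # number of bootstrap datasets

# Hypothetical bootstrap replicates: numbers of components and one component weight.
c_boot <- sample(2:4, B, replace = TRUE, prob = c(0.2, 0.6, 0.2))
w_boot <- rnorm(B, mean = 0.3, sd = 0.02)

# Assumed definitions: se = sd of replicates, cv = se / mean.
c_se <- sd(c_boot)
c_cv <- c_se / mean(c_boot)

# Mode of the numbers of components and its relative frequency.
tab <- table(c_boot)
c_mode <- as.integer(names(tab)[which.max(tab)])
c_prob <- max(tab) / B

w_se <- sd(w_boot)
w_cv <- w_se / mean(w_boot)

print(c(c.se = c_se, c.cv = c_cv, c.mode = c_mode, c.prob = c_prob))
print(c(w.se = w_se, w.cv = w_cv))
```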
Author(s)
Marko Nagode
Class "RNGMIX"
Description
Object of class RNGMIX.
Objects from the Class
Objects can be created by calls of the form new("RNGMIX", ...). Accessor methods for the slots are a.Dataset.name(x = NULL), a.rseed(x = NULL), a.n(x = NULL), a.Theta(x = NULL), a.Dataset(x = NULL, pos = 0), a.Zt(x = NULL), a.w(x = NULL), a.Variables(x = NULL), a.ymin(x = NULL) and a.ymax(x = NULL), where x and pos stand for an object of class RNGMIX and a desired slot item, respectively.
Slots
Dataset.name
:-
a character vector containing list names of data frames of size n \times d that d-dimensional datasets are written in.
rseed
:-
set the random seed to any negative integer value to initialize the sequence. The first file in Dataset.name corresponds to it. For each next file the random seed is decremented r_{\mathrm{seed}} = r_{\mathrm{seed}} - 1. The default value is -1.
n
:-
a vector containing numbers of observations in classes n_{l}, where number of observations n = \sum_{l = 1}^{c} n_{l}.
Theta
:-
a list containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for 0 \leq y_{i} \leq 2 \pi. Component parameters theta1.l follow the parametric family types. One of \mu_{il} for normal, lognormal, Gumbel and von Mises distributions, \theta_{il} for Weibull, gamma, binomial, Poisson and Dirac distributions and a for uniform distribution. Component parameters theta2.l follow theta1.l. One of \sigma_{il} for normal, lognormal and Gumbel distributions, \beta_{il} for Weibull and gamma distributions, p_{il} for binomial distribution, \kappa_{il} for von Mises distribution and b for uniform distribution. Component parameters theta3.l follow theta2.l. One of \xi_{il} \in \{-1, 1\} for Gumbel distribution.
Dataset
:-
a list of length n_{\mathrm{D}} of data frames of size n \times d containing d-dimensional datasets. Each of the d columns represents one random variable. Numbers of observations n equal the number of rows in the datasets.
Zt
:-
a factor of true cluster membership.
w
:-
a vector of length c containing component weights w_{l} summing to 1.
Variables
:-
a character vector containing types of variables. One of "continuous" or "discrete".
ymin
:-
a vector of length d containing minimum observations.
ymax
:-
a vector of length d containing maximum observations.
Author(s)
Marko Nagode
Random Univariate or Multivariate Finite Mixture Generation
Description
By default, returns RNGMIX univariate or multivariate random datasets for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If model equals "RNGMVNORM", multivariate random datasets for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices are returned.
Usage
## S4 method for signature 'RNGMIX'
RNGMIX(model = "RNGMIX", Dataset.name = character(),
rseed = -1, n = numeric(), Theta = list(), ...)
## ... and for other signatures
Arguments
model |
see Methods section below. |
Dataset.name |
a character vector containing list names of data frames of size |
rseed |
set the random seed to any negative integer value to initialize the sequence. The first file in |
n |
a vector containing numbers of observations in classes |
Theta |
a list containing |
... |
currently not used. |
Details
RNGMIX is based on the "Minimal" random number generator ran1 of Park and Miller with the Bays-Durham shuffle and added safeguards that returns a uniform random deviate between 0.0 and 1.0 (exclusive of the endpoint values).
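For readers unfamiliar with the generator named above, the core of the Park and Miller "minimal standard" is a multiplicative congruential recurrence with multiplier 16807 and modulus 2^31 - 1. The base R sketch below shows only that recurrence. It deliberately omits the Bays-Durham shuffle and the safeguards of ran1, uses a positive starting state rather than the negative rseed convention of RNGMIX, and the function name park_miller is hypothetical, so it will not reproduce the package's random streams.

```r
# Park-Miller minimal standard generator (sketch, no shuffle, no safeguards):
# state <- (16807 * state) mod (2^31 - 1); deviate <- state / (2^31 - 1).
park_miller <- function(seed, n) {
  a <- 16807
  m <- 2147483647  # 2^31 - 1
  stopifnot(seed >= 1, seed < m)
  state <- seed
  u <- numeric(n)
  for (i in seq_len(n)) {
    # a * state stays below 2^53, so the product is exact in double precision.
    state <- (a * state) %% m
    u[i] <- state / m
  }
  list(u = u, state = state)
}

out <- park_miller(seed = 1, n = 5)
print(out$u)
```

Because the state always lies between 1 and 2^31 - 2, every deviate is strictly inside the open interval from 0 to 1, in line with the endpoint exclusion stated in the Details.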
Value
Returns an object of class RNGMIX or RNGMVNORM.
Methods
signature(model = "RNGMIX")
a character giving the default class name "RNGMIX" for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(model = "RNGMVNORM")
a character giving the class name "RNGMVNORM" for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
Author(s)
Marko Nagode
References
W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, 1992.
Examples
devAskNewPage(ask = TRUE)
# Generate and print multivariate normal datasets with diagonal
# variance-covariance matrices.
n <- c(75, 100, 125, 150, 175)
Theta <- new("RNGMIX.Theta", c = 5, pdf = rep("normal", 4))
a.theta1(Theta, 1) <- c(10, 12, 10, 12)
a.theta1(Theta, 2) <- c(8.5, 10.5, 8.5, 10.5)
a.theta1(Theta, 3) <- c(12, 14, 12, 14)
a.theta1(Theta, 4) <- c(13, 15, 7, 9)
a.theta1(Theta, 5) <- c(7, 9, 13, 15)
a.theta2(Theta, 1) <- c(1, 1, 1, 1)
a.theta2(Theta, 2) <- c(1, 1, 1, 1)
a.theta2(Theta, 3) <- c(1, 1, 1, 1)
a.theta2(Theta, 4) <- c(2, 2, 2, 2)
a.theta2(Theta, 5) <- c(3, 3, 3, 3)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:25, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 22, nrow = 2, ncol = 3)
# Generate and print multivariate normal datasets with unrestricted
# variance-covariance matrices.
n <- c(200, 50, 50)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 3)
a.theta1(Theta, 1) <- c(0, 0, 0)
a.theta1(Theta, 2) <- c(-6, 3, 6)
a.theta1(Theta, 3) <- c(6, 6, 4)
a.theta2(Theta, 1) <- c(9, 0, 0, 0, 4, 0, 0, 0, 1)
a.theta2(Theta, 2) <- c(4, -3.2, -0.2, -3.2, 4, 0, -0.2, 0, 1)
a.theta2(Theta, 3) <- c(4, 3.2, 2.8, 3.2, 4, 2.4, 2.8, 2.4, 2)
simulated <- RNGMIX(model = "RNGMVNORM",
Dataset.name = paste("simulated_", 1:2, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 2, nrow = 3, ncol = 1)
# Generate and print multivariate mixed continuous-discrete datasets.
n <- c(400, 100, 500)
Theta <- new("RNGMIX.Theta", c = 3, pdf = c("lognormal", "Poisson", "binomial", "Weibull"))
a.theta1(Theta, 1) <- c(1, 2, 10, 2)
a.theta1(Theta, 2) <- c(3.5, 10, 10, 10)
a.theta1(Theta, 3) <- c(2.5, 15, 10, 25)
a.theta2(Theta, 1) <- c(0.3, NA, 0.9, 3)
a.theta2(Theta, 2) <- c(0.2, NA, 0.1, 7)
a.theta2(Theta, 3) <- c(0.4, NA, 0.7, 20)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:5, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 4, nrow = 2, ncol = 3)
# Generate and print univariate mixed Weibull dataset.
n <- c(75, 100, 125, 150, 175)
Theta <- new("RNGMIX.Theta", c = 5, pdf = "Weibull")
a.theta1(Theta) <- c(12, 10, 14, 15, 9)
a.theta2(Theta) <- c(2, 4.1, 3.2, 7.1, 5.3)
simulated <- RNGMIX(Dataset.name = "simulated",
rseed = -1,
n = n,
Theta = a.Theta(Theta))
simulated
plot(simulated, pos = 1)
# Generate and print multivariate normal datasets with unrestricted
# variance-covariance matrices.
# Set dimension, dataset size, number of components and seed.
d <- 2; n <- 1000; c <- 10; set.seed(123)
# Component weights are generated.
w <- runif(c, 0.1, 0.9); w <- w / sum(w)
# Set range of means and range of eigenvalues.
mu <- c(-100, 100); lambda <- c(1, 100)
# Component means and variance-covariance matrices are calculated.
Mu <- list(); Sigma <- list()
for (l in 1:c) {
Mu[[l]] <- runif(d, mu[1], mu[2])
Lambda <- diag(runif(d, lambda[1], lambda[2]), nrow = d, ncol = d)
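P <- svd(matrix(runif(d * d, -1, 1), ncol = d))$u
# Rotate the diagonal eigenvalue matrix to obtain a symmetric positive definite matrix.
Sigma[[l]] <- P %*% Lambda %*% t(P)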
}
# Numbers of observations are calculated and component means and
# variance-covariance matrices are stored.
n <- round(w * n); Theta <- list()
for (l in 1:c) {
Theta[[paste0("pdf", l)]] <- rep("normal", d)
Theta[[paste0("theta1.", l)]] <- Mu[[l]]
Theta[[paste0("theta2.", l)]] <- as.vector(Sigma[[l]])
}
# Dataset is generated.
simulated <- RNGMIX(model = "RNGMVNORM", Dataset.name = "mvnorm_1",
rseed = -1, n = n, Theta = Theta)
plot(simulated)
# Generate and print bivariate mixed uniform-Gumbel dataset.
n <- c(100, 150)
Theta <- new("RNGMIX.Theta", c = 2, pdf = c("uniform", "Gumbel"))
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(10, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(10, 50)
a.theta2(Theta, l = 2) <- c(30, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
plot(simulated)
Class "RNGMIX.Theta"
Description
Object of class RNGMIX.Theta.
Objects from the Class
Objects can be created by calls of the form new("RNGMIX.Theta", ...). Accessor methods for the slots are a.c(x = NULL), a.d(x = NULL), a.pdf(x = NULL) and a.Theta(x = NULL), where x stands for an object of class RNGMIX.Theta. Setter methods a.theta1(x = NULL, l = numeric()), a.theta2(x = NULL, l = numeric()), a.theta3(x = NULL, l = numeric()), a.theta1.all(x = NULL), a.theta2.all(x = NULL) and a.theta3.all(x = NULL) are provided to write to the Theta slot, where l = 1, \ldots, c.
Slots
c
:-
number of components c > 0. The default value is 1.
d
:-
number of dimensions.
pdf
:-
a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or "vonMises".
Theta
:-
a list containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "Gumbel", "binomial", "Poisson", "Dirac", "uniform" or circular "vonMises" defined for 0 \leq y_{i} \leq 2 \pi. Component parameters theta1.l follow the parametric family types. One of \mu_{il} for normal, lognormal, Gumbel and von Mises distributions, \theta_{il} for Weibull, gamma, binomial, Poisson and Dirac distributions and a for uniform distribution. Component parameters theta2.l follow theta1.l. One of \sigma_{il} for normal, lognormal and Gumbel distributions, \beta_{il} for Weibull and gamma distributions, p_{il} for binomial distribution, \kappa_{il} for von Mises distribution and b for uniform distribution. Component parameters theta3.l follow theta2.l. One of \xi_{il} \in \{-1, 1\} for Gumbel distribution.
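The element naming described for the Theta slot (pdfl, theta1.l, theta2.l, theta3.l for l = 1, ..., c) can be pictured with a plain R list. The sketch below uses base R only and made-up parameter values for a two-component normal-Gumbel mixture. The element order and the helper variable names (ncomp, pdf_types, theta1, theta2, theta3) are assumptions for illustration; the list that a.Theta() returns from an RNGMIX.Theta object may be organised differently.

```r
ncomp <- 2                          # number of components c (hypothetical)
pdf_types <- c("normal", "Gumbel")  # one family per dimension, d = 2

# Made-up component parameters, one vector of length d per component.
theta1 <- list(c(2, 10), c(20, 50))
theta2 <- list(c(0.5, 2.3), c(3, 4.2))
theta3 <- list(c(NA, 1), c(NA, -1))  # xi is only defined for the Gumbel variable

Theta <- list()
for (l in seq_len(ncomp)) {
  Theta[[paste0("pdf", l)]] <- pdf_types
  Theta[[paste0("theta1.", l)]] <- theta1[[l]]
  Theta[[paste0("theta2.", l)]] <- theta2[[l]]
  Theta[[paste0("theta3.", l)]] <- theta3[[l]]
}

print(names(Theta))
```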
Author(s)
Marko Nagode
Examples
Theta <- new("RNGMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.theta1(Theta, l = 1) <- c(2, 10)
a.theta2(Theta, l = 1) <- c(0.5, 2.3)
a.theta3(Theta, l = 1) <- c(NA, 1.0)
a.theta1(Theta, l = 2) <- c(20, 50)
a.theta2(Theta, l = 2) <- c(3, 4.2)
a.theta3(Theta, l = 2) <- c(NA, -1.0)
Theta
Theta <- new("RNGMIX.Theta", c = 2, pdf = c("normal", "Gumbel"))
a.theta1.all(Theta) <- c(2, 10, 20, 50)
a.theta2.all(Theta) <- c(0.5, 2.3, 3, 4.2)
a.theta3.all(Theta) <- c(NA, 1.0, NA, -1.0)
Theta
Theta <- new("RNGMVNORM.Theta", c = 2, d = 3)
a.theta1(Theta, l = 1) <- c(2, 10, -20)
a.theta1(Theta, l = 2) <- c(-2.4, -15.1, 30)
Theta
Sum of Squares Error
Description
Returns the sum of squares error at pos.
Usage
## S4 method for signature 'REBMIX'
SSE(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
Adult Dataset
Description
The adult dataset contains 48842 instances with 16 continuous, binary and discrete variables. It was extracted by Barry Becker from the 1994 census bureau database.
Usage
data(adult)
Format
adult is a data frame with 48842 cases (rows) and 16 variables (columns) named:
- Type binary train or test.
- Age continuous.
- Workclass one of the 8 discrete values private, self-emp-not-inc, self-emp-inc, federal-gov, local-gov, state-gov, without-pay or never-worked.
- Fnlwgt stands for continuous final weight.
- Education one of the 16 discrete values bachelors, some-college, 11th, hs-grad, prof-school, assoc-acdm, assoc-voc, 9th, 7th-8th, 12th, masters, 1st-4th, 10th, doctorate, 5th-6th or preschool.
- Education.Num continuous.
- Marital.Status one of the 7 discrete values married-civ-spouse, divorced, never-married, separated, widowed, married-spouse-absent or married-af-spouse.
- Occupation one of the 14 discrete values tech-support, craft-repair, other-service, sales, exec-managerial, prof-specialty, handlers-cleaners, machine-op-inspct, adm-clerical, farming-fishing, transport-moving, priv-house-serv, protective-serv or armed-forces.
- Relationship one of the 6 discrete values wife, own-child, husband, not-in-family, other-relative or unmarried.
- Race one of the 5 discrete values white, asian-pac-islander, amer-indian-eskimo, other or black.
- Sex binary female or male.
- Capital.Gain continuous.
- Capital.Loss continuous.
- Hours.Per.Week continuous.
- Native.Country one of the 41 discrete values united-states, cambodia, england, puerto-rico, canada, germany, outlying-us(guam-usvi-etc), india, japan, greece, south, china, cuba, iran, honduras, philippines, italy, poland, jamaica, vietnam, mexico, portugal, ireland, france, dominican-republic, laos, ecuador, taiwan, haiti, columbia, hungary, guatemala, nicaragua, scotland, thailand, yugoslavia, el-salvador, trinadad&tobago, peru, hong or holand-netherlands.
- Income binary <=50k or >50k.
Source
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
References
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
Examples
data(adult)
# Find complete cases.
adult <- adult[complete.cases(adult),]
# Show level attributes for binary and discrete variables.
levels(adult[["Type"]])
levels(adult[["Workclass"]])
levels(adult[["Education"]])
levels(adult[["Marital.Status"]])
levels(adult[["Occupation"]])
levels(adult[["Relationship"]])
levels(adult[["Race"]])
levels(adult[["Sex"]])
levels(adult[["Native.Country"]])
levels(adult[["Income"]])
Bearings Faults Detection Data
Description
These data are the results of the extraction process from the vibrational data of healthy and faulty bearings. Different faults are considered: faultless (1), defect on outer race (2), defect on inner race (3) and defect on ball (4). The extracted features are: root mean square (RMS), square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV), peak to peak value (PPV), crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF), kurtosis factor (KF), frequency centre (FC), root mean square frequency (RMSF) and root variance frequency (RVF).
Usage
data(bearings)
Format
bearings is a data frame with 1906 cases (rows) and 14 variables (columns) named:
- RMS: continuous.
- SRA: continuous.
- KV: continuous.
- SV: continuous.
- PPV: continuous.
- CF: continuous.
- IF: continuous.
- MF: continuous.
- SF: continuous.
- KF: continuous.
- FC: continuous.
- RMSF: continuous.
- RVF: continuous.
- Class: discrete, 1, 2, 3 or 4.
Source
Case Western Reserve University Bearing Data Center Website https://engineering.case.edu/bearingdatacenter/welcome.
References
B. Panic, J. Klemenc and M. Nagode. Gaussian mixture model based classification revisited: Application to the bearing fault classification. Journal of Mechanical Engineering, 66(4):215-226, 2020. doi:10.5545/sv-jme.2020.6563.
Examples
## Not run:
data(bearings)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(3)
Bearings <- split(p = 0.75, Dataset = bearings, class = 14)
# Estimate number of components, component weights and component
# parameters for train subsets.
bearingsest <- REBMIX(model = "REBMVNORM",
Dataset = a.train(Bearings),
Preprocessing = "histogram",
cmax = 15,
Criterion = "BIC")
# Classification.
bearingscla <- RCLSMIX(model = "RCLSMVNORM",
x = list(bearingsest),
Dataset = a.test(Bearings),
Zt = a.Zt(Bearings))
bearingscla
summary(bearingscla)
## End(Not run)
Binning of Data
Description
Returns the list of data frames containing bin means \bar{\bm{y}}_{1}, \ldots, \bar{\bm{y}}_{v} and frequencies k_{1}, \ldots, k_{v} for the histogram preprocessing.
Usage
## S4 method for signature 'list'
bins(Dataset = list(), K = matrix(),
ymin = numeric(), ymax = numeric(), ...)
## ... and for other signatures
Arguments
Dataset |
a list of length |
K |
a matrix of size |
ymin |
a vector of length |
ymax |
a vector of length |
... |
currently not used. |
Methods
signature(Dataset = "list")
a list of data frames.
Author(s)
Branislav Panic, Marko Nagode
References
M. Nagode. Finite mixture modeling via REBMIX. Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
Examples
# Generate multivariate normal datasets.
n <- c(7, 10)
Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)
a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)
sim2d <- RNGMIX(model = "RNGMVNORM",
Dataset.name = paste("sim2d_", 1:2, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
# Calculate optimal numbers of bins.
opt.k <- optbins(Dataset = sim2d@Dataset,
Rule = "Knuth equal",
kmin = 1,
kmax = 20)
opt.k
Y <- bins(Dataset = sim2d@Dataset, K = opt.k)
Y
opt.k <- optbins(Dataset = sim2d@Dataset,
Rule = "Knuth unequal",
kmin = 1,
kmax = 20)
opt.k
Y <- bins(Dataset = sim2d@Dataset, K = opt.k)
Y
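To make "bin means and frequencies" concrete, the following base R sketch bins a small two-dimensional dataset on an equal-width grid and returns one row per non-empty bin. It is only an illustration under stated assumptions: the helper bin_counts is hypothetical, bin centres stand in for the bin means, and the package's own binning may differ in detail.

```r
# Hypothetical helper (not part of the package): equal-width binning of a
# data frame on a v[1] x ... x v[d] grid between ymin and ymax.
bin_counts <- function(Dataset, v, ymin, ymax) {
  Y <- as.matrix(Dataset)
  d <- ncol(Y)
  h <- (ymax - ymin) / v  # bin width for each variable
  # Bin index (1..v[i]) of every observation in every variable.
  idx <- sapply(seq_len(d), function(i) {
    pmin(pmax(floor((Y[, i] - ymin[i]) / h[i]) + 1, 1), v[i])
  })
  idx <- matrix(idx, ncol = d)
  # Assumption: the bin centre is used as the bin mean.
  centres <- sapply(seq_len(d), function(i) ymin[i] + (idx[, i] - 0.5) * h[i])
  centres <- matrix(centres, ncol = d)
  # Count the observations falling into each non-empty bin.
  aggregate(list(k = rep(1L, nrow(Y))), by = as.data.frame(centres), FUN = sum)
}

set.seed(1)
Dataset <- data.frame(y1 = rnorm(17, 8, 2), y2 = rnorm(17, 6, 2))
Ybin <- bin_counts(Dataset, v = c(3, 3),
  ymin = sapply(Dataset, min), ymax = sapply(Dataset, max))
Ybin
```

The frequencies in column k always sum to the number of observations, which is a quick sanity check on any binning.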
Parametric or Nonparametric Bootstrap for Standard Error and Coefficient of Variation Estimation
Description
By default, returns the boot output for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities. If x is of class REBMVNORM, the boot output for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices is returned.
Usage
## S4 method for signature 'REBMIX'
boot(x = NULL, rseed = -1, pos = 1, Bootstrap = "parametric",
B = 100, n = numeric(), replace = TRUE, prob = numeric(), ...)
## ... and for other signatures
## S4 method for signature 'REBMIX.boot'
summary(object, ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
rseed |
set the random seed to any negative integer value to initialize the sequence. The first bootstrap dataset corresponds to it.
For each next bootstrap dataset the random seed is decremented |
pos |
a desired row number in |
Bootstrap |
a character giving the bootstrap type. One of default |
B |
number of bootstrap datasets. The default value is |
n |
number of observations. The default value is |
replace |
logical. The sampling is with replacement if |
prob |
a vector of length |
... |
maximum number of components |
object |
see Methods section below. |
Value
Returns an object of class REBMIX.boot or REBMVNORM.boot.
Methods
signature(x = "REBMIX")
an object of class REBMIX for mixtures of conditionally independent normal, lognormal, Weibull, gamma, Gumbel, binomial, Poisson, Dirac, uniform or von Mises component densities.
signature(x = "REBMVNORM")
an object of class REBMVNORM for mixtures of multivariate normal component densities with unrestricted variance-covariance matrices.
signature(object = "REBMIX")
an object of class REBMIX.
signature(object = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
Examples
## Not run:
data(weibull)
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "single", variant = "EM",
acceleration = "fixed", acceleration.multiplier = 1.0, tolerance = 1.0E-4,
maximum.iterations = 1000)
# Estimate number of components, component weights and component parameters.
weibullest <- REBMIX(Dataset = list(weibull),
Preprocessing = "kernel density estimation",
cmin = 2,
cmax = 4,
Criterion = "BIC",
pdf = "Weibull",
EMcontrol = EM)
# Plot finite mixture.
plot(weibullest, what = c("pdf", "marginal cdf", "IC", "logL", "D"),
nrow = 3, ncol = 2, npts = 1000)
# Bootstrap finite mixture.
weibullboot <- boot(x = weibullest, Bootstrap = "nonparametric", B = 10)
weibullboot
## End(Not run)
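The idea behind a nonparametric bootstrap estimate of the standard error and coefficient of variation can be sketched in base R without the package. The statistic (the sample mean), the sample size and all variable names below are assumptions chosen for illustration; the package bootstraps the mixture parameters instead.

```r
set.seed(1)
y <- rweibull(200, shape = 2, scale = 3)  # illustrative sample
B <- 100                                  # number of bootstrap datasets

# Resample with replacement and recompute the statistic on each replicate.
est <- replicate(B, mean(sample(y, size = length(y), replace = TRUE)))

se <- sd(est)         # bootstrap standard error of the statistic
cv <- se / mean(est)  # bootstrap coefficient of variation
c(se = se, cv = cv)
```

A parametric bootstrap follows the same pattern, except that each replicate is drawn from the fitted model rather than resampled from the data.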
Compact Histogram Calculation
Description
Returns an object of class Histogram. The method can be called repeatedly, passing the returned object back as x, so that more than one dataset can be binned into one histogram. The method is time consuming.
Usage
## S4 method for signature 'Histogram'
chistogram(x = NULL, Dataset = data.frame(),
K = numeric(), ymin = numeric(), ymax = numeric(), ...)
## ... and for other signatures
Arguments
x |
an object of class |
Dataset |
a data frame of size |
K |
an integer or a vector of length |
ymin |
a vector of length |
ymax |
a vector of length |
... |
currently not used. |
Methods
signature(x = "Histogram")
an object of class Histogram.
Author(s)
Marko Nagode
Examples
# Create three datasets.
set.seed(1)
n <- 15
Dataset1 <- as.data.frame(cbind(rnorm(n, 157, 8), rnorm(n, 71, 10)))
Dataset2 <- as.data.frame(cbind(rnorm(n, 244, 14), rnorm(n, 61, 29)))
Dataset3 <- as.data.frame(cbind(rnorm(n, 198, 8), rnorm(n, 252, 13)))
apply(Dataset1, 2, range)
apply(Dataset2, 2, range)
apply(Dataset3, 2, range)
# Bin the first dataset.
hist <- chistogram(Dataset = Dataset1, K = c(4, 5), ymin = c(100.0, 0.0), ymax = c(300.0, 300.0))
# Bin the second dataset.
hist <- chistogram(x = hist, Dataset = Dataset2)
# Bin the third dataset.
hist <- chistogram(x = hist, Dataset = Dataset3)
hist
Extracts Chunk from Train and Test Datasets
Description
Returns (invisibly) the object containing train and test observations \bm{x}_{1}, \ldots, \bm{x}_{n} as well as true class membership \bm{\Omega}_{g} for the test dataset. Vectors \bm{x} are subvectors of \bm{y} = (y_{1}, \ldots, y_{d})^{\top}.
Usage
## S4 method for signature 'RCLS.chunk'
chunk(x = NULL, variables = expression(1:d))
## ... and for other signatures
Arguments
x |
see Methods section below. |
variables |
a vector containing indices of variables in subvectors |
Value
Returns an object of class RCLS.chunk.
Methods
signature(x = "RCLS.chunk")
an object of class RCLS.chunk.
Author(s)
Marko Nagode
Examples
data(iris)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(5)
Iris <- split(p = 0.75, Dataset = iris, class = 5)
# Extract chunk from train and test datasets.
Iris14 <- chunk(x = Iris, variables = c(1,4))
Iris14
Empirical Density Calculation
Description
Returns the data frame containing observations \bm{x}_{1}, \ldots, \bm{x}_{n} and empirical densities f_{1}, \ldots, f_{n} for the kernel density estimation or k-nearest neighbour preprocessing, or bin means \bar{\bm{x}}_{1}, \ldots, \bar{\bm{x}}_{v} and empirical densities f_{1}, \ldots, f_{v} for the histogram preprocessing. Vectors \bm{x} and \bar{\bm{x}} are subvectors of \bm{y} = (y_{1}, \ldots, y_{d})^{\top} and \bar{\bm{y}} = (\bar{y}_{1}, \ldots, \bar{y}_{d})^{\top}.
Usage
## S4 method for signature 'REBMIX'
demix(x = NULL, pos = 1, variables = expression(1:d), ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
pos |
a desired row number in |
variables |
a vector containing indices of variables in subvectors |
... |
currently not used. |
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
Examples
# Generate simulated dataset.
n <- c(15, 15)
Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "best")
# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(model = "REBMVNORM",
Dataset = a.Dataset(simulated),
Preprocessing = "h",
cmax = 8,
Criterion = "BIC",
EMcontrol = EM)
# Preprocess simulated dataset.
f <- demix(simulatedest, pos = 3, variables = c(1, 3))
f
# Plot finite mixture.
opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1)
par(usr = opar[[2]]$usr, mfg = c(2, 1))
text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 1)
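For the histogram preprocessing, an empirical density per bin is commonly obtained by dividing the bin frequency by the number of observations times the bin volume. The base R sketch below shows this for one variable; the number of bins, the data and the variable names are assumptions, and the package may normalise differently.

```r
set.seed(1)
y <- c(rnorm(15, 10, 3), rnorm(15, 3, 15))  # illustrative univariate sample
v <- 5                                      # assumed number of bins

# Equal-width bins spanning the observed range.
breaks <- seq(min(y), max(y), length.out = v + 1)
h <- graphics::hist(y, breaks = breaks, plot = FALSE)

# Empirical density: frequency / (n * bin width), reported at the bin centres.
fhat <- data.frame(y = h$mids, f = h$counts / (length(y) * diff(breaks)))
fhat
```

Multiplying each density by its bin width and summing gives one, so the result is a proper density estimate.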
Predictive Marginal Density Calculation
Description
Returns the data frame containing observations \bm{x}_{1}, \ldots, \bm{x}_{n} and predictive marginal densities f(\bm{x} | c, \bm{w}, \bm{\Theta}). Vectors \bm{x} are subvectors of \bm{y} = (y_{1}, \ldots, y_{d})^{\top}. If \bm{x} = \bm{y} the method returns the data frame containing observations \bm{y}_{1}, \ldots, \bm{y}_{n} and the corresponding predictive mixture densities f(\bm{y} | c, \bm{w}, \bm{\Theta}).
Usage
## S4 method for signature 'REBMIX'
dfmix(x = NULL, Dataset = data.frame(), pos = 1, variables = expression(1:d), ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
Dataset |
a data frame containing observations |
pos |
a desired row number in |
variables |
a vector containing indices of variables in subvectors |
... |
currently not used. |
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
Examples
# Generate simulated dataset.
n <- c(15, 15)
Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
# Number of classes or nearest neighbours to be processed.
K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges rule.
as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.
# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(model = "REBMVNORM",
Dataset = a.Dataset(simulated),
Preprocessing = "h",
cmax = 4,
Criterion = "BIC")
# Preprocess simulated dataset.
Dataset <- data.frame(c(-7, 1), NA, c(3, 7))
f <- dfmix(simulatedest, Dataset = Dataset, pos = 3, variables = c(1, 3))
f
# Plot finite mixture.
opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1,
contour.drawlabels = TRUE, contour.labcex = 0.6)
par(usr = opar[[2]]$usr, mfg = c(2, 1))
points(x = f[, 1], y = f[, 2])
text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 4)
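For a mixture of conditionally independent components, the marginal density over selected variables is the weighted sum of products of the corresponding univariate densities. The base R sketch below evaluates it for normal components. It assumes that the first parameter is the mean and the second the standard deviation, and that equal weights apply; the helper dmix and all values are hypothetical and do not use the fitted object.

```r
# Hypothetical parameters: rows are components, columns are variables.
w <- c(0.5, 0.5)
mu <- rbind(c(10, 20, 30), c(3, 4, 5))
sigma <- rbind(c(3, 2, 1), c(15, 10, 5))

# Marginal mixture density over the selected variables.
dmix <- function(X, variables, w, mu, sigma) {
  X <- as.matrix(X)[, variables, drop = FALSE]
  comp <- sapply(seq_along(w), function(l) {
    dens <- rep(w[l], nrow(X))
    for (i in seq_along(variables)) {
      dens <- dens * dnorm(X[, i], mean = mu[l, variables[i]],
        sd = sigma[l, variables[i]])
    }
    dens
  })
  rowSums(matrix(comp, nrow = nrow(X)))
}

Dataset <- data.frame(c(-7, 1), NA, c(3, 7))
fx <- dmix(Dataset, variables = c(1, 3), w, mu, sigma)
fx
```

Variables that are not selected, such as the NA column here, never enter the calculation.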
Fast Histogram Calculation
Description
Returns an object of class Histogram. The method can be called repeatedly, passing the returned object back as x, so that more than one dataset can be binned into one histogram. Set shrink to TRUE only when the method is called for the last time to optimize the size of the object. The method is memory consuming.
Usage
## S4 method for signature 'Histogram'
fhistogram(x = NULL, Dataset = data.frame(),
K = numeric(), ymin = numeric(), ymax = numeric(),
shrink = FALSE, ...)
## ... and for other signatures
Arguments
x |
an object of class |
Dataset |
a data frame of size |
K |
an integer or a vector of length |
ymin |
a vector of length |
ymax |
a vector of length |
shrink |
logical. If |
... |
currently not used. |
Methods
signature(x = "Histogram")
an object of class Histogram.
Author(s)
Marko Nagode
Examples
# Create three datasets.
set.seed(1)
n <- 15
Dataset1 <- as.data.frame(cbind(rnorm(n, 157, 8), rnorm(n, 71, 10)))
Dataset2 <- as.data.frame(cbind(rnorm(n, 244, 14), rnorm(n, 61, 29)))
Dataset3 <- as.data.frame(cbind(rnorm(n, 198, 8), rnorm(n, 252, 13)))
apply(Dataset1, 2, range)
apply(Dataset2, 2, range)
apply(Dataset3, 2, range)
# Bin the first dataset.
hist <- fhistogram(Dataset = Dataset1, K = c(4, 5), ymin = c(100.0, 0.0), ymax = c(300.0, 300.0))
# Bin the second dataset.
hist <- fhistogram(x = hist, Dataset = Dataset2)
# Bin the third dataset and shrink the hist object.
hist <- fhistogram(x = hist, Dataset = Dataset3, shrink = TRUE)
hist
Galaxy Dataset
Description
The unfilled survey of the Corona Borealis region contains the velocities of 82 galaxies from 6 well separated conic sections of space.
Usage
data(galaxy)
Format
galaxy is a data frame with 82 cases (rows) and 1 continuous variable (column) called Velocity.
Source
K. Roeder. Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. Journal of the American Statistical Association, 85(411):617-624, 1990. https://www.jstor.org/stable/2289993.
References
S. Richardson and P. J. Green. On Bayesian analysis of mixtures with an unknown number
of components. Journal of the Royal Statistical Society B, 59(4):731-792, 1997. https://www.jstor.org/stable/2985194.
G. McLachlan and D. Peel. Contribution to the discussion of paper by S. Richardson
and P. J. Green. Journal of the Royal Statistical Society B, 59(4):779-780, 1997. https://www.jstor.org/stable/2985194.
M. Stephens. Bayesian analysis of mixture models with an unknown number of components -
an alternative to reversible jump methods. The Annals of Statistics, 28(1):40-74, 2000. https://www.jstor.org/stable/2673981.
Iris Data Set
Description
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Usage
data(iris)
Format
iris is a data frame with 150 cases (rows) and 5 variables (columns) named:
- Sepal.Length: continuous.
- Sepal.Width: continuous.
- Petal.Length: continuous.
- Petal.Width: continuous.
- Class: discrete, iris-setosa, iris-versicolour or iris-virginica.
Source
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
References
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179-188, 1936.
Examples
## Not run:
devAskNewPage(ask = TRUE)
data(iris)
# Show level attributes.
levels(iris[["Class"]])
# Split dataset into train (75%) and test (25%) subsets.
set.seed(5)
Iris <- split(p = 0.75, Dataset = iris, class = 5)
# Estimate number of components, component weights and component
# parameters for train subsets.
n <- range(a.ntrain(Iris))
irisest <- REBMIX(model = "REBMVNORM",
Dataset = a.train(Iris),
Preprocessing = "histogram",
cmax = 10,
Criterion = "ICL-BIC",
EMcontrol = new("EM.Control", strategy = "single"))
plot(irisest, pos = 1, nrow = 3, ncol = 2, what = c("pdf"))
plot(irisest, pos = 2, nrow = 3, ncol = 2, what = c("pdf"))
plot(irisest, pos = 3, nrow = 3, ncol = 2, what = c("pdf"))
# Selected chunks.
iriscla <- RCLSMIX(model = "RCLSMVNORM",
x = list(irisest),
Dataset = a.test(Iris),
Zt = a.Zt(Iris))
iriscla
summary(iriscla)
# Plot selected chunks.
plot(iriscla, nrow = 3, ncol = 2)
## End(Not run)
Sequence of Bins or Nearest Neighbours Generation
Description
Returns (invisibly) a vector containing numbers of bins v for the histogram and the kernel density estimation or numbers of nearest neighbours k for the k-nearest neighbour.
Usage
kseq(from = NULL, to = NULL, f = 0.05, ...)
Arguments
from |
starting value of the sequence. The default value is |
to |
end value of the sequence. The default value is |
f |
number specifying the fraction by which the bins or nearest neighbours should be separated |
... |
currently not used. |
Author(s)
Marko Nagode
Examples
# Generate numbers of bins.
n <- 10000
Sturges <- as.integer(1 + log2(n)) # Minimum v follows Sturges rule.
Log10 <- as.integer(10 * log10(n)) # Maximum v follows Log10 rule.
RootN <- as.integer(2 * n^0.5) # Maximum v follows RootN rule.
K <- kseq(from = Sturges, to = Log10, f = 0.05)
K
K <- kseq(from = Sturges, to = RootN, f = 0.03)
K
Label Image Moments
Description
Returns the list with the data frame Mij containing the cluster levels l, the numbers of pixels n and the cluster moments \bm{M} = (M_{\mathrm{10}}, M_{\mathrm{01}}, M_{\mathrm{11}})^{\top} for 2D images or the data frame Mijk containing the cluster levels l, the numbers of voxels n and the cluster moments \bm{M} = (M_{\mathrm{100}}, M_{\mathrm{010}}, M_{\mathrm{001}}, M_{\mathrm{111}})^{\top} for 3D images, and the adjacency matrix A of size c_{\mathrm{max}} \times c_{\mathrm{max}}. The adjacency matrix may have some NA rows and columns. To calculate the adjacency matrix A(i,j) = \exp{(-\|\bm{M}_{i} - \bm{M}_{j}\|^2 / 2 \sigma^2)}, the raw cluster moments are first converted into z-scores.
Usage
## S4 method for signature 'array'
labelmoments(Zp = array(), cmax = integer(), Sigma = 1.0, ...)
## ... and for other signatures
Arguments
Zp |
a 2D array of size |
cmax |
maximum number of clusters |
Sigma |
scale parameter |
... |
currently not used. |
Methods
signature(Zp = "array")
an array.
Author(s)
Marko Nagode, Branislav Panic
References
A. Ng, M. Jordan and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14 (NIPS 2001).
Examples
Zp <- matrix(rep(0, 100), nrow = 10, ncol = 10)
Zp[2, 2:4] <- 1
Zp[2:4, 5] <- 2
Zp[8, 7:10] <- 3
Zp[9, 6] <- 4
Zp[10, 5] <- 4
Zp[10, 1:4] <- 5
Zp[6:9, 1] <- 6
labelmoments <- labelmoments(Zp, cmax = 6, Sigma = 1.0)
set.seed(12)
mergelabels <- mergelabels(list(labelmoments$A), w = 1.0, k = 2, nstart = 3)
Zp
mergelabels
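The adjacency formula in the Description can be reproduced in a few lines of base R. The sketch below uses made-up moments for four clusters and the sample standard deviation for the z-scores; both are assumptions, and the package may standardise the moments or treat the diagonal differently.

```r
# Hypothetical raw moments (rows: clusters; columns: M10, M01, M11).
M <- rbind(c(2.0, 3.0, 6.5),
           c(3.0, 5.0, 14.0),
           c(8.0, 8.5, 70.0),
           c(9.5, 5.5, 50.0))
sigma <- 1.0

Z <- scale(M)                   # convert raw moments into z-scores per column
D2 <- as.matrix(dist(Z))^2      # squared Euclidean distances between clusters
A <- exp(-D2 / (2 * sigma^2))   # Gaussian similarity as in the Description
round(A, 3)
```

Clusters with similar moments get entries close to one, while distant clusters get entries close to zero; a larger Sigma flattens these differences.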
Log Likelihood
Description
Returns the log likelihood at pos.
Usage
## S4 method for signature 'REBMIX'
logL(x = NULL, pos = 1, ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
pos |
a desired row number in |
... |
currently not used. |
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
G. McLachlan and D. Peel. Finite Mixture Models. John Wiley & Sons, New York, 2000.
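Examples
As a reminder of what the log likelihood of a finite mixture is, the following base R sketch evaluates it for a univariate two-component normal mixture and derives AIC and BIC from it. The data, the parameter values and the lower-case names aic and bic are illustrative assumptions; nothing here calls the package.

```r
set.seed(1)
y <- c(rnorm(60, 0, 1), rnorm(40, 5, 2))  # illustrative sample
w <- c(0.6, 0.4)                          # component weights
mu <- c(0, 5)                             # component means
sigma <- c(1, 2)                          # component standard deviations

# n x c matrix of weighted component densities.
dens <- sapply(seq_along(w), function(l) w[l] * dnorm(y, mu[l], sigma[l]))

# Log likelihood: sum over observations of the log mixture density.
loglik <- sum(log(rowSums(dens)))

# Free parameters: c - 1 weights, c means and c standard deviations.
M <- 3 * length(w) - 1
aic <- -2 * loglik + 2 * M
bic <- -2 * loglik + M * log(length(y))
c(logL = loglik, AIC = aic, BIC = bic)
```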
Map Clusters
Description
Returns a factor of predictive cluster membership for dataset.
Usage
## S4 method for signature 'RCLRMIX'
mapclusters(x = NULL, Dataset = data.frame(),
s = expression(c), ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
Dataset |
a data frame of size |
s |
a desired number of clusters to be created. The default value is |
... |
currently not used. |
Methods
signature(x = "RCLRMIX")
an object of class RCLRMIX.
signature(x = "RCLRMVNORM")
an object of class RCLRMVNORM.
Author(s)
Marko Nagode, Branislav Panic
Examples
devAskNewPage(ask = TRUE)
# Generate normal dataset.
n <- c(50, 20, 40)
Theta <- new("RNGMVNORM.Theta", c = 3, d = 2)
a.theta1(Theta, 1) <- c(3, 10)
a.theta1(Theta, 2) <- c(8, 6)
a.theta1(Theta, 3) <- c(12, 11)
a.theta2(Theta, 1) <- c(3, 0.3, 0.3, 2)
a.theta2(Theta, 2) <- c(5.7, -2.3, -2.3, 3.5)
a.theta2(Theta, 3) <- c(2, 1, 1, 2)
normal <- RNGMIX(model = "RNGMVNORM", Dataset.name = paste("normal_", 1:10, sep = ""),
n = n, Theta = a.Theta(Theta))
# Convert all datasets to single histogram.
hist <- NULL
n <- length(normal@Dataset)
hist <- fhistogram(Dataset = normal@Dataset[[1]], K = c(10, 10),
ymin = a.ymin(normal), ymax = a.ymax(normal))
for (i in 2:n) {
hist <- fhistogram(x = hist, Dataset = normal@Dataset[[i]], shrink = i == n)
}
# Estimate number of components, component weights and component parameters.
normalest <- REBMIX(model = "REBMVNORM",
Dataset = list(hist),
Preprocessing = "histogram",
cmax = 6,
Criterion = "BIC")
summary(normalest)
# Plot finite mixture.
plot(normalest)
# Cluster dataset.
normalclu <- RCLRMIX(model = "RCLRMVNORM", x = normalest)
# Plot clusters.
plot(normalclu)
summary(normalclu)
# Map clusters.
Zp <- mapclusters(x = normalclu, Dataset = a.Dataset(normal, 4))
Zt <- a.Zt(normal)
Zp
Zt
Merge Labels Based on Probability Adjacency Matrix
Description
Returns the list with the normalised adjacency matrix L of size c \times c. The normalised adjacency matrix L = D^{-1/2} P D^{-1/2} depends on the probability adjacency matrix P(i,j) = \sum_{l = 1}^{n} p_{l} A_{l}(i,j), where p_{l} = w_{l} / \sum_{i = 1}^{c}\sum_{j = i + 1}^{c} A_{l}(i,j), and the degree matrix D(i,i) = \sum_{j = 1}^{c} P(i,j). The A_{l} matrices may contain some NA rows and columns, which are eliminated by the method.
The list also contains the vector of integers cluster of length k, which indicates the cluster to which each label is assigned.
Usage
## S4 method for signature 'list'
mergelabels(A = list(), w = numeric(), k = 2, ...)
## ... and for other signatures
Arguments
A |
a list of length |
w |
vector of length |
k |
number of clusters |
... |
further arguments to |
Methods
signature(A = "list")
a list.
Author(s)
Marko Nagode, Branislav Panic
References
A. Ng, M. Jordan and Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14 (NIPS 2001).
Examples
Zp <- array(0, dim = c(10, 10, 2))
Zp[ , ,1][10, 1:4] <- 1
Zp[ , ,1][1:4, 10] <- 2
Zp[ , ,2][9, 1:5] <- 3
Zp[ , ,2][1:6, 9] <- 4
labelmoments <- labelmoments(Zp, cmax = 4, Sigma = 1.0)
labelmoments
set.seed(3)
mergelabels <- mergelabels(list(labelmoments$A), w = 1.0, k = 2, nstart = 5)
Zp
mergelabels
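The formulas in the Description translate directly into base R. The sketch below builds P, D and L from two made-up adjacency matrices and then clusters the labels in the spirit of Ng, Jordan and Weiss, using the leading eigenvectors and kmeans. The matrices, weights and the clustering step are assumptions for illustration; the package may differ, for example in how NA rows are removed.

```r
# Two hypothetical 4 x 4 adjacency matrices (symmetric, unit diagonal).
A1 <- matrix(c(1.0, 0.9, 0.1, 0.2,
               0.9, 1.0, 0.2, 0.1,
               0.1, 0.2, 1.0, 0.8,
               0.2, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)
A2 <- matrix(c(1.0, 0.7, 0.2, 0.1,
               0.7, 1.0, 0.1, 0.3,
               0.2, 0.1, 1.0, 0.9,
               0.1, 0.3, 0.9, 1.0), nrow = 4, byrow = TRUE)
A <- list(A1, A2)
w <- c(0.5, 0.5)

# p_l = w_l / sum of the strictly upper triangular part of A_l.
p <- mapply(function(Al, wl) wl / sum(Al[upper.tri(Al)]), A, w)

# P(i,j) = sum over l of p_l * A_l(i,j).
P <- Reduce(`+`, Map(function(Al, pl) pl * Al, A, p))

# Degrees D(i,i) and the normalised adjacency matrix L.
d <- rowSums(P)
L <- diag(1 / sqrt(d)) %*% P %*% diag(1 / sqrt(d))

# Spectral step: k leading eigenvectors, rows scaled to unit length, kmeans.
k <- 2
ev <- eigen(L, symmetric = TRUE)
X <- ev$vectors[, 1:k]
Y <- X / sqrt(rowSums(X^2))
set.seed(3)
cluster <- kmeans(Y, centers = k, nstart = 5)$cluster
cluster
```

Labels 1 and 2 are strongly connected in both matrices, as are labels 3 and 4, so they end up in the same clusters.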
Optimal Numbers of Bins Calculation
Description
Returns the matrix of size n_{\mathrm{D}} \times d containing optimal numbers of bins v_{1}, \ldots, v_{d} for all processed datasets.
Usage
## S4 method for signature 'list'
optbins(Dataset = list(), Rule = "Knuth equal",
ymin = numeric(), ymax = numeric(), kmin = numeric(),
kmax = numeric(), ...)
## ... and for other signatures
Arguments
Dataset |
a list of length |
Rule |
a character giving the histogram binning rule. One of |
ymin |
a vector of length |
ymax |
a vector of length |
kmin |
lower limit of the number of bins. The default value is |
kmax |
upper limit of the number of bins. The default value is |
... |
currently not used. |
Methods
signature(Dataset = "list")
a list of data frames.
Author(s)
Branislav Panic, Marko Nagode
References
K. H. Knuth. Optimal data-based binning for histograms and histogram-based probability density models.
Digital Signal Processing, 95:102581, 2019.
doi:10.1016/j.dsp.2019.102581.
B. Panic, J. Klemenc and M. Nagode. Improved initialization of the EM algorithm for mixture model parameter estimation.
Mathematics, 8(3):373, 2020.
doi:10.3390/math8030373.
Examples
# Generate multivariate normal datasets.
n <- c(750, 1000)
Theta <- new("RNGMVNORM.Theta", c = 2, d = 2)
a.theta1(Theta, 1) <- c(8, 6)
a.theta1(Theta, 2) <- c(6, 8)
a.theta2(Theta, 1) <- c(8, 2, 2, 4)
a.theta2(Theta, 2) <- c(2, 1, 1, 4)
sim2d <- RNGMIX(model = "RNGMVNORM",
Dataset.name = paste("sim2d_", 1:5, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
# Calculate optimal numbers of bins.
opt.k <- optbins(Dataset = sim2d@Dataset,
Rule = "Knuth equal",
ymin = sim2d@ymin,
ymax = sim2d@ymax,
kmin = 2,
kmax = 20)
opt.k
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "EM",
acceleration = "fixed", acceleration.multiplier = 1.0, tolerance = 1.0E-4,
maximum.iterations = 1000)
# Estimate number of components, component weights and component parameters.
sim2dest <- REBMIX(model = "REBMVNORM",
Dataset = a.Dataset(sim2d),
Preprocessing = "h",
cmax = 10,
ymin = a.ymin(sim2d),
ymax = a.ymax(sim2d),
K = opt.k,
Criterion = "BIC",
EMcontrol = EM)
# Plot finite mixture.
plot(sim2dest, pos = 3, nrow = 4, what = c("pdf", "marginal pdf", "IC"))
# Estimate number of components, component weights and component
# parameters for well known Iris dataset.
Dataset <- list(iris[, c(1:4)])
# Calculate optimal numbers of bins using non-equal number of bins in each dimension.
opt.k <- optbins(Dataset = Dataset,
Rule = "Knuth unequal",
kmin = 2,
kmax = 20)
opt.k
# Estimate number of components, component weights and component parameters.
irisest <- REBMIX(model = "REBMVNORM",
Dataset = Dataset,
Preprocessing = "h",
cmax = 10,
K = opt.k,
Criterion = "BIC",
EMcontrol = EM)
irisest
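For orientation, the Knuth rule picks the number of equal-width bins that maximises a log posterior. The base R sketch below implements the one-dimensional form of that criterion from the cited paper and scans the numbers of bins between kmin and kmax. The helper knuth_logp and the data are hypothetical, and the package's multivariate "Knuth equal" and "Knuth unequal" rules are not reproduced here.

```r
# Hypothetical helper: Knuth log posterior (up to a constant) for M bins.
knuth_logp <- function(y, M) {
  N <- length(y)
  breaks <- seq(min(y), max(y), length.out = M + 1)
  # Bin frequencies; the largest observation is counted in the last bin.
  nk <- tabulate(findInterval(y, breaks, rightmost.closed = TRUE), nbins = M)
  N * log(M) + lgamma(M / 2) - M * lgamma(0.5) -
    lgamma(N + M / 2) + sum(lgamma(nk + 0.5))
}

set.seed(1)
y <- c(rnorm(750, 8, 2), rnorm(1000, 6, 1))  # illustrative sample
kmin <- 2
kmax <- 20

# Evaluate the criterion for each candidate number of bins.
logp <- sapply(kmin:kmax, function(M) knuth_logp(y, M))
v.opt <- (kmin:kmax)[which.max(logp)]
v.opt
```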
Empirical Distribution Function Calculation
Description
Returns the data frame containing observations \bm{x}_{1}, \ldots, \bm{x}_{n} and empirical distribution functions F_{1}, \ldots, F_{n}. Vectors \bm{x} are subvectors of \bm{y} = (y_{1}, \ldots, y_{d})^{\top}.
Usage
## S4 method for signature 'REBMIX'
pemix(x = NULL, pos = 1, variables = expression(1:d),
lower.tail = TRUE, log.p = FALSE, ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
pos |
a desired row number in |
variables |
a vector containing indices of variables in subvectors |
lower.tail |
logical. If |
log.p |
logical. If |
... |
currently not used. |
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
Examples
# Generate simulated dataset.
n <- c(15, 15)
Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
rseed = -1,
n = n,
Theta = a.Theta(Theta))
# Create object of class EM.Control.
EM <- new("EM.Control", strategy = "exhaustive", variant = "ECM",
acceleration = "fixed", acceleration.multiplier = 1.0, tolerance = 1.0E-4,
maximum.iterations = 1000)
# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(Dataset = a.Dataset(simulated),
Preprocessing = "kernel density estimation",
cmax = 4,
pdf = c("n", "n", "n"),
EMcontrol = EM)
# Preprocess simulated dataset.
f <- pemix(simulatedest, pos = 3, variables = c(1))
f
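A multivariate empirical distribution function is usually defined as the fraction of observations that are componentwise less than or equal to a given point. The base R sketch below evaluates it at every observation of a tiny dataset; the helper edf is hypothetical and the exact convention used by the package (for example ties or scaling by n) is an assumption.

```r
# Hypothetical helper: empirical distribution function at each observation.
edf <- function(X) {
  X <- as.matrix(X)
  vapply(seq_len(nrow(X)), function(i) {
    # Fraction of rows that are <= row i in every variable.
    mean(apply(X, 1, function(r) all(r <= X[i, ])))
  }, numeric(1))
}

X <- rbind(c(1, 1), c(2, 2), c(1, 3))
Fx <- edf(X)
data.frame(X, F = Fx)
```

For a single variable this reduces to what stats::ecdf returns at the observations.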
Predictive Marginal Distribution Function Calculation
Description
Returns the data frame containing observations \bm{x}_{1}, \ldots, \bm{x}_{n} and predictive marginal distribution functions F(\bm{x} | c, \bm{w}, \bm{\Theta}). Vectors \bm{x} are subvectors of \bm{y} = (y_{1}, \ldots, y_{d})^{\top}. If \bm{x} = \bm{y} the method returns the data frame containing observations \bm{y}_{1}, \ldots, \bm{y}_{n} and the corresponding predictive mixture distribution function F(\bm{y} | c, \bm{w}, \bm{\Theta}).
Usage
## S4 method for signature 'REBMIX'
pfmix(x = NULL, Dataset = data.frame(), pos = 1,
variables = expression(1:d), lower.tail = TRUE, log.p = FALSE, ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
Dataset |
a data frame containing observations |
pos |
a desired row number in |
variables |
a vector containing indices of variables in subvectors |
lower.tail |
logical. If |
log.p |
logical. If |
... |
currently not used. |
Methods
signature(x = "REBMIX")
an object of class REBMIX.
signature(x = "REBMVNORM")
an object of class REBMVNORM.
Author(s)
Marko Nagode
References
M. Nagode and M. Fajdiga. The rebmix algorithm for the univariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(5):876-892, 2011a. doi:10.1080/03610920903480890.
M. Nagode and M. Fajdiga. The rebmix algorithm for the multivariate finite mixture estimation.
Communications in Statistics - Theory and Methods, 40(11):2022-2034, 2011b. doi:10.1080/03610921003725788.
M. Nagode. Finite mixture modeling via REBMIX.
Journal of Algorithms and Optimization, 3(2):14-28, 2015. https://repozitorij.uni-lj.si/Dokument.php?id=127674&lang=eng.
Examples
# Generate simulated dataset.
n <- c(15, 15)
Theta <- new("RNGMIX.Theta", c = 2, pdf = rep("normal", 3))
a.theta1(Theta, 1) <- c(10, 20, 30)
a.theta1(Theta, 2) <- c(3, 4, 5)
a.theta2(Theta, 1) <- c(3, 2, 1)
a.theta2(Theta, 2) <- c(15, 10, 5)
simulated <- RNGMIX(Dataset.name = paste("simulated_", 1:4, sep = ""),
  rseed = -1,
  n = n,
  Theta = a.Theta(Theta))
# Number of classes or nearest neighbours to be processed.
K <- c(as.integer(1 + log2(sum(n))), # Minimum v follows Sturges rule.
  as.integer(10 * log10(sum(n)))) # Maximum v follows log10 rule.
# Estimate number of components, component weights and component parameters.
simulatedest <- REBMIX(Dataset = a.Dataset(simulated),
  Preprocessing = "h",
  cmax = 4,
  Criterion = "BIC",
  pdf = c("n", "n", "n"))
# Evaluate predictive marginal distribution functions at selected points.
Dataset <- data.frame(c(25, 5, -20), NA, c(31, 20, 20))
f <- pfmix(simulatedest, Dataset = Dataset, pos = 3, variables = c(1, 3))
f
# Plot finite mixture.
opar <- plot(simulatedest, pos = 3, nrow = 3, ncol = 1,
what = "pdf", contour.drawlabels = TRUE, contour.labcex = 0.6)
par(usr = opar[[2]]$usr, mfg = c(2, 1))
points(x = f[, 1], y = f[, 2])
text(x = f[, 1], y = f[, 2], labels = format(f[, 3], digits = 3), cex = 0.8, pos = 4)
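For reference, the bounds K in the example above can be checked by hand. With n = c(15, 15) the total sample size is 30, so the Sturges rule gives as.integer(1 + log2(30)) and the log10 rule gives as.integer(10 * log10(30)). The snippet below only repeats that arithmetic in base R and does not call rebmix.

```r
n <- c(15, 15)
K <- c(as.integer(1 + log2(sum(n))),    # 1 + 4.907 = 5.907, truncated to 5
       as.integer(10 * log10(sum(n))))  # 10 * 1.477 = 14.77, truncated to 14
K
```

Note that as.integer truncates toward zero rather than rounding, which is why 5.907 becomes 5.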
Plots RNGMIX, REBMIX, RCLRMIX and RCLSMIX Output
Description
Plots true clusters if x equals "RNGMIX". Plots the REBMIX output depending on the what argument if x equals "REBMIX". Plots predictive clusters if x equals "RCLRMIX". Wrongly clustered observations are plotted only if x@Zt is available. Plots predictive classes and wrongly classified observations if x equals "RCLSMIX".
Usage
## S4 method for signature 'RNGMIX,missing'
plot(x, y, pos = 1, nrow = 1, ncol = 1, cex = 0.8,
fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
plot.cex = 0.8, plot.pch = 19, ...)
## S4 method for signature 'REBMIX,missing'
plot(x, y, pos = 1, what = c("pdf"),
nrow = 1, ncol = 1, npts = 200, n = 200, cex = 0.8, fg = "black",
lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
plot.cex = 0.8, plot.pch = 19, contour.drawlabels = FALSE,
contour.labcex = 0.8, contour.method = "flattest",
contour.nlevels = 12, log = "", ...)
## S4 method for signature 'RCLRMIX,missing'
plot(x, y, s = expression(c), nrow = 1, ncol = 1, cex = 0.8,
fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
plot.cex = 0.8, plot.pch = 19, ...)
## S4 method for signature 'RCLSMIX,missing'
plot(x, y, nrow = 1, ncol = 1, cex = 0.8,
fg = "black", lty = "solid", lwd = 1, pty = "m", tcl = 0.5,
plot.cex = 0.8, plot.pch = 19, ...)
## ... and for other signatures
Arguments
x |
see Methods section below. |
y |
currently not used. |
pos |
a desired row number in |
s |
a desired number of clusters to be plotted. The default value is |
what |
a character vector giving the plot types. One of |
nrow |
a desired number of rows in which the empirical and predictive densities are to be plotted. The default value is |
ncol |
a desired number of columns in which the empirical and predictive densities are to be plotted. The default value is |
npts |
a number of points at which the predictive densities are to be plotted. The default value is |
n |
a number of observations to be plotted. The default value is |
cex |
a numerical value giving the amount by which the plotting text and symbols should be magnified
relative to the default, see also |
fg |
a colour used for things like axes and boxes around plots, see also |
lty |
a line type, see also |
lwd |
a line width, see also |
pty |
a character specifying the type of the plot region to be used. One of |
tcl |
a length of tick marks as a fraction of the height of a line of the text, see also |
plot.cex |
a numerical vector giving the amount by which plotting characters and symbols should be
scaled relative to the default. It works as a multiple of |
plot.pch |
a vector of plotting characters or symbols, see also |
contour.drawlabels |
logical. The contours are labelled if |
contour.labcex |
|
contour.method |
a character specifying where the labels will be located. The possible values
are |
contour.nlevels |
a number of desired contour levels. The default value is |
log |
a character which contains |
... |
further arguments to |
Value
Returns (invisibly) a list containing graphical parameters par. Such a list can be passed as an argument to par to restore the parameter values.
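The save-and-restore pattern referred to here is the usual one for par in base R graphics. A minimal sketch that does not call rebmix follows; the off-screen pdf(NULL) device and the variable name mfrow.restored are assumptions made only so that the snippet runs without opening a window.

```r
pdf(NULL)                        # off-screen graphics device
opar <- par(mfrow = c(2, 1))     # par() returns the previous values of what it sets
plot(1:10, main = "first panel")
plot(10:1, main = "second panel")
par(opar)                        # restore the saved values
mfrow.restored <- par("mfrow")   # back to a single-panel layout
invisible(dev.off())
```

In the pfmix example of this manual the list returned by plot is indexed per panel, as in opar[[2]]$usr, so the element for the panel of interest is what gets passed on to par.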
Methods
signature(x = "RNGMIX", y = "missing")
an object of class RNGMIX.
signature(x = "RNGMVNORM", y = "missing")
an object of class RNGMVNORM.
signature(x = "REBMIX", y = "missing")
an object of class REBMIX.
signature(x = "REBMVNORM", y = "missing")
an object of class REBMVNORM.
signature(x = "RCLRMIX", y = "missing")
an object of class RCLRMIX.
signature(x = "RCLRMVNORM", y = "missing")
an object of class RCLRMVNORM.
signature(x = "RCLSMIX", y = "missing")
an object of class RCLSMIX.
signature(x = "RCLSMVNORM", y = "missing")
an object of class RCLSMVNORM.
Author(s)
Marko Nagode
References
C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
Examples
## Not run:
devAskNewPage(ask = TRUE)
data(wine)
colnames(wine)
# Remove Cultivar column from wine dataset.
winecolnames <- !(colnames(wine) %in% "Cultivar")
wine <- wine[, winecolnames]
# Determine number of dimensions d and wine dataset size n.
d <- ncol(wine)
n <- nrow(wine)
wineest <- REBMIX(model = "REBMVNORM",
Dataset = list(wine = wine),
Preprocessing = "kernel density estimation",
Criterion = "ICL-BIC",
EMcontrol = new("EM.Control", strategy = "best"))
# Plot finite mixture.
plot(wineest, what = c("pdf", "IC", "logL", "D"),
nrow = 2, ncol = 2, pty = "s")
## End(Not run)
Internal rebmix Functions, Methods and Classes
Description
Internal rebmix functions, methods and classes.
Details
These are not to be called by the user.
Sensorless Drive Faults Detection Data
Description
These data are the results of a sensorless drive diagnosis procedure. Features are extracted from the electric current drive signals. The drive has intact and defective components, which results in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, that is, at different speeds, load moments and load forces. The current signals are measured with a current probe and an oscilloscope on two phases. The original dataset contains 49 features; here only 3 are used, namely features 5, 7 and 11. The first class (1) contains the healthy drives and the remaining classes contain drives with faulty components.
Usage
data(sensorlessdrive)
Format
sensorlessdrive is a data frame with 58509 cases (rows) and 4 variables (columns) named:
- V5 continuous.
- V7 continuous.
- V11 continuous.
- Class discrete 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or 11.
Source
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
References
F. Paschke, C. Bayer, M. Bator, U. Moenks, A. Dicks, O. Enge-Rosenblatt and V. Lohweg. Sensorlose Zustandsueberwachung an Synchronmotoren.
23. Workshop Computational Intelligence VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA), 2013.
M. Bator, A. Dicks, U. Moenks and V. Lohweg. Feature extraction and reduction applied to sensorless drive diagnosis.
22. Workshop Computational Intelligence VDI/VDE-Gesellschaft Mess- und Automatisierungstechnik (GMA), 2012. doi:10.13140/2.1.2421.5689.
Examples
## Not run:
data(sensorlessdrive)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(3)
Drive <- split(p = 0.75, Dataset = sensorlessdrive, class = 4)
# Estimate number of components, component weights and component
# parameters for train subsets.
driveest <- REBMIX(model = "REBMVNORM",
Dataset = a.train(Drive),
Preprocessing = "histogram",
cmax = 15,
Criterion = "BIC")
# Classification.
drivecla <- RCLSMIX(model = "RCLSMVNORM",
x = list(driveest),
Dataset = a.test(Drive),
Zt = a.Zt(Drive))
drivecla
summary(drivecla)
## End(Not run)
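The classification step above compares predicted classes with the true class membership supplied through Zt. As a rough illustration of how such a comparison yields a confusion matrix and an error rate, here is a base R sketch. The label vectors Zt and Zp and the names CM and Error are made up for the illustration and are not rebmix output or rebmix slots.

```r
# Hypothetical true and predicted class labels (illustrative values only).
Zt <- factor(c(1, 1, 2, 2, 3, 3, 3, 1), levels = 1:3)
Zp <- factor(c(1, 2, 2, 2, 3, 1, 3, 1), levels = 1:3)

# Confusion matrix: rows are true classes, columns are predicted classes.
CM <- table(True = Zt, Predicted = Zp)

# Misclassification error: share of observations off the diagonal.
Error <- 1 - sum(diag(CM)) / sum(CM)
CM
Error
```

Two of the eight made-up observations are misclassified, so Error evaluates to 0.25.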
Splits Dataset into Train and Test Datasets
Description
Returns (invisibly) the object containing train and test observations \bm{y}_{1}, \ldots, \bm{y}_{n} as well as true class membership \bm{\Omega}_{g} for the test dataset.
Usage
## S4 method for signature 'numeric'
split(p = 0.75, Dataset = data.frame(), class = numeric(), ...)
## S4 method for signature 'list'
split(p = list(), Dataset = data.frame(), class = numeric(), ...)
## ... and for other signatures
Arguments
p |
see Methods section below. |
Dataset |
a data frame containing dataset |
class |
a column number in |
... |
further arguments to |
Value
Returns an object of class RCLS.chunk.
Methods
signature(p = "numeric")
a number specifying the fraction of observations for training 0.0 \leq p \leq 1.0. The default value is 0.75.
signature(p = "list")
a list composed of column number p$type in Dataset containing the type membership information followed by the corresponding train p$train and test p$test values. The default value is list().
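To illustrate what a fractional train and test split looks like, here is a base R sketch that draws a fraction p of the rows within each class. This is one possible scheme under stated assumptions and not the rebmix implementation; whether the package method samples within classes is not stated in this section. The names class.col, rows and train.idx are hypothetical, and base::split is called explicitly to avoid confusion with the method documented here.

```r
set.seed(5)
p <- 0.75        # fraction of observations used for training
class.col <- 5   # column holding the class labels (Species in iris)

# Row indices grouped by class, then a random fraction p drawn from each group.
rows <- base::split(seq_len(nrow(iris)), iris[[class.col]])
train.idx <- unlist(
  lapply(rows, function(i) i[sample.int(length(i), floor(p * length(i)))]),
  use.names = FALSE)

train <- iris[train.idx, ]
test <- iris[-train.idx, ]
c(train = nrow(train), test = nrow(test))
```

With 50 rows per species and p = 0.75, floor(37.5) = 37 rows per class go to the training subset, giving 111 training and 39 test rows.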
Author(s)
Marko Nagode
Examples
## Not run:
data(iris)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(5)
Iris <- split(p = 0.75, Dataset = iris, class = 5)
Iris
# Generate simulated dataset.
N <- 1000
class <- c(rep("A", 0.4 * N), rep("B", 0.2 * N),
rep("C", 0.1 * N), rep("D", 0.05 * N), rep("E", 0.25 * N))
type <- c(rep("train", 0.75 * N), rep("test", 0.25 * N))
n <- 300
Dataset <- data.frame(1:n, sample(class, n))
colnames(Dataset) <- c("y", "class")
# Split dataset into train (60%) and test (40%) subsets.
simulated <- split(p = 0.6, Dataset = Dataset, class = 2)
simulated
# Generate simulated dataset.
Dataset <- data.frame(1:n, sample(class, n), sample(type, n))
colnames(Dataset) <- c("y", "class", "type")
# Split dataset into train and test subsets.
simulated <- split(p = list(type = 3, train = "train",
test = "test"), Dataset = Dataset, class = 2)
simulated
## End(Not run)
Steel Plates Faults Recognition Data
Description
These data are the results of an extraction process from images of faults of steel plates. There are seven different faults: Pastry (1), Z_Scratch (2), K_Scratch (3), Stains (4), Dirtiness (5), Bumps (6), Other faults (7).
Usage
data(steelplates)
Format
steelplates is a data frame with 1941 cases (rows) and 28 variables (columns) named:
- X_Minimum integer.
- X_Maximum integer.
- Y_Minimum integer.
- Y_Maximum integer.
- Pixels_Areas integer.
- X_Perimeter integer.
- Y_Perimeter integer.
- Sum_of_Luminosity integer.
- Minimum_of_Luminosity integer.
- Maximum_of_Luminosity integer.
- Length_of_Conveyer integer.
- TypeOfSteel_A300 binary.
- TypeOfSteel_A400 binary.
- Steel_Plate_Thickness integer.
- Edges_Index continuous.
- Empty_Index continuous.
- Square_Index continuous.
- Outside_X_Index continuous.
- Edges_X_Index continuous.
- Edges_Y_Index continuous.
- Outside_Global_Index continuous.
- LogOfAreas continuous.
- Log_X_Index continuous.
- Log_Y_Index continuous.
- Orientation_Index continuous.
- Luminosity_Index continuous.
- SigmoidOfAreas continuous.
- Class discrete 1, 2, 3, 4, 5, 6 or 7.
Source
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
References
M. Buscema, S. Terzi, W. Tastle. A new meta-classifier. Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS, 2010. doi:10.1109/NAFIPS.2010.5548298.
M. Buscema. MetaNet*: The theory of independent judges. Substance Use & Misuse, 33(2):439-461, 1998. doi:10.3109/10826089809115875.
Examples
## Not run:
data(steelplates)
# Split dataset into train (75%) and test (25%) subsets.
set.seed(3)
Steelplates <- split(p = 0.75, Dataset = steelplates, class = 28)
# Estimate number of components, component weights and component
# parameters for train subsets.
steelplatesest <- REBMIX(model = "REBMVNORM",
Dataset = a.train(Steelplates),
Preprocessing = "histogram",
cmax = 15,
Criterion = "BIC")
# Classification.
steelplatescla <- RCLSMIX(model = "RCLSMVNORM",
x = list(steelplatesest),
Dataset = a.test(Steelplates),
Zt = a.Zt(Steelplates))
steelplatescla
summary(steelplatescla)
## End(Not run)
Truck Dataset
Description
The dataset contains amplitudes and means measured on truck wheels.
Usage
data(truck)
Format
truck is a data frame with 31665 rows and 2 variables (columns) named:
- Amplitude continuous.
- Mean continuous.
Author(s)
Mitja Franko
Examples
data(truck)
Weibull Dataset 8.1
Description
The complete data are the failure times in weeks.
Usage
data(weibull)
Format
weibull is a data frame with 50 cases (rows) and 1 variable (column) named:
- Failure.Time continuous.
References
D. N. P. Murthy, M. Xie and R. Jiang. Weibull Models. John Wiley & Sons, New York, 2003.
Examples
data(weibull)
Weibull-normal Simulated Dataset
Description
The dataset contains amplitudes and means simulated from a three-component Weibull-normal mixture.
Usage
data(weibullnormal)
Format
weibullnormal is a data frame with 10000 rows and 2 variables (columns) named:
- Amplitude continuous.
- Mean continuous.
Author(s)
Mitja Franko
Examples
data(weibullnormal)
Wine Recognition Data
Description
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars (1-3). The analysis determined the quantities of 13 constituents found in each of the three types of wine: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, colour intensity, hue, OD280/OD315 of diluted wines, and proline. The number of instances in classes 1 to 3 is 59, 71 and 48, respectively.
Usage
data(wine)
Format
wine is a data frame with 178 cases (rows) and 14 variables (columns) named:
- Alcohol continuous.
- Malic.Acid continuous.
- Ash continuous.
- Alcalinity.of.Ash continuous.
- Magnesium continuous.
- Total.Phenols continuous.
- Flavanoids continuous.
- Nonflavanoid.Phenols continuous.
- Proanthocyanins continuous.
- Color.Intensity continuous.
- Hue continuous.
- OD280.OD315.of.Diluted.Wines continuous.
- Proline continuous.
- Cultivar discrete 1, 2 or 3.
Source
A. Asuncion and D. J. Newman. UCI Machine Learning Repository, 2007. http://archive.ics.uci.edu/ml/.
References
S. J. Roberts, R. Everson and I. Rezek. Maximum certainty data partitioning. Pattern Recognition, 33(5):833-839, 2000. doi:10.1016/S0031-3203(99)00086-2.
Examples
## Not run:
devAskNewPage(ask = TRUE)
data(wine)
# Show level attributes.
levels(factor(wine[["Cultivar"]]))
# Split dataset into train (75%) and test (25%) subsets.
set.seed(3)
Wine <- split(p = 0.75, Dataset = wine, class = 14)
# Estimate number of components, component weights and component
# parameters for train subsets.
n <- range(a.ntrain(Wine))
K <- c(as.integer(1 + log2(n[1])), # Minimum v follows Sturges rule.
as.integer(10 * log10(n[2]))) # Maximum v follows log10 rule.
K <- c(floor(K[1]^(1/13)), ceiling(K[2]^(1/13)))
wineest <- REBMIX(model = "REBMVNORM",
Dataset = a.train(Wine),
Preprocessing = "kernel density estimation",
cmax = 10,
Criterion = "ICL-BIC",
pdf = rep("normal", 13),
K = K[1]:K[2],
Restraints = "loose",
Mode = "outliersplus")
plot(wineest, pos = 1, nrow = 7, ncol = 6, what = c("pdf"))
plot(wineest, pos = 2, nrow = 7, ncol = 6, what = c("pdf"))
plot(wineest, pos = 3, nrow = 7, ncol = 6, what = c("pdf"))
# Selected chunks.
winecla <- RCLSMIX(model = "RCLSMVNORM",
x = list(wineest),
Dataset = a.test(Wine),
Zt = a.Zt(Wine))
winecla
summary(winecla)
# Plot selected chunks.
plot(winecla, nrow = 7, ncol = 6)
## End(Not run)