REBMIX-class {rebmix} | R Documentation |
"REBMIX"
Object of class REBMIX.
Objects can be created by calls of the form new("REBMIX", ...). Accessor methods for the slots are a.Dataset(x = NULL, pos = 0), a.Preprocessing(x = NULL), a.cmax(x = NULL), a.cmin(x = NULL), a.Criterion(x = NULL), a.Variables(x = NULL), a.pdf(x = NULL), a.theta1(x = NULL), a.theta2(x = NULL), a.K(x = NULL), a.y0(x = NULL), a.ymin(x = NULL), a.ymax(x = NULL), a.ar(x = NULL), a.Restraints(x = NULL), a.w(x = NULL, pos = 0), a.Theta(x = NULL, pos = 0), a.summary(x = NULL, pos = 0, col.name = character()), a.pos(x = NULL), a.opt.c(x = NULL), a.opt.IC(x = NULL), a.opt.logL(x = NULL), a.opt.D(x = NULL), a.all.K(x = NULL), a.all.IC(x = NULL), a.theta1.all(x = NULL, pos = 1) and a.theta2.all(x = NULL, pos = 1), where x, pos and col.name stand for an object of class REBMIX, a desired slot item and a desired column name, respectively.
Dataset
:a list of data frames of size n \times d containing d-dimensional datasets. Each of the d columns represents one random variable. The number of observations n equals the number of rows in the dataset.
Preprocessing
:a character vector giving the preprocessing types. One of "histogram", "kernel density estimation" or "k-nearest neighbour".
cmax
:maximum number of components c_{\mathrm{max}} > 0. The default value is 15.
cmin
:minimum number of components c_{\mathrm{min}} > 0. The default value is 1.
Criterion
:a character vector giving the information criterion types. One of default Akaike "AIC", "AIC3", "AIC4" or "AICc", Bayesian "BIC", consistent Akaike "CAIC", Hannan-Quinn "HQC", minimum description length "MDL2" or "MDL5", approximate weight of evidence "AWE", classification likelihood "CLC", integrated classification likelihood "ICL" or "ICL-BIC", partition coefficient "PC", total of positive relative deviations "D" or sum of squares error "SSE".
Variables
:a character vector of length d containing types of variables. One of "continuous" or "discrete".
pdf
:a character vector of length d containing continuous or discrete parametric family types. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson", "Dirac" or "vonMises".
theta1
:a vector of length d containing initial component parameters. One of n_{il} = \textrm{number of categories} - 1 for "binomial" distribution or "NA" otherwise.
theta2
:a vector of length d containing initial component parameters. Currently not used.
K
:a vector or a list of vectors containing numbers of bins v for the histogram and the kernel density estimation or numbers of nearest neighbours k for the k-nearest neighbour. There is no genuine rule to identify v or k. Consequently, the REBMIX algorithm identifies them from the set K of input values by minimizing the information criterion. The Sturges rule v = 1 + \log_{2}(n), the \mathrm{Log}_{10} rule v = 10 \log_{10}(n) or the RootN rule v = 2 \sqrt{n} can be applied to estimate the limiting numbers of bins, and the rule of thumb k = \sqrt{n} to guess the intermediate number of nearest neighbours. If, e.g., K = c(10, 20, 40, 60) and the minimum IC occurs at, e.g., 40, the brackets are set to 20 and 60 and the golden section is applied to refine the minimum search. See also kseq for generating a sequence of numbers of bins or nearest neighbours.
y0
:a vector of length d containing origins. The default value is numeric().
ymin
:a vector of length d containing minimum observations. The default value is numeric().
ymax
:a vector of length d containing maximum observations. The default value is numeric().
ar
:acceleration rate 0 < a_{\mathrm{r}} ≤ 1. The default value is 0.1 and in most cases does not have to be altered.
Restraints
:a character giving the restraints type. One of "rigid" or default "loose". The rigid restraints are obsolete and applicable for well separated components only.
w
:a list of vectors of length c containing component weights w_{l} summing to 1.
Theta
:a list of lists each containing c parametric family types pdfl. One of "normal", "lognormal", "Weibull", "gamma", "binomial", "Poisson", "Dirac" or circular "vonMises" defined for 0 ≤ y_{i} ≤ 2 π. Component parameters theta1.l follow the parametric family types. One of μ_{il} for normal, lognormal and von Mises distributions and θ_{il} for Weibull, gamma, binomial, Poisson and Dirac distributions. Component parameters theta2.l follow theta1.l. One of σ_{il} for normal and lognormal distributions, β_{il} for Weibull and gamma distributions, p_{il} for binomial distribution and κ_{il} for von Mises distribution.
summary
:a data frame with additional information about dataset, preprocessing, c_{\mathrm{max}}, c_{\mathrm{min}}, information criterion type, a_{\mathrm{r}}, restraints type, optimal c, optimal v or k, K, y_{i0}, y_{i\mathrm{min}}, y_{i\mathrm{max}}, optimal h_{i}, information criterion \mathrm{IC}, log likelihood \mathrm{log}\, L and degrees of freedom M.
pos
:position in the summary data frame at which log likelihood \mathrm{log}\, L attains its maximum.
opt.c
:a list of vectors containing numbers of components for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.IC
:a list of vectors containing information criteria for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.logL
:a list of vectors containing log likelihoods for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
opt.D
:a list of vectors containing totals of positive relative deviations for optimal v for the histogram and the kernel density estimation or for optimal number of nearest neighbours k for the k-nearest neighbour.
all.K
:a list of vectors containing all processed numbers of bins v for the histogram and the kernel density estimation or all processed numbers of nearest neighbours k for the k-nearest neighbour.
all.IC
:a list of vectors containing information criteria for all processed numbers of bins v for the histogram and the kernel density estimation or for all processed numbers of nearest neighbours k for the k-nearest neighbour.
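The rules of thumb and the bracketing step described for the K slot can be sketched in base R as follows. This is only an illustration and does not call the rebmix package: the sample size n and the IC values are made-up assumptions, and the golden section refinement itself is not reproduced.

```r
# Rules of thumb quoted for the K slot; n is a hypothetical sample size.
n <- 1000

v.sturges <- 1 + log2(n)     # Sturges rule
v.log10   <- 10 * log10(n)   # Log10 rule
v.rootn   <- 2 * sqrt(n)     # RootN rule
k.thumb   <- sqrt(n)         # rule of thumb for nearest neighbours

# Bracketing step: K holds the candidate values and IC one information
# criterion value per candidate (illustrative numbers, not package output).
K  <- c(10, 20, 40, 60)
IC <- c(812.4, 790.1, 771.3, 779.8)

# The neighbours of the minimiser become the brackets for the refinement;
# max() and min() keep the indices inside K when the minimum is at an end.
i <- which.min(IC)
brackets <- c(K[max(i - 1, 1)], K[min(i + 1, length(K))])

K[i]       # 40
brackets   # 20 60
```

With the values above the minimum IC occurs at 40, so the search for the optimal v or k would continue between 20 and 60, matching the example given in the slot description.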
Marko Nagode
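The relation between the summary and pos slots, and between the log likelihood, the degrees of freedom M and two of the listed criteria, can be illustrated with a small base R sketch. The data frame below is a hypothetical stand-in for a few columns of summary, n is an assumed number of observations, and the AIC and BIC expressions are the usual textbook forms; the exact definitions used inside the rebmix package are not stated here and may differ.

```r
# Hypothetical stand-in for a few columns of the summary data frame.
summary.df <- data.frame(
  Dataset = c("dataset1", "dataset2", "dataset3"),
  logL    = c(-1523.7, -1498.2, -1510.9),
  M       = c(5, 8, 5)
)

# pos: the row at which the log likelihood attains its maximum.
pos <- which.max(summary.df$logL)
pos   # 2

# Textbook forms of two information criteria (assumed, see the note above).
n   <- 500                                            # assumed sample size
aic <- -2 * summary.df$logL + 2 * summary.df$M        # Akaike
bic <- -2 * summary.df$logL + summary.df$M * log(n)   # Bayesian

aic[pos]   # 3012.4
```

Both criteria penalise the log likelihood by the number of free parameters M, so the row with the largest log likelihood is not necessarily the row with the smallest criterion value.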