bosclassif {ordinalClust}    R Documentation

Function to perform a classification

Description

This function performs classification on a dataset with ordinal features and an integer label variable (1,2,...,kr). Two classification models are available. The first one (chosen with the argument kc=0) is a multivariate BOS model that assumes that, conditionally on the class of the observations, the features are independent. The second model is a parsimonious version of the first. Parsimony is introduced by grouping the features into clusters (as in co-clustering) and assuming that the features of a cluster share a common distribution. If the data contains ordinal features with D different numbers of levels, the data is treated as D matrices of ordinal data.

Usage

bosclassif(x, y, idx_list=c(1), kr, kc=0, init, nbSEM, nbSEMburn, 
          nbindmini, m=0, percentRandomB=0) 

Arguments

x

Matrix of ordinal data, of dimension N*Jtot. Features with the same number of levels must be placed side by side. Missing values should be coded as NA.

y

Vector of length N. It should represent the classes corresponding to each row of x. Must be labelled with integers (1,2,...,kr).

idx_list

Vector of length D. This argument is useful when variables have different numbers of levels. Element d should give the index of the column in matrix x where the variables with number of levels m[d] begin.

kr

Number of row classes.

kc

Vector of length D. The d^th element indicates the number of column clusters for the d^th group of variables. Set to 0 to choose a classical multivariate BOS model.

m

Vector of length D. The d^th element gives the number of levels of the d^th group of variables.

nbSEM

Number of SEM-Gibbs iterations performed to estimate the parameters.

nbSEMburn

Number of SEM-Gibbs burn-in iterations for estimating the parameters. This value must be less than nbSEM.

nbindmini

Minimum number of cells belonging to a block.

init

String that indicates the kind of initialisation. Must be one of the following: "kmeans", "random" or "randomBurnin".

percentRandomB

Vector of length 1. Indicates the percentage of resampling when init is equal to "randomBurnin".
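
As an illustration of how x, m and idx_list fit together when features have different numbers of levels, here is a minimal base R sketch. The toy data, the group sizes and the helper names (x4, x7, J) are assumptions made for this example only, and the sketch does not call bosclassif itself.

```r
# Toy data (assumption): 3 features with 4 levels, then 2 features with 7 levels.
set.seed(1)
N <- 10
x4 <- matrix(sample(1:4, N * 3, replace = TRUE), nrow = N)
x7 <- matrix(sample(1:7, N * 2, replace = TRUE), nrow = N)

# Features with the same number of levels are placed side by side.
x <- cbind(x4, x7)

# Number of levels of each group of features (D = 2 groups).
m <- c(4, 7)

# Number of columns in each group (hypothetical helper, not an argument).
J <- c(ncol(x4), ncol(x7))

# First column of each group in x: group 1 starts at column 1,
# group 2 starts right after the J[1] columns of group 1.
idx_list <- cumsum(c(1, J[-length(J)]))
```

With this layout, idx_list is c(1, 4): the 4-level features start at column 1 and the 7-level features start at column 4.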

Value

Returns an object with the following slots:

@zr

Vector of length N containing the resulting row partition.

@zc

List of length D. The d^th item is a vector of length J[d] representing the column partition for the group of variables d.

@J

Vector of length D. The d^th item is the number of columns in the d^th group of variables.

@W

List of length D. Item d is a matrix of dimension J[d]*kc[d] such that W[j,h]=1 if column j belongs to cluster h.

@V

Matrix of dimension N*kr such that V[i,g]=1 if row i belongs to class g.

@icl

ICL value for co-clustering.

@kr

Number of row classes.

@name

Name of the result.

@number_distrib

Number of groups of variables.

@pi

Vector of length kr. Row mixing proportions.

@rho

List of length D. The d^th item contains the column mixing proportions for the d^th group of variables.

@dlist

List of length D. The d^th item contains the column indexes of the group of variables d.

@kc

Vector of length D. The d^th element is the number of column clusters for the d^th group of variables.

@m

Vector of length D. The d^th element is the number of levels of the d^th group of variables.

@nbSEM

Number of SEM-Gibbs iterations.

@params

List of length D. The d^th item contains the block parameters for the group of variables d.

@xhat

List of length D. The d^th item is the dataset of the d^th group of variables, with missing values completed.
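
To make the relationship between a partition vector such as @zr and an indicator matrix such as @V concrete, the following base R sketch builds one from the other. The vector zr below is a made-up example, not output from bosclassif, and the construction is only one possible way to obtain a matrix with the property V[i,g]=1 if i belongs to g.

```r
# Hypothetical row partition for N = 5 observations and kr = 2 classes.
zr <- c(1, 2, 2, 1, 2)
kr <- 2

# Indicator matrix: V[i, g] is 1 when observation i is in class g, else 0.
V <- outer(zr, seq_len(kr), "==") * 1

# Each row contains exactly one 1.
stopifnot(all(rowSums(V) == 1))

# The partition can be read back as the column holding the 1 in each row.
zr_back <- max.col(V)
```

The same construction applies column-wise to a column partition and its indicator matrix, using the number of column clusters in place of kr.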

Author(s)

Margot Selosse, Julien Jacques, Christophe Biernacki.

Examples

# loading the real dataset
data("dataqol.classif")

set.seed(5)

# loading the ordinal data
M <- as.matrix(dataqol.classif[,2:29])


# creating the classes values
y <- as.vector(dataqol.classif$death)


# splitting the data into training and validation sets
nb.sample <- ceiling(nrow(M)*2/3)
sample.train <- sample(1:nrow(M), nb.sample, replace=FALSE)

M.train <- M[sample.train,]
M.validation <- M[-sample.train,]
nb.missing.validation <- length(which(M.validation==0))
m <- c(4)
M.validation[which(M.validation==0)] <- sample(1:m, nb.missing.validation,replace=TRUE)


y.train <- y[sample.train]
y.validation <- y[-sample.train]



# configuration for SEM algorithm
nbSEM=50
nbSEMburn=40
nbindmini=1
init="kmeans"

# number of classes to predict
kr <- 2
# number of column clusters (kc); other values can be compared with cross-validation
kcol <- 1


res <- bosclassif(x=M.train,y=y.train,kr=kr,kc=kcol,m=m,
                  nbSEM=nbSEM,nbSEMburn=nbSEMburn,
                  nbindmini=nbindmini,init=init)

predictions <- predict(res, M.validation)




[Package ordinalClust version 1.3.4 Index]