bosclassif {ordinalClust} | R Documentation |
This function performs classification on a dataset with ordinal features and an integer label variable (1,2,...,kr). Two classification models are available. The first one (chosen with the argument kc=0) is a multivariate BOS model which assumes that, conditionally on the class of the observations, the features are independent. The second model is a parsimonious version of the first. Parsimony is introduced by grouping the features into clusters (as in co-clustering) and assuming that the features of a cluster share a common distribution. If the data contain ordinal features with D different numbers of levels, the data are treated as D matrices of ordinal data.
bosclassif(x, y, idx_list=c(1), kr, kc=0, init, nbSEM, nbSEMburn, nbindmini, m=0, percentRandomB=0)
x |
Matrix of ordinal data, of dimension N*Jtot. Features with the same number of levels must be placed side by side. Missing values should be coded as NA. |
y |
Vector of length N. It should represent the classes corresponding to each row of x. Must be labelled with integers (1,2,...,kr). |
idx_list |
Vector of length D. This argument is useful when variables have different numbers of levels. Element d should indicate the column of matrix x where the variables with m[d] levels begin. |
kr |
Number of row classes. |
kc |
Vector of length D. The d^th element indicates the number of column clusters for the d^th group of variables. Set to 0 to choose the classical multivariate BOS model. |
m |
Vector of length D. d^th element defines the ordinal data's number of levels. |
nbSEM |
Number of SEM-Gibbs iterations performed to estimate the parameters. |
nbSEMburn |
Number of SEM-Gibbs burn-in iterations for estimating the parameters. This parameter must be less than nbSEM. |
nbindmini |
Minimum number of cells belonging to a block. |
init |
String that indicates the kind of initialisation. Must be one of the following words: "kmeans", "random" or "randomBurnin". |
percentRandomB |
Vector of length 1. Indicates the percentage of resampling when init is equal to "randomBurnin". |
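The layout rules for x, idx_list and m described above can be sketched in base R. The toy matrix and the helper name describe_groups are hypothetical and are not part of ordinalClust. The sketch also assumes that missing answers were recorded as 0 in the raw data and that every level occurs at least once in each column, so that a column maximum equals its number of levels.

```r
# Toy data: 3 features with 3 levels followed by 2 features with 5 levels.
# Features sharing a number of levels are already placed side by side.
x <- cbind(
  c(1, 2, 3, 2), c(3, 1, 2, 0), c(2, 3, 1, 1),
  c(5, 1, 3, 2), c(4, 5, 1, 3)
)

# Assumption: missing answers were recorded as 0 in this toy data.
# Recode them as NA, the coding expected for x.
x[x == 0] <- NA

# Hypothetical helper: derive m and idx_list from the column maxima.
describe_groups <- function(x) {
  levels_per_col <- apply(x, 2, max, na.rm = TRUE)
  runs <- rle(levels_per_col)  # consecutive columns with equal maxima
  list(
    m = runs$values,
    idx_list = cumsum(c(1, head(runs$lengths, -1)))
  )
}

groups <- describe_groups(x)
groups$m         # number of levels of each group of variables
groups$idx_list  # first column of each group in x
```

When the number of levels of each column is known in advance, writing m and idx_list by hand is safer than inferring them from the data, since a rarely used top level may be absent from a small sample.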
Returns an object with the following slots:
@zr |
Vector of length N with resulting row partitions. |
@zc |
List of length D. d^th item is a vector of length J[d] representing the column partition for the group of variables d. |
@J |
Vector of length D. d^th item represents the number of columns for d^th group of variables. |
@W |
List of length D. Item d is a matrix of dimension J[d]*kc[d] such that W[j,h]=1 if column j belongs to cluster h. |
@V |
Matrix of dimension N*kr such that V[i,g]=1 if i belongs to cluster g. |
@icl |
ICL value for co-clustering. |
@kr |
Number of row classes. |
@name |
Name of the result. |
@number_distrib |
Number of groups of variables. |
@pi |
Vector of length kr. Row mixing proportions. |
@rho |
List of length D. d^th item represents the column mixing proportion for d^th group of variables. |
@dlist |
List of length D. d^th item represents the indexes of group of variables d. |
@kc |
Vector of length D. d^th element represents the number of column clusters for d^th group of variables. |
@m |
Vector of length D. d^th element represents the number of levels of d^th group of variables. |
@nbSEM |
Number of SEM-Gibbs iterations. |
@params |
List of length D. d^th item represents the block parameters for group of variables d. |
@xhat |
List of length D. d^th item represents the d^th group of variables dataset, with missing values completed. |
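The link between the partition in @zr and the indicator matrix in @V can be illustrated with a short base R sketch. The vector zr below is a hypothetical stand-in for the slot of a fitted object, and the proportions it yields only illustrate what @pi measures; they are not the estimate computed by the SEM-Gibbs algorithm.

```r
# Hypothetical partition of N = 6 observations into kr = 3 classes,
# standing in for the @zr slot of a fitted object.
zr <- c(1, 3, 2, 1, 1, 3)
kr <- 3

# Indicator matrix with the layout described for @V:
# V[i, g] = 1 if observation i belongs to class g, 0 otherwise.
V <- matrix(0, nrow = length(zr), ncol = kr)
V[cbind(seq_along(zr), zr)] <- 1

# Empirical class proportions derived from the partition.
prop <- colMeans(V)
```

The same construction applied to an item of @zc gives a matrix with the layout described for the corresponding item of @W.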
Margot Selosse, Julien Jacques, Christophe Biernacki.
# loading the real dataset
data("dataqol.classif")
set.seed(5)

# loading the ordinal data
M <- as.matrix(dataqol.classif[,2:29])

# creating the classes values
y <- as.vector(dataqol.classif$death)

# sampling datasets for training and to predict
nb.sample <- ceiling(nrow(M)*2/3)
sample.train <- sample(1:nrow(M), nb.sample, replace=FALSE)

M.train <- M[sample.train,]
M.validation <- M[-sample.train,]
nb.missing.validation <- length(which(M.validation==0))
m <- c(4)
M.validation[which(M.validation==0)] <- sample(1:m, nb.missing.validation, replace=TRUE)

y.train <- y[sample.train]
y.validation <- y[-sample.train]

# configuration for SEM algorithm
nbSEM=50
nbSEMburn=40
nbindmini=1
init="kmeans"

# number of classes to predict
kr <- 2

# different kc to test with cross-validation
kcol <- 1

res <- bosclassif(x=M.train, y=y.train, kr=kr, kc=kcol, m=m,
                  nbSEM=nbSEM, nbSEMburn=nbSEMburn,
                  nbindmini=nbindmini, init=init)

predictions <- predict(res, M.validation)
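A natural follow-up to the example above is to compare predicted labels with the held-out classes. This page does not describe the object returned by predict, so the sketch below uses two hypothetical label vectors, y.true and y.pred, in place of y.validation and the labels extracted from predictions. It uses base R only.

```r
# Hypothetical true and predicted labels for 6 held-out observations.
y.true <- c(1, 2, 2, 1, 2, 1)
y.pred <- c(1, 2, 1, 1, 2, 2)

# Confusion table; fixing the factor levels keeps the table square
# even if one class is never predicted.
confusion <- table(
  truth = factor(y.true, levels = 1:2),
  predicted = factor(y.pred, levels = 1:2)
)

# Share of observations assigned to their true class.
accuracy <- sum(diag(confusion)) / length(y.true)
```

Repeating this comparison for several values of kc is one way to choose the number of column clusters by cross-validation.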