bdiscrim {bclust} | R Documentation |
This function provides a discrimination method equivalent to the bclust
clustering function.
bdiscrim(training, training.id, training.labels = NULL, predict, predict.label = rownames(predict)[1], effect.family = "gaussian", var.select = TRUE, transformed.par, priorprob = rep(1, max(training.id) + 1))
training |
A numeric matrix of the training set with observations in rows and variables in columns. |
training.id |
A vector consisting of positive integer elements having the
same length as the number of rows of |
training.labels |
A vector of strings
referring to the labels of each class. The length of the
vector should match to |
predict |
A single discriminating type to be classified to one of the classes, with replications of the discriminating type in rows and variables in columns. The number of variables should be the same as in |
predict.label |
A single string, the label of the discriminating type. |
effect.family |
Distribution family of the disappearing random components of the model. The choices are "gaussian" or "alaplace" allowing Gaussian or asymmetric Laplace family, respectively. |
var.select |
A logical value, |
transformed.par |
The transformed model parameters in a vector. The length of the vector depends on the chosen model and the availability of variable selection. The log transformation is supposed to be applied for the variance parameters, the identity for the mean, and the logit for the proportions. The function loglikelihood can be used to estimate them from the data. |
priorprob |
The prior probabilities for each class in a vector. The length of the vector should be the number of classes |
The function calls internal C
functions depending on the chosen
model. The C-stack of the system may overflow if you have a large
dataset. You may need to adjust the stack before running R
using your
operational system command line.
We assumed a Bayesian linear model for classification being
y_{vctr}=m+h_{vct}+d_{v}*g_{vc}*t_{vc}+e_{vctr}
where y_{vctr} is the available data on variable v,
cluster c, type t, and replicate r; h_{vct}
is the between-type error, t_{vc} is the disappearing random component controlled by the Bernoulli variables d_{v} with success probability q and g_{vc} with
success probability p; and e_{vctr} is the between-replicate error. The function computes the posterior probability that the predict
data share the same θ_{vc}.
This function also consideres that the predict
data may arise its own class. For more details see Vahid Partovi Nia and Anthony C. Davison (2012)
probs |
The posterior probabilities in a matrix with one row. |
var |
The variable importances, each being a log Bayes factor. |
varclass |
The variable-class importances, each being a log Bayes factor. |
Vahid Partovi Nia and Anthony C. Davison (2012). High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust. Journal of Statistical Software, 47(5), 1-22. URL http://www.jstatsoft.org/v47/i05/
loglikelihood, bclust, profileplot.
data(gaelle) gaelle.id<-rep(1:13,rep(4,13)) # all mutants have 4 replicates gaelle.lab<-c("d172","d263","isa2", "sex4","dpe2","mex1","sex3","pgm","sex1","WsWT", "tpt","RLDWT","ke103") gaelle.bdiscrim<-bdiscrim(gaelle[-(1:3),], training.id=gaelle.id,training.labels=gaelle.lab, transformed.par=c(-1.84,-0.99,1.63,0.08,-0.16,-1.68), predict=gaelle[1:3,], predict.label="ColWT") # classify ColWT to one of the types par(mfrow=c(1,1)) #retreive graphic defaults par(mar = c(5, 4, 4, 2) + 0.1) # leave some space for labels ColWT.prob<-as.vector(gaelle.bdiscrim$probs) # plots discrimination probabilities bp <- barplot(ColWT.prob,ylim=c(0,1)) #plot bars title("ColWT Discrimination") text(bp, par("usr")[3]-0.05, srt = 90,adj=1, labels = colnames(gaelle.bdiscrim$probs), xpd = TRUE) #plot variable labels mtext(1, text = "Mutant", line = 4,cex=1.5) # add x axis label mtext(2, text = "Probability", line = 3,cex=1.2) # add y axis labels abline(h=1/length(ColWT.prob)) # draw plot a as a reference line prior probabilities line # plots sorted discrimination probabilities par(mfrow=c(1,1)) #retreive graphic defaults par(mar = c(5, 4, 4, 2) + 0.1) # leave some space for labels bp <- barplot(sort(ColWT.prob,decreasing=TRUE),ylim=c(0,1), col=heat.colors(length(ColWT.prob))) #plot bars text(bp, par("usr")[3]-0.05, srt = 90,adj=1, labels = colnames(gaelle.bdiscrim$probs) [order(ColWT.prob,decreasing=TRUE)], xpd = TRUE) #plot variable labels mtext(1, text = "Mutant", line = 4,cex=1.5)# add x axis label mtext(2, text = "Probability", line = 3,cex=1.2)# add y axis labels abline(h=1/length(ColWT.prob)) varclassimp<-gaelle.bdiscrim$varclass #use thresholds to define blob colors blob<-matrix(0,nrow(varclassimp),ncol(varclassimp)) blob[varclassimp<=0]<-0 blob[varclassimp>0]<-1 blob[varclassimp>1]<-2 blob[varclassimp>3]<-3 blob[varclassimp>5]<-4 #log bayes factor thresholding varimp<-gaelle.bdiscrim$var varcol<-rep(0,ncol(gaelle)) varcol[varimp>0]<-1 var.order<-order(gaelle.bdiscrim$var,decreasing=TRUE) profileplot(x=gaelle[-(1:3),var.order],rep.id=gaelle.id, labels=gaelle.lab,scale=10,blob.matrix=blob,ylab.mar=5, xlab.mar=7) #plot var class importance on profile plot using blobs # and sort variables according to variable importance values viplot(varimp=varimp, xlab=colnames(gaelle)[var.order], xlab.mar=10,sort=TRUE,col=varcol[var.order]) #plot sorted variable importances