predict.crf {crfsuite}    R Documentation
Predict the label sequence based on the Conditional Random Field
## S3 method for class 'crf'
predict(object, newdata, group, type = c("marginal", "sequence"),
  trace = FALSE, ...)
object    an object of class crf as returned by crf.
newdata   a character matrix of data containing attributes about the label sequence.
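group     an integer or character vector of the same length as nrow(newdata), identifying the sequence group (for example a document or sentence identifier) that each row belongs to.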
type      either 'marginal' or 'sequence', to get predictions at the level of each row of newdata or at the level of each sequence group. Defaults to 'marginal'.
trace     a logical indicating whether to show the trace of the labelling output. Defaults to FALSE.
...       not used.
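The argument requirements above (newdata as a character matrix, group with one entry per row) can be checked before calling predict(). The helper check_crf_newdata() below is hypothetical and not part of crfsuite; it is a minimal sketch assuming that coercing newdata with as.matrix() is acceptable and that a length mismatch in group should be reported as an error.

```r
# Hypothetical helper, not part of crfsuite: checks newdata and group
# against the documented requirements before calling predict().
check_crf_newdata <- function(newdata, group) {
  # newdata is documented as a character matrix; coerce to one.
  m <- as.matrix(newdata)
  storage.mode(m) <- "character"
  # group is documented as an integer or character vector ...
  if (!(is.numeric(group) || is.character(group))) {
    stop("group must be an integer or character vector")
  }
  # ... with one entry per row of newdata.
  if (length(group) != nrow(m)) {
    stop(sprintf("group has length %d but newdata has %d rows",
                 length(group), nrow(m)))
  }
  m
}

# Usage with invented toy attributes: three tokens in two sequences.
feats <- data.frame(token = c("John", "lives", "here"), stringsAsFactors = FALSE)
m <- check_crf_newdata(feats, group = c(1L, 1L, 2L))
str(m)
```

The helper returns the coerced matrix so it can be passed straight on as newdata.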
If type is 'marginal': a data.frame with columns label and marginal, containing the Viterbi-decoded predicted label and its marginal probability.
If type is 'sequence': a data.frame with columns group and probability, containing for each sequence group the probability of the sequence.
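A minimal, self-contained sketch of both return types, assuming the crfsuite package is installed. The toy tokens, labels, and doc_id values are invented for illustration, and with this little data the predictions are not meaningful; the sketch only shows the shape of the two data.frames and assumes the 'marginal' rows come back in the same order as the rows of newdata.

```r
library(crfsuite)

# Invented toy data: two sequences (doc_id 1 and 2), one token per row.
toy <- data.frame(
  doc_id = c(1L, 1L, 1L, 2L, 2L, 2L),
  token  = c("John", "lives", "here", "Mary", "works", "there"),
  label  = c("B-PER", "O", "O", "B-PER", "O", "O"),
  stringsAsFactors = FALSE
)
# Attributes as a character matrix; here a single column holding the token.
feats <- as.matrix(toy[, "token", drop = FALSE])

# crf() writes its model file to disk, so work in a temporary directory.
old_wd <- setwd(tempdir())
model <- crf(y = toy$label, x = feats, group = toy$doc_id,
             method = "lbfgs", options = list(max_iterations = 5))

# type = "marginal": one row per row of newdata (columns label, marginal).
marg <- predict(model, newdata = feats, group = toy$doc_id, type = "marginal")
# type = "sequence": one row per sequence group (columns group, probability).
seqs <- predict(model, newdata = feats, group = toy$doc_id, type = "sequence")

# Attach the predicted labels and their marginals back to the input rows.
toy$pred <- as.character(marg$label)
toy$pred_marginal <- marg$marginal

# Remove the model file (and modeldetails.txt, if written), then restore the wd.
unlink(c(model$file_model, "modeldetails.txt"))
setwd(old_wd)

print(toy)
print(seqs)
```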
## Not run: 
library(crfsuite)
library(udpipe)
data(airbnb_chunks, package = "crfsuite")
udmodel <- udpipe_download_model("dutch-lassysmall")
udmodel <- udpipe_load_model(udmodel$file_model)
airbnb_tokens <- unique(airbnb_chunks[, c("doc_id", "text")])
airbnb_tokens <- udpipe_annotate(udmodel, x = airbnb_tokens$text,
                                 doc_id = airbnb_tokens$doc_id)
airbnb_tokens <- as.data.frame(airbnb_tokens)
x <- merge(airbnb_chunks, airbnb_tokens)
x <- crf_cbind_attributes(x, terms = c("upos", "lemma"), by = "doc_id")
model <- crf(y = x$chunk_entity,
             x = x[, grep("upos|lemma", colnames(x))],
             group = x$doc_id,
             method = "lbfgs", options = list(max_iterations = 5))
scores <- predict(model,
                  newdata = x[, grep("upos|lemma", colnames(x))],
                  group = x$doc_id, type = "marginal")
head(scores)
scores <- predict(model,
                  newdata = x[, grep("upos|lemma", colnames(x))],
                  group = x$doc_id, type = "sequence")
head(scores)

## cleanup for CRAN
file.remove(model$file_model)
file.remove("modeldetails.txt")
file.remove(udmodel$file)

## End(Not run)