crf_cbind_attributes {crfsuite} | R Documentation
The CRF attributes implemented in this function
are simply the neighbouring information of a certain field,
for example the previous word, the next word, or the combination of the previous 2 words.
This function cbinds these neighbouring attributes as columns to the provided data.frame.
By default it adds the following columns to the data.frame:
the term itself (term[t])
the next term (term[t+1])
the term after that (term[t+2])
the previous term (term[t-1])
the term before the previous term (term[t-2])
as well as all combinations of these terms (bigrams, trigrams, ...), where up to ngram_max
terms are combined.
See the examples.
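The neighbouring columns described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: shift_term is a hypothetical helper that does not exist in crfsuite, and leaving positions without a neighbour as NA is an assumption made for this sketch.

```r
# Hypothetical helper (not part of crfsuite): shift a vector by `offset`
# positions, giving NA where no neighbour exists.
shift_term <- function(x, offset) {
  n <- length(x)
  idx <- seq_len(n) + offset
  idx[idx < 1 | idx > n] <- NA_integer_
  x[idx]
}

term <- c("the", "cat", "sat", "down")
previous_term <- shift_term(term, -1)  # term[t-1]
next_term     <- shift_term(term,  1)  # term[t+1]

# Bigram of the previous and the current term, joined with sep = "-".
# Assumption of this sketch: the bigram stays NA when a part is missing.
bigram <- ifelse(is.na(previous_term),
                 NA_character_,
                 paste(previous_term, term, sep = "-"))

data.frame(term, previous_term, next_term, bigram)
```

Trigrams and longer combinations would follow the same pattern, pasting more shifted copies together up to ngram_max terms.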
crf_cbind_attributes(data, terms, by, from = -2, to = 2, ngram_max = 3, sep = "-")
data | a data.frame which will be coerced to a data.table (cbinding will be done by reference on the existing data.frame)
terms | a character vector of column names which are part of data
by | a character vector of column names which are part of data
from | integer, by default set to -2, indicating to look up to 2 terms before the current term
to | integer, by default set to 2, indicating to look up to 2 terms after the current term
ngram_max | integer indicating the maximum number of terms to combine (2 means bigrams, 3 trigrams, ...)
sep | character indicating how to combine the previous/next/current terms. Defaults to '-'.
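To illustrate the role of the by argument, here is a minimal base R sketch which assumes that the columns in by mark separate sequences, so that a neighbour is never taken from a different sequence. The package itself works on a data.table; the toy data, the use of ave and the column name prev_pos are choices made for this sketch only.

```r
# Toy data: two documents, each forming its own sequence of part-of-speech tags
x <- data.frame(doc_id = c(1, 1, 1, 2, 2),
                pos    = c("Art", "N", "V", "Pron", "V"),
                stringsAsFactors = FALSE)

# Previous tag within each doc_id; the first row of every document has no
# predecessor, so it gets NA instead of the last tag of the previous document.
x$prev_pos <- ave(x$pos, x$doc_id,
                  FUN = function(v) c(NA_character_, head(v, -1)))
x
```

Without the grouping, the fourth row would wrongly receive "V" from the end of the first document as its previous tag.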
x <- data.frame(doc_id = sort(sample.int(n = 10, size = 1000, replace = TRUE)))
x$pos <- sample(c("Art", "N", "Prep", "V", "Adv", "Adj", "Conj", "Punc", "Num", "Pron", "Int", "Misc"),
                size = nrow(x), replace = TRUE)
x <- crf_cbind_attributes(x, terms = "pos", by = "doc_id",
                          from = -1, to = 1, ngram_max = 3)
head(x)

## Not run: 
## Example on some real data
x <- ner_download_modeldata("conll2002-nl")
x <- crf_cbind_attributes(x, terms = c("token", "pos"),
                          by = c("doc_id", "sentence_id"),
                          ngram_max = 3, sep = "|")
## End(Not run)