txt_feature {crfsuite} | R Documentation |
Extract basic text features which are useful for entity recognition
txt_feature(x, type = c("is_capitalised", "is_url", "is_email", "is_number", "prefix", "suffix"), n = 4)
| Argument | Description |
| --- | --- |
| x | a character vector |
| type | a character string, one of 'is_capitalised', 'is_url', 'is_email', 'is_number', 'prefix', 'suffix' |
| n | for type 'prefix' or 'suffix', the number of characters in the prefix/suffix |
For type 'is_capitalised', 'is_url', 'is_email' or 'is_number': a logical vector of the same length as x, indicating for each element of x whether it is capitalised, a URL, an email address or a number.
For type 'prefix' or 'suffix': a character vector of the same length as x, containing the prefix or suffix of n characters of each element of x.
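The prefix/suffix semantics described above can be mimicked in base R with substr(). This is a minimal sketch that assumes 'prefix' and 'suffix' return the first and last n characters of each string; prefix_sketch() and suffix_sketch() are hypothetical helpers, not part of crfsuite, and the clamping for strings shorter than n is a choice made here, not documented package behaviour.

```r
# Hypothetical base-R helpers, NOT the crfsuite implementation:
# they illustrate the documented 'prefix'/'suffix' semantics.
prefix_sketch <- function(x, n = 4) {
  # First n characters of each element (shorter strings are returned whole)
  substr(x, start = 1, stop = n)
}

suffix_sketch <- function(x, n = 4) {
  # Last n characters of each element; the start position is clamped to 1
  # so strings shorter than n are returned whole (an assumption of this sketch)
  substr(x, start = pmax(nchar(x) - n + 1, 1), stop = nchar(x))
}

words <- c("abcdefghijklmnopqrstuvwxyz", "Devils", "hi")
prefix_sketch(words, n = 3)  # "abc" "Dev" "hi"
suffix_sketch(words, n = 3)  # "xyz" "ils" "hi"
```

Both helpers are vectorised over x, so, like the documented return value, the output has the same length as the input.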
txt_feature("Red Devils", type = "is_capitalised")
txt_feature("red devils", type = "is_capitalised")
txt_feature("http://www.bnosac.be", type = "is_url")
txt_feature("info@google.com", type = "is_email")
txt_feature("hi there", type = "is_email")
txt_feature("1230000", type = "is_number")
txt_feature("123.15", type = "is_number")
txt_feature("123,15", type = "is_number")
txt_feature("123abc", type = "is_number")
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "prefix", n = 3)
txt_feature("abcdefghijklmnopqrstuvwxyz", type = "suffix", n = 3)
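Because each call returns one value per element of x, several feature types can be computed over the same token vector and combined into a token-level feature table, the kind of input typically prepared for a CRF entity recognition model. This is a minimal sketch, assuming crfsuite is installed; the token vector is made up for illustration and the column names are arbitrary choices, not a format required by the package.

```r
library(crfsuite)

# Hypothetical tokenised sentence, one token per element
tokens <- c("Manchester", "United", "visited", "http://www.bnosac.be", "on", "2019")

# One row per token, one column per feature; each txt_feature() call
# returns a vector of the same length as 'tokens'
features <- data.frame(
  token          = tokens,
  is_capitalised = txt_feature(tokens, type = "is_capitalised"),
  is_url         = txt_feature(tokens, type = "is_url"),
  is_number      = txt_feature(tokens, type = "is_number"),
  prefix         = txt_feature(tokens, type = "prefix", n = 3),
  suffix         = txt_feature(tokens, type = "suffix", n = 3),
  stringsAsFactors = FALSE
)
features
```

Since type selects a single feature per call, each feature column needs its own call.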