text_intersect {textTinyR} | R Documentation |
intersection of words or letters in tokenized text
# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)
token_list1 | a list in which each sublist is a tokenized text sequence (token_list1 must have the same length as token_list2) |
token_list2 | a list in which each sublist is a tokenized text sequence (token_list2 must have the same length as token_list1) |
distinct | either TRUE or FALSE. If TRUE, the intersection is computed over distinct words (or letters) only |
letters | either TRUE or FALSE. If TRUE, the intersection is computed over the letters of the text sequences rather than over whole words |
An object of class R6ClassGenerator of length 24.
This class includes methods for text or character intersection. If both distinct and letters are FALSE then the simple (count or ratio) word intersection will be computed.
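The effect of the distinct and letters flags can be illustrated with a small base-R sketch. This is not the textTinyR implementation: the helper names pair_intersect and intersect_sketch, the ratio argument, and the exact definitions (count = number of elements of the first sequence that also occur in the second; ratio = that count divided by the length of the first sequence) are assumptions made for illustration only, and the package may define these quantities differently.

```r
# Hypothetical helper, not part of textTinyR: intersect one pair of token vectors.
# Assumption: the count is the number of elements of x that also occur in y,
# and the ratio divides that count by the number of elements of x.
pair_intersect <- function(x, y, distinct = FALSE, letters = FALSE, ratio = FALSE) {
  if (letters) {
    # Work on single characters instead of whole words.
    x <- unlist(strsplit(x, ""))
    y <- unlist(strsplit(y, ""))
  }
  if (distinct) {
    # Count each word (or letter) of x at most once.
    x <- unique(x)
  }
  n_common <- sum(x %in% y)
  if (ratio) n_common / length(x) else n_common
}

# Apply the helper to corresponding sublists of two equally long lists.
intersect_sketch <- function(list1, list2, ...) {
  stopifnot(length(list1) == length(list2))
  mapply(pair_intersect, list1, list2, MoreArgs = list(...), USE.NAMES = FALSE)
}

tok1 <- list(c('compare', 'this', 'text'), c('and', 'this', 'text'))
tok2 <- list(c('with', 'another', 'set'), c('of', 'text', 'documents'))

intersect_sketch(tok1, tok2)                                  # simple word counts
intersect_sketch(tok1, tok2, ratio = TRUE)                    # simple word ratios
intersect_sketch(tok1, tok2, distinct = TRUE, letters = TRUE) # distinct letter counts
```

Under these assumptions, each call returns one number per pair of sublists, which matches the documented return type of a numeric vector.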
a numeric vector
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
--------------
count_intersect(distinct = FALSE, letters = FALSE)
--------------
ratio_intersect(distinct = FALSE, letters = FALSE)
https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi
library(textTinyR)

tok1 = list(c('compare', 'this', 'text'), c('and', 'this', 'text'))
tok2 = list(c('with', 'another', 'set'), c('of', 'text', 'documents'))

init = text_intersect$new(tok1, tok2)

init$count_intersect(distinct = TRUE, letters = FALSE)
init$ratio_intersect(distinct = FALSE, letters = TRUE)