bpe_decode {tokenizers.bpe}    R Documentation
Decode a sequence of Byte Pair Encoding ids back into text
bpe_decode(model, x, ...)
model    a BPE model object, such as the one returned by bpe() in the example below
x        an integer vector of BPE ids
...      further arguments passed on to youtokentome_encode_as_ids
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1)
model
str(model$vocabulary)

text <- c("L'appartement est grand & vraiment bien situe en plein centre",
          "Proportion de femmes dans les situations de famille monoparentale.")
bpe_encode(model, x = text, type = "subwords")
bpe_encode(model, x = text, type = "ids")

encoded <- bpe_encode(model, x = text, type = "ids")
decoded <- bpe_decode(model, encoded)
decoded

## Remove the model file (Clean up for CRAN)
file.remove(model$model_path)
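
To illustrate what decoding does conceptually, the following is a minimal base R sketch, not the package's implementation. It assumes that decoding amounts to looking up each id in a vocabulary table and concatenating the subwords, and that word starts are marked with the character U+2581, as in SentencePiece-style tokenizers. The names toy_vocab and toy_decode are hypothetical and are not part of tokenizers.bpe.

```r
# Toy vocabulary: a hypothetical id -> subword table (not a real trained model).
# "\u2581" is assumed here to mark the start of a word.
toy_vocab <- data.frame(
  id = 0:5,
  subword = c("<PAD>", "<UNK>", "\u2581le", "\u2581ch", "at", "\u2581dort"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map ids back to subwords and glue them into text.
toy_decode <- function(vocab, ids) {
  # Look up the subword belonging to each id
  pieces <- vocab$subword[match(ids, vocab$id)]
  # Concatenate the subwords, then turn word-start markers into spaces
  text <- paste(pieces, collapse = "")
  text <- gsub("\u2581", " ", text, fixed = TRUE)
  # Drop the space produced by the marker of the first word
  trimws(text)
}

toy_decode(toy_vocab, c(2L, 3L, 4L, 5L))
# "le chat dort"
```

With a real model, the id-to-subword table is the one shown by str(model$vocabulary) in the example above, and bpe_decode performs the conversion itself.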