2011 Jan 19
Analyzing texts with tm
Hey everybody! I have to use R's tm package to do some text analysis, and the first step is to create a term frequency matrix. Digging into tm's source code, it seems to use logic like this to create term frequencies:

    library(tm)
    data("crude")
    # Extract the text of the first document in the corpus
    (txt <- Content(crude[[1]]))
    # Replace each run of non-alphanumeric characters with a space, then split on single spaces
    (tokTxt <- unlist(strsplit(gsub("[^[:alnum:]]+", " ", txt), " ", fixed = TRUE)))
    # Count occurrences of a single word
    table(factor(tokTxt, levels = c('two')))
    # Count a two-word phrase: always 0, because no token contains a space
    table(factor(tokTxt, levels = c('two days')))

As this code example demonstrates, the tokenization of the input text makes it impossible...
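To make the tokenization problem concrete, here is a minimal base-R sketch that does not use tm at all. The sample sentence and all variable names are hypothetical and only for illustration. It applies the same gsub/strsplit tokenization quoted above, shows that a two-word phrase never matches a single-word token, and then pairs each token with its successor (a simple bigram approach, not tm's own method) so that "two days" can be counted.

```r
# Hypothetical sample text (not taken from the crude corpus)
txt <- "The strike lasted two days. After two days of talks, two sides agreed."

# Unigram tokenization, same gsub/strsplit logic as the snippet above
tokens <- unlist(strsplit(gsub("[^[:alnum:]]+", " ", txt), " ", fixed = TRUE))
tokens <- tokens[tokens != ""]  # drop an empty token if the text starts with punctuation

# Every token is a single word, so the two-word level can never be matched
unigram_counts <- table(factor(tokens, levels = c("two", "two days")))
print(unigram_counts)  # two: 3, two days: 0

# Build bigrams by pasting each token to the token that follows it
bigrams <- paste(head(tokens, -1), tail(tokens, -1))
bigram_counts <- table(factor(bigrams, levels = "two days"))
print(bigram_counts)   # two days: 2
```

Pairing adjacent tokens keeps the original word-level tokenizer unchanged; the phrase only becomes countable once the units being tabulated are themselves two words long.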