2011 Jan 19
Analyzing texts with tm
Hey everybody!
I have to use R's tm package for some text analysis, and the first step is to create a term frequency matrix.
Digging through tm's source code, it seems to use logic like this to compute term frequencies:
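library(tm)
data("crude")
# Content() is the accessor in the tm release used here (newer releases appear to use lowercase content())
(txt <- Content(crude[[1]]))
# Replace each run of non-alphanumeric characters with a space, then split on single spaces
(tokTxt <- unlist(strsplit(gsub("[^[:alnum:]]+", " ", txt), " ", fixed = TRUE)))
# Count a one-word term
table(factor(tokTxt, levels = c('two')))
# Count a two-word term
table(factor(tokTxt, levels = c('two days')))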
As this code example demonstrates, the tokenization of the input text makes it impossible...
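To see the effect without loading tm, here is a minimal sketch in base R. The sample sentence is made up and only stands in for the crude document, and the last step (pasting adjacent tokens together) is just one possible way to count a two-word phrase, not something taken from tm's source.

```r
# Hypothetical sample text standing in for the document content
txt <- "Prices fell for two days. After two days of trading, two firms cut output."

# Same tokenization as above: non-alphanumeric runs become a space, then split on spaces
tokTxt <- unlist(strsplit(gsub("[^[:alnum:]]+", " ", txt), " ", fixed = TRUE))

# A one-word level matches individual tokens
oneWord <- table(factor(tokTxt, levels = c("two")))        # two: 3

# A two-word level never matches, because no single token contains a space
twoWords <- table(factor(tokTxt, levels = c("two days")))  # two days: 0

# Assumption: pasting each token to its successor yields two-word units to count
pairs <- paste(head(tokTxt, -1), tail(tokTxt, -1))
pairCount <- table(factor(pairs, levels = c("two days")))  # two days: 2
```

The two-word count only becomes non-zero once the unit being counted is itself two words long, which is why the per-word split cannot see 'two days'.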