Nikhil Goyal
2015-Mar-18 20:31 UTC
[R] Using lapply on term document matrix to calculate word frequency
Given three TermDocumentMatrix objects, tdm1, tdm2 and tdm3 (built from
text1, text2 and text3 in the sample data below), I'd like to calculate the
word frequency for each of them as a data frame and then rbind all the data
frames. These three are only a sample; I have hundreds in reality, so I need
to wrap this in a function.
It's easy to calculate word freq for one TDM:
apply(x, 1, sum)
or
rowSums(as.matrix(x))
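Since as.matrix() builds a dense copy of each TDM, a sparse row sum may scale better across hundreds of matrices. A minimal sketch, assuming the slam package is available (tm depends on it) and using a made-up two-document corpus for illustration:

```r
library(tm)

# Hypothetical toy corpus, only for illustration.
x <- TermDocumentMatrix(Corpus(VectorSource(c("apple cool", "cool coke"))))

# Dense route: converts the whole TDM to an ordinary matrix first.
freq_dense <- rowSums(as.matrix(x))

# Sparse route: sums each row (term) without the dense conversion.
freq_sparse <- slam::row_sums(x)

# Both are named numeric vectors, one entry per term.
same_result <- isTRUE(all.equal(unname(freq_dense), unname(freq_sparse)))
```

Both calls return one count per term, so either could be dropped into a function applied to a list of TDMs.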
I want to make a list of TDMs:
tdm_list <- Filter(function(x) is(x, "TermDocumentMatrix"),
mget(ls()))
and calculate the word frequency for each one and put it in a data frame:
# This is wrong: it sums the frequencies of all words in each TDM
# instead of giving the frequency of each individual word.
data.frame(lapply(tdm_list, sum))
and then rbind it all (df_list being the list of per-TDM data frames):
do.call(rbind, df_list)
I can't figure out how to use lapply on a TDM to calculate word frequency.
Sample data to play around with:
library(tm)
text1 <- c("apple" , "love", "crazy",
"peaches", "cool", "coke",
"batman", "joker")
text2 <- c("omg", "#rstats" , "crazy",
"cool", "bananas", "functions",
"apple")
text3 <- c("Playing", "rstats", "football",
"data", "coke", "caffeine",
"peaches", "cool")
tdm1 <- TermDocumentMatrix(Corpus(VectorSource(text1)))
tdm2 <- TermDocumentMatrix(Corpus(VectorSource(text2)))
tdm3 <- TermDocumentMatrix(Corpus(VectorSource(text3)))
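For reference, one possible shape for the whole pipeline on this sample data is sketched below. It is only a sketch under assumptions: it uses slam::row_sums() for the per-term counts, and the names word_freq, source, term and freq are hypothetical, chosen for illustration. The list is built directly here rather than with mget(ls()) so the block runs on its own.

```r
library(tm)

# Same sample data as above, kept in a named list.
texts <- list(
  text1 = c("apple", "love", "crazy", "peaches", "cool", "coke",
            "batman", "joker"),
  text2 = c("omg", "#rstats", "crazy", "cool", "bananas", "functions",
            "apple"),
  text3 = c("Playing", "rstats", "football", "data", "coke", "caffeine",
            "peaches", "cool")
)

# One TermDocumentMatrix per character vector.
tdm_list <- lapply(texts, function(txt) {
  TermDocumentMatrix(Corpus(VectorSource(txt)))
})

# Hypothetical helper: per-term frequency of one TDM as a data frame,
# tagged with the name of the TDM it came from.
word_freq <- function(tdm, name) {
  freq <- slam::row_sums(tdm)  # row sums = frequency of each term
  data.frame(
    source = rep(name, length(freq)),
    term   = names(freq),
    freq   = as.vector(freq),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to every TDM, then stack the results.
df_list  <- Map(word_freq, tdm_list, names(tdm_list))
all_freq <- do.call(rbind, df_list)
rownames(all_freq) <- NULL
```

The key difference from lapply(tdm_list, sum) is that the function passed to lapply/Map takes row sums, which keeps one value per term, whereas sum collapses each TDM to a single number.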
Thanks.