Nikhil Goyal
2015-Mar-18 20:31 UTC
[R] Using lapply on term document matrix to calculate word frequency
Given three TermDocumentMatrix objects, tdm1, tdm2 and tdm3, I'd like to calculate the word frequency for each of them into a data frame and then rbind all the data frames. These three are just a sample; I have hundreds in reality, so I need to turn this into a function.

It's easy to calculate word frequency for one TDM:

    apply(x, 1, sum)

or

    rowSums(as.matrix(x))

I want to make a list of TDMs:

    tdm_list <- Filter(function(x) is(x, "TermDocumentMatrix"), mget(ls()))

then calculate the word frequency for each one and put it in a data frame:

    data.frame(lapply(tdm_list, sum))
    # This is wrong: it sums the frequency of all words together
    # instead of giving the frequency of each word.

and then rbind it all:

    do.call(rbind, df_list)

I can't figure out how to use lapply on a TDM to calculate word frequency.

Here is some sample data to play around with:

    require(tm)
    text1 <- c("apple", "love", "crazy", "peaches", "cool", "coke", "batman", "joker")
    text2 <- c("omg", "#rstats", "crazy", "cool", "bananas", "functions", "apple")
    text3 <- c("Playing", "rstats", "football", "data", "coke", "caffeine", "peaches", "cool")
    tdm1 <- TermDocumentMatrix(Corpus(VectorSource(text1)))
    tdm2 <- TermDocumentMatrix(Corpus(VectorSource(text2)))
    tdm3 <- TermDocumentMatrix(Corpus(VectorSource(text3)))

Thanks.
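For what it's worth, here is a minimal sketch of one way the per-TDM step might be written. It reuses rowSums(as.matrix(x)) from the post inside a small helper and applies that helper to every element of the list. The helper name word_freq and the column names source, word and freq are made up for illustration, and the list is built by hand here rather than with Filter/mget so the example is self-contained.

```r
library(tm)

# Sample data from the post
text1 <- c("apple", "love", "crazy", "peaches", "cool", "coke", "batman", "joker")
text2 <- c("omg", "#rstats", "crazy", "cool", "bananas", "functions", "apple")
text3 <- c("Playing", "rstats", "football", "data", "coke", "caffeine", "peaches", "cool")

# Named list of TDMs (built explicitly here; Filter/mget would also work)
tdm_list <- list(
  tdm1 = TermDocumentMatrix(Corpus(VectorSource(text1))),
  tdm2 = TermDocumentMatrix(Corpus(VectorSource(text2))),
  tdm3 = TermDocumentMatrix(Corpus(VectorSource(text3)))
)

# Hypothetical helper: one row per term, with the name of the TDM it came from
word_freq <- function(tdm, name) {
  freq <- rowSums(as.matrix(tdm))  # per-term totals across all documents
  data.frame(source = name,
             word = names(freq),
             freq = unname(freq),
             stringsAsFactors = FALSE)
}

# Apply the helper to each TDM together with its name, then stack the results
df_list <- Map(word_freq, tdm_list, names(tdm_list))
result <- do.call(rbind, df_list)
rownames(result) <- NULL
```

The key difference from lapply(tdm_list, sum) is that the function passed in returns one value per term (via rowSums) rather than a single total. Keeping a source column is a design choice so that rows remain traceable after rbind; drop it if only the combined counts matter. Note that as.matrix() makes each TDM dense, which may be costly for very large matrices.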