warwick maddock
2015-Jul-19 08:07 UTC
[R] searching for key phrases in collection of text files using tm
Hi R-Help!

I am a newbie in R and in computer science in general. I have done the basic reading on R and the tm package, and I am using R on a Windows 7 system.

I have been given a project that requires me to search the annual reports of 76 companies for multiple key phrases, such as "finance program" or "improving working capital". The goal is to count how many times each key phrase appears in each annual report.

The following script is what I have accomplished so far:

#load tm package
library(tm)

#set working directory to the folder of annual report text files
setwd('C:/Users/a446578/Desktop/Annual Reports Text Files')
dest <- "C:/Users/a446578/Desktop/Annual Reports Text Files"

#create corpus of the 76 annual report text files (the reports are in English)
a <- Corpus(DirSource(dest), readerControl = list(language = "en"))

#clean the corpus
a <- tm_map(a, removeNumbers)
a <- tm_map(a, removePunctuation)
a <- tm_map(a, content_transformer(tolower))
a <- tm_map(a, removeWords, stopwords("english"))

#create the document-term matrix
dtm <- DocumentTermMatrix(a)

#search for key phrases
tm_term_score(dtm, c("finance program", "improving working capital", "reduce days", "increase trade receivables"))

Everything runs smoothly apart from the last step (#search for key phrases). I understand that tm_term_score only works with single words: the document-term matrix is built from single-word terms, so a multi-word phrase never matches any of its columns. How can I get the same result that tm_term_score gives me, but for phrases instead of words?

I have posted an almost identical question on another forum but was not able to understand the response. I trust you at R-help can give me a solution that someone as inexperienced in R as I am can follow.

Thanks a lot!
Warwick
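One possible approach, sketched here under some assumptions rather than offered as the tm way of doing it, is to skip the document-term matrix and count each phrase directly in the raw text with base R's gregexpr(). It assumes the reports are plain .txt files in a single folder and that the phrases contain no regular-expression special characters. read_reports() and count_phrases() are hypothetical helper names, not tm functions. Counting on the raw, uncleaned text also avoids stopword removal splitting a phrase apart.

```r
# Hypothetical helper (not part of tm): read every .txt file in a folder
# into one lower-level character string per file, named by file name.
read_reports <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  texts <- vapply(files,
                  function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                  character(1))
  names(texts) <- basename(files)
  texts
}

# Hypothetical helper (not part of tm): count whole-phrase matches of each
# phrase in each text. Returns an integer matrix, one row per document and
# one column per phrase (similar in spirit to tm_term_score output).
count_phrases <- function(texts, phrases) {
  # Lower-case and collapse runs of whitespace (including line breaks),
  # so a phrase split across lines or with extra spaces still matches.
  norm <- tolower(gsub("\\s+", " ", texts))
  res <- matrix(0L, nrow = length(norm), ncol = length(phrases),
                dimnames = list(names(texts), phrases))
  for (j in seq_along(phrases)) {
    # \\b word boundaries stop "finance program" matching inside
    # "refinance programme". Assumes no regex special characters in phrases.
    pattern <- paste0("\\b", tolower(phrases[j]), "\\b")
    for (i in seq_along(norm)) {
      m <- gregexpr(pattern, norm[i], perl = TRUE)[[1]]
      # gregexpr returns -1 when there is no match
      res[i, j] <- if (m[1] == -1L) 0L else length(m)
    }
  }
  res
}

# Small made-up example (not real annual report text)
docs <- c(report1 = "We launched a Finance Program.\nThe finance program aims at improving working capital.",
          report2 = "Improving working   capital is key; a refinance programme is not counted.")
phrases <- c("finance program", "improving working capital")
res <- count_phrases(docs, phrases)
print(res)
# report1: 2 "finance program", 1 "improving working capital"
# report2: 0 "finance program", 1 "improving working capital"
```

For the actual reports, something like res <- count_phrases(read_reports(dest), c("finance program", "improving working capital", "reduce days", "increase trade receivables")) should give one row per report, and write.csv(res, "phrase_counts.csv") would save the table for use elsewhere.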