Dear All,

I want to do lemmatization using the tm and textstem packages. This is how I am currently doing it:

    library("tm")
    library("wordcloud")
    library("RColorBrewer")

    filePath = < Path to any text file >
    text <- readLines(filePath)
    docs <- Corpus(VectorSource(text))

    # Convert the text to lower case
    docs <- tm_map(docs, content_transformer(tolower))
    # Remove numbers
    docs <- tm_map(docs, removeNumbers)
    # Remove common English stopwords
    docs <- tm_map(docs, removeWords, stopwords("english"))
    # Remove punctuation
    docs <- tm_map(docs, removePunctuation)
    # Eliminate extra white space
    docs <- tm_map(docs, stripWhitespace)

    # Text lemmatization
    library(textstem)
    docs <- tm_map(docs, content_transformer(lemmatize_words))

My question: is the last line above the correct way to do lemmatization? Can someone please confirm?

For the sake of a complete example, here is the rest of the code:

    dtm <- TermDocumentMatrix(docs)
    m <- as.matrix(dtm)
    v <- sort(rowSums(m), decreasing = TRUE)
    d <- data.frame(word = names(v), freq = v)
    head(d, 10)

    set.seed(1234)
    wordcloud(words = d$word, freq = d$freq, min.freq = 1,
              max.words = 200, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"))

Thank you,
Ashim
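One way to check the lemmatization step in isolation is a minimal sketch like the one below, assuming textstem's default dictionary (lexicon::hash_lemmas). As far as I understand textstem, lemmatize_words() looks up each element of a character vector as a single token, while lemmatize_strings() splits each string into tokens first. Because readLines() makes every line of the file its own tm document, each element reaching the lemmatizer is a whole line rather than a single word. The example sentences are made up for illustration.

```r
# Sketch: compare textstem's two lemmatizers on a whole line of text.
# Assumes the tm and textstem packages (and lexicon) are installed.
library(tm)
library(textstem)

# Hypothetical example line, standing in for one line from readLines().
line <- "the cats were running quickly"

# lemmatize_words() treats each vector element as one token; a whole
# sentence is not a dictionary entry, so it should come back unchanged.
print(lemmatize_words(line))

# Splitting the line into individual words first lets lemmatize_words()
# look each word up separately.
print(lemmatize_words(strsplit(line, " ")[[1]]))

# lemmatize_strings() tokenizes each string internally and lemmatizes
# every token, then pastes the tokens back together.
print(lemmatize_strings(line))

# The same comparison inside a tm pipeline (hypothetical two-line corpus).
docs <- Corpus(VectorSource(c(line, "the mice ate the cheese")))
docs_words <- tm_map(docs, content_transformer(lemmatize_words))
docs_strings <- tm_map(docs, content_transformer(lemmatize_strings))
inspect(docs_words)
inspect(docs_strings)
```

If lemmatize_words() leaves the lines unchanged while lemmatize_strings() does not, that would suggest lemmatize_strings() is the better fit for a corpus built from whole lines; splitting each line into words first is the alternative if lemmatize_words() is preferred.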