Christian Timmermann
2011-Sep-05 08:04 UTC
[R] Stemming functions only work on the last word of plain text documents
Hello,

I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus with the tm_map function, it only stems the last word of each document. (The problem is the same for wordStem, and stemDocument does not work at all.)

An example:

> path <- c("c:/path/to/directory") # collection of plain text documents
> corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US", load = TRUE))
> inspect(corp)
A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator
Available variables in the data frame are:
  MetaID

$`1.txt`
running runs runners

$`2.txt`
happyness happies

> corp2 <- tm_map(corp, SnowballStemmer)
> inspect(corp2)
A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator
Available variables in the data frame are:
  MetaID

$`1.txt`
[1] running runs runn

$`2.txt`
[1] happyness happi

How can I get the stemming function to work?
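A possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: the stemmer seems to receive each document as a single string, so only the end of that string is treated as a word ending. If that is the cause, splitting each document into words before stemming should help. The sketch below uses only base R; `stem_each_word` is a hypothetical helper name, and the `stemmer` argument stands for any function that maps a character vector of words to their stems (for example `wordStem`).

```r
# Hypothetical helper (not part of tm or Snowball): apply a word-level
# stemmer to every whitespace-separated token of one text string.
stem_each_word <- function(text, stemmer) {
  # Split the document into individual words on runs of whitespace.
  words <- unlist(strsplit(text, "[[:space:]]+"))
  # Drop the empty token that leading whitespace can produce.
  words <- words[nzchar(words)]
  # Stem each word separately, then rebuild a single string.
  paste(stemmer(words), collapse = " ")
}

# Assumed usage with a real stemmer and a tm corpus (not run here;
# how tm_map accepts a plain function may differ between tm versions):
# corp2 <- tm_map(corp, function(doc) stem_each_word(doc, wordStem))
```

The stemmer is passed in as an argument so the same helper can be tried with `wordStem`, `SnowballStemmer`, or any other word-level function without changing the splitting logic.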