Hi, All:

I am new to R and the tm package. I'm trying to do stemming using tm_map(), and it doesn't seem to work. I used:

> stemDocument(t_cmts[[100]])

where t_cmts is the corpus object. The result is:

bottle loose box abt airpak sections top plastic bottle squashed nearly flush neck previous shipments bottle wrapped securely bubble wrap wno bottle damage packaging poor surprisingly bottle leaking remove contents bottle reusable packaging cancel automatic shipments

No stemming seems to have been done at all. What did I do wrong? I have RWeka, tm, rJava and Snowball installed (I used "install package" from the top menu, and it didn't say it failed).

Thanks,
Deborah
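For reference, here is a minimal sketch of stemming a whole corpus with tm_map(). It assumes a current tm release plus the SnowballC package, which current tm versions use inside stemDocument(); the two-document corpus is a hypothetical stand-in for t_cmts. Note that stemDocument() returns a stemmed copy rather than modifying anything in place, so the result of tm_map() has to be assigned back.

```r
library(tm)        # assumes a current tm release
library(SnowballC) # stemDocument() relies on SnowballC::wordStem()

# Hypothetical stand-in for t_cmts: a tiny two-document corpus
docs <- c("bottle leaking remove contents",
          "previous shipments bottle wrapped securely")
t_cmts <- VCorpus(VectorSource(docs))

# Stem every document; tm_map() returns a new corpus, so reassign it
t_cmts <- tm_map(t_cmts, stemDocument, language = "english")

# Extract the plain text of one stemmed document to check the result
stemmed <- content(t_cmts[[1]])
print(stemmed)
```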
Check this: slideshare.net/whitish/textmining-with-r

Best,
-Alex

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of Deborah H. Deng [deborah.deng at alumni.utexas.net]
Sent: 13 April 2012 10:27
To: r-help at r-project.org
Subject: [R] Help with stemDocument

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I am having a problem with stemDocument also. I can make it work by moving the data into a Corpus:

> a <- Corpus(VectorSource(df$text)) # create corpus object
> a <- tm_map(a, stemDocument, language = "english")

but it is horribly slow. I want to stem outside the Corpus object, like:

> df$text <- stemDocument(df$text, language = "english")

but it returns the original text. In fact, the example in the tm package documentation does not work either:

> data("crude")
> crude[[1]]
Diamond Shamrock Corp said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel. The reduction brings its posted price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said. "The price reduction today was made in the light of falling oil product prices and a weak crude oil market," a company spokeswoman said. Diamond is the latest in a line of U.S. oil companies that have cut its contract, or posted, prices over the last two days citing weak oil markets. Reuter
> stemDocument(crude[[1]], language = "english") # specify language
Diamond Shamrock Corp said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel. The reduction brings its posted price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said. "The price reduction today was made in the light of falling oil product prices and a weak crude oil market," a company spokeswoman said. Diamond is the latest in a line of U.S. oil companies that have cut its contract, or posted, prices over the last two days citing weak oil markets. Reuter
> stemDocument(crude[[1]]) # language not specified
Diamond Shamrock Corp said that effective today it had cut its contract prices for crude oil by 1.50 dlrs a barrel. The reduction brings its posted price for West Texas Intermediate to 16.00 dlrs a barrel, the copany said. "The price reduction today was made in the light of falling oil product prices and a weak crude oil market," a company spokeswoman said. Diamond is the latest in a line of U.S. oil companies that have cut its contract, or posted, prices over the last two days citing weak oil markets. Reuter
>

-- 
View this message in context: http://r.789695.n4.nabble.com/Help-with-stemDocument-tp4554523p4604022.html
Sent from the R help mailing list archive at Nabble.com.
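One way to stem a plain character vector such as df$text without building a corpus at all is to call SnowballC::wordStem() on whitespace-split tokens and paste them back together. This is a sketch, not tm's own code path: the helper name stem_text() and the sample data frame are hypothetical, splitting on whitespace is a simplifying assumption (punctuation stays attached to words), and whether it is faster than tm_map() on a given data set is untested.

```r
library(SnowballC) # provides wordStem(words, language)

# Hypothetical helper: stem every word of every string in x
stem_text <- function(x, language = "english") {
  # Split each string on runs of whitespace (punctuation stays attached)
  tokens <- strsplit(x, "[[:space:]]+")
  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]  # drop empty tokens produced by leading spaces
    paste(wordStem(tok, language = language), collapse = " ")
  }, character(1))
}

# Hypothetical stand-in for the data frame in the question
df <- data.frame(text = c("bottle leaking", "packaging poor"),
                 stringsAsFactors = FALSE)
df$text <- stem_text(df$text)
print(df$text)
```

Because wordStem() is vectorised over words, each string costs one stemmer call, and no document objects or metadata are created along the way.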
Hi Triss,

If you need to stem just one text in the Corpus, use

a[[n]] <- stemDocument(a[[n]])

Best,
-Alex

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of Triss.Ashton [triss.ashton at unt.edu]
Sent: 02 May 2012 21:09
To: r-help at r-project.org
Subject: Re: [R] Help with stemDocument
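Alex's suggestion, spelled out as a sketch: stem only document n and write the stemmed copy back into the corpus, leaving the other documents untouched. This assumes a current tm release with SnowballC installed; the corpus contents and the value of n are illustrative.

```r
library(tm)        # assumes a current tm release
library(SnowballC) # needed by stemDocument()

# Illustrative corpus standing in for `a`
a <- VCorpus(VectorSource(c("falling oil product prices",
                            "weak crude oil markets")))

n <- 2  # index of the single document to stem
# stemDocument() returns a stemmed copy; assign it back to update the corpus
a[[n]] <- stemDocument(a[[n]], language = "english")

print(content(a[[1]]))  # left as-is
print(content(a[[2]]))  # stemmed
```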
Did you ever get this to work? I am also having a problem with stemDocument and removeWords. I think it is an issue with R 2.15 or the tm package refresh, because I can get everything to run under R 2.10.

-- 
View this message in context: http://r.789695.n4.nabble.com/Help-with-stemDocument-tp4554523p4625051.html
Sent from the R help mailing list archive at Nabble.com.
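When a pipeline runs under one R version but not another, a small reproducible report that prints the versions in use makes the problem easier to pin down. Below is a sketch that removes English stop words and then stems, assuming a current tm release with SnowballC installed; the sample sentence is illustrative, and this does not reproduce the R 2.15 behaviour described above.

```r
library(tm)        # assumes a current tm release
library(SnowballC) # needed by stemDocument()

# Report the versions that matter for this kind of problem
cat(R.version.string, "\n")
cat("tm", as.character(packageVersion("tm")), "\n")

corp <- VCorpus(VectorSource("the bottle was leaking and the packaging was poor"))

# Remove English stop words first, then stem what remains
corp <- tm_map(corp, removeWords, stopwords("english"))
corp <- tm_map(corp, stemDocument, language = "english")
# Collapse any leftover runs of whitespace
corp <- tm_map(corp, stripWhitespace)

result <- content(corp[[1]])
print(result)
```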