search for: stemdocu

Displaying 14 results from an estimated 14 matches for "stemdocu".

Did you mean: stemdict
2012 Apr 13
4
Help with stemDocument
Hi, All: I am new to R and tm package. I'm trying to do the stemming using tm_map() and it doesn't seem to work: *I used:* > stemDocument(t_cmts[[100]]) *Where t_cmts is the corpus object, the results is:* bottle loose box abt airpak sections top plastic bottle squashed nearly flush neck previous shipments bottle wrapped securely bubble wrap wno bottle damage packaging poor surprisingly bottle leaking remove contents bottle reu...
2013 Sep 03
1
tm::stemDocument function not work
https://gist.github.com/rpietro/6430771 stemDocument function doesn't seem to be working. Tried to look up and a few people have reported the problem, but no solution that I could find. would appreciate any help
2011 Nov 04
1
Help: stemming and stem completion with package tm in R
Hi All I came across a problem below when doing stemming and stem completion with package tm in R. Word "mining" was stemmed to "mine" with stemDocument(), and then completed to "miners"with stemCompletion(). However, I prefer to keep "mining" intact. For stemCompletion(), the default type of completion is "prevalent", which takes the most frequent match as completion. Although "mining" is much more freq...
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
...have tried several versions) I have installed all the needed packages (tm, rJava, rWeka, Snowball) + dependencies. I have desactivated AWT (like written in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with : Sys.setenv(NOAWT=TRUE) The command tm_map(reuters, stemDocument) gives the following errors : - First time: Error in .jnew(name) : java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line Refreshing GOE pro...
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
...ot;to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.mat Terms Docs falls fetch hill jack jill mainly pail plain rain ran spain the water 1 0...
2011 Jun 04
1
Problem with Snowball & RWeka
...ll library(Snowball) source <- readLines(system.file("words", "porter","voc.txt",package = "Snowball")) result <- SnowballStemmer(source) #2) Using package tm library(tm) data("crude") stemDocument(crude[[1]]) In both instances I got a Java error "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException". After receiving this error once in the session, no further error messages are generated. However, SnowballStemmer() and s...
2012 Jan 27
2
tm package: handling contractions
...on address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) wordcloud(d$word,d$freq) I ended up with a large number of contractions that were split...
2011 Mar 24
2
Problem with Snowball & RWeka
Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException" It seems to have something to do with either Snowball or RWeka, however I can't figure out, what to do myself. If you could spend 5 minutes of your valuable time, to help me or give me a
2012 Dec 13
2
Tamaño de la matriz de términos y memoria. Paquete TM
...urces/Stopwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") # remueve palabras vacías genericas corpus <- tm_map(corpus, removeWords, stopwords("spanish")) # stemming corpus <- tm_map(corpus, stemDocument, language = "spanish") # crea matriz de terminos #a) términos como filas y documentos como columnas dtm <- DocumentTermMatrix(corpus) inspect(dtm[1000:1005,1000:1005]) # Términos con frecuencia mínima igu...
2014 Jun 17
2
No es un problema de tm tienes doc.corpus vacío
...t(corpus[1:7])corpus <- tm_map(corpus, > tolower)corpus <- tm_map(corpus, removePunctuation)corpus <- tm_map(corpus, > removeNumbers)corpus <- tm_map(corpus, removeWords, > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus <- > tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > stripWhitespace)inspect(doc.corpus[1:8])TDM <- > TermDocumentMatrix(corpus)TDM* > > por adelantado, muchas gracias!!! > > ruben! > ------------ próxima parte ------------ > Se ha borrado un adjunto en formato HTML... > URL: <https:/...
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function it only stems the last word of each document (The problem is the for wordStem and stemDocument does not work at all).  An example: > path <- c("c:\path\to\directory")       # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 tex...
2012 Feb 26
2
tm_map help
...pus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "available", "via") myCorpus <- tm_map(myCorpus, removeWords, myStopwords) dictCorpus <- myCorpus myCorpus <- tm_map(myCorpus, stemDocument) ################ERROR HAPPENS ON NEXT LINE################################## myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus) myDtm <- TermDocumentMatrix(myCorpus, control = list(minWordLength = 1)) m <- as.matrix(myDtm) v <- sort(rowSums(m), decreasing=TRUE) m...
2014 Jun 18
2
No es un problema de tm tienes doc.corpus vacío
...tolower)corpus <- tm_map(corpus, > >> removePunctuation)corpus <- tm_map(corpus, removeNumbers)corpus <- > >> tm_map(corpus, removeWords, > >> > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus > >> <- tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > >> stripWhitespace)inspect(doc.corpus[1:8])TDM <- > >> TermDocumentMatrix(corpus)TDM* > >> > >> por adelantado, muchas gracias!!! > >> > >> ruben! > >> ------------ próxima parte ------------ Se ha...
2014 Jun 18
3
No es un problema de tm tienes doc.corpus vacío
...; >> removePunctuation)corpus <- tm_map(corpus, removeNumbers)corpus <- > > > >> tm_map(corpus, removeWords, > > > >> > > > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus > > > >> <- tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > > > >> stripWhitespace)inspect(doc.corpus[1:8])TDM <- > > > >> TermDocumentMatrix(corpus)TDM* > > > >> > > > >> por adelantado, muchas gracias!!! > > > >> > > > >> ruben! &...