search for: stemdocu

Displaying 11 results from an estimated 11 matches for "stemdocu".

2013 Sep 03
1
tm::stemDocument function not work
https://gist.github.com/rpietro/6430771 The stemDocument function doesn't seem to be working. I tried to look it up, and a few people have reported the problem, but I could not find a solution. I would appreciate any help.
2011 Nov 04
1
Help: stemming and stem completion with package tm in R
Hi All, I came across the problem below when doing stemming and stem completion with package tm in R. The word "mining" was stemmed to "mine" with stemDocument(), and then completed to "miners" with stemCompletion(). However, I prefer to keep "mining" intact. For stemCompletion(), the default type of completion is "prevalent", which takes the most frequent match as the completion. Although "mining" is much more freq...
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
...everal versions) I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with: Sys.setenv(NOAWT=TRUE) The command tm_map(reuters, stemDocument) gives the following errors: - First time: Error in .jnew(name) : java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line Refreshing GOE pro...
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
..."to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.mat Terms Docs falls fetch hill jack jill mainly pail plain rain ran spain the water 1 0...
2011 Jun 04
1
Problem with Snowball & RWeka
...ll library(Snowball) source <- readLines(system.file("words", "porter","voc.txt",package = "Snowball")) result <- SnowballStemmer(source) #2) Using package tm library(tm) data("crude") stemDocument(crude[[1]]) In both instances I got a Java error "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException". After receiving this error once in the session, no further error messages are generated. However, SnowballStemmer() and s...
2012 Dec 13
2
Size of the term matrix and memory. TM package
...opwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") # remove generic stop words corpus <- tm_map(corpus, removeWords, stopwords("spanish")) # stemming corpus <- tm_map(corpus, stemDocument, language = "spanish") # create term matrix #a) terms as rows and documents as columns dtm <- DocumentTermMatrix(corpus) inspect(dtm[1000:1005,1000:1005]) # Terms with minimum frequency equ...
2014 Jun 17
2
It is not a tm problem; your doc.corpus is empty
...t(corpus[1:7])corpus <- tm_map(corpus, > tolower)corpus <- tm_map(corpus, removePunctuation)corpus <- tm_map(corpus, > removeNumbers)corpus <- tm_map(corpus, removeWords, > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus <- > tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > stripWhitespace)inspect(doc.corpus[1:8])TDM <- > TermDocumentMatrix(corpus)TDM* > > thank you very much in advance!!! > > ruben! > ------------ next part ------------ > An HTML attachment was scrubbed... > URL: <https:&...
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 tex...
2012 Feb 26
2
tm_map help
...pus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "available", "via") myCorpus <- tm_map(myCorpus, removeWords, myStopwords) dictCorpus <- myCorpus myCorpus <- tm_map(myCorpus, stemDocument) ################ERROR HAPPENS ON NEXT LINE################################## myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus) myDtm <- TermDocumentMatrix(myCorpus, control = list(minWordLength = 1)) m <- as.matrix(myDtm) v <- sort(rowSums(m), decreasing=TRUE) m...
2014 Jun 18
2
It is not a tm problem; your doc.corpus is empty
...tolower)corpus <- tm_map(corpus, > >> removePunctuation)corpus <- tm_map(corpus, removeNumbers)corpus <- > >> tm_map(corpus, removeWords, > >> > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus > >> <- tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > >> stripWhitespace)inspect(doc.corpus[1:8])TDM <- > >> TermDocumentMatrix(corpus)TDM* > >> > >> thank you very much in advance!!! > >> > >> ruben! > >> ------------ next part ------------ Se ha...
2014 Jun 18
3
It is not a tm problem; your doc.corpus is empty
...; >> removePunctuation)corpus <- tm_map(corpus, removeNumbers)corpus <- > > > >> tm_map(corpus, removeWords, > > > >> > > > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus > > > >> <- tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > > > >> stripWhitespace)inspect(doc.corpus[1:8])TDM <- > > > >> TermDocumentMatrix(corpus)TDM* > > > >> > > > >> thank you very much in advance!!! > > > >> > > > >> ruben! &...