thr3ads.net - search: "stemdocu"

Displaying 14 results from an estimated 14 matches for "stemdocu".

2012 Apr 13

Help with stemDocument

Hi all, I am new to R and the tm package. I'm trying to do stemming using tm_map() and it doesn't seem to work. I used: > stemDocument(t_cmts[[100]]) where t_cmts is the corpus object. The result is: bottle loose box abt airpak sections top plastic bottle squashed nearly flush neck previous shipments bottle wrapped securely bubble wrap wno bottle damage packaging poor surprisingly bottle leaking remove contents bottle reu...

2013 Sep 03

tm::stemDocument function not work

https://gist.github.com/rpietro/6430771 The stemDocument function doesn't seem to be working. I tried to look it up and a few people have reported the problem, but I could not find a solution. I would appreciate any help.

2011 Nov 04

Help: stemming and stem completion with package tm in R

Hi all, I came across the problem below when doing stemming and stem completion with package tm in R. The word "mining" was stemmed to "mine" with stemDocument(), and then completed to "miners" with stemCompletion(). However, I would prefer to keep "mining" intact. For stemCompletion(), the default type of completion is "prevalent", which takes the most frequent match as the completion. Although "mining" is much more freq...
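
One likely reason for this behaviour is that stem completion looks for dictionary words that begin with the stem, and "mining" does not begin with "mine". The base-R sketch below is a simplified model of that prefix matching, not tm's actual implementation; the helper name complete_stem and the toy dictionary are made up for illustration.

```r
# Simplified model of "prevalent" stem completion (hypothetical helper,
# not tm's stemCompletion): keep the dictionary words that start with
# the stem, then pick the most frequent one.
complete_stem <- function(stem, dictionary) {
  candidates <- dictionary[startsWith(dictionary, stem)]
  if (length(candidates) == 0) return(NA_character_)
  names(sort(table(candidates), decreasing = TRUE))[1]
}

# Toy dictionary in which "mining" is far more frequent than "miners".
dictionary <- c(rep("mining", 5), "miners")

# "mining" does not start with "mine", so it is never a candidate.
completed <- complete_stem("mine", dictionary)

# An unstemmed word still matches itself.
kept <- complete_stem("mining", dictionary)

print(completed)
print(kept)
```

Under this model, frequency only decides between words that share the stem as a prefix, so one way to keep "mining" intact would be to exclude it from stemming in the first place. Whether that suits the poster's pipeline is not clear from the truncated message.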

2012 Jan 13

Troubles with stemming (tm + Snowball packages) under MacOS

...have tried several versions) I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with: Sys.setenv(NOAWT=TRUE) The command tm_map(reuters, stemDocument) gives the following errors: - First time: Error in .jnew(name) : java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line Refreshing GOE pro...

2009 Nov 12

package "tm" fails to remove "the" with remove stopwords

..."to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.mat Terms Docs falls fetch hill jack jill mainly pail plain rain ran spain the water 1 0...
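
A common cause of this symptom is letter case: as far as I know, removeWords() matches case-sensitively, so a capitalised "The" survives and is only lowercased later, when the document-term matrix is built. The truncated snippet does not confirm that this is what happened here, so treat the following as a sketch on a made-up sentence rather than a diagnosis.

```r
library(tm)

# Illustrative input, not taken from the original post.
text <- "The rain in Spain falls mainly in the plain"

# removeWords() on a character vector: only the lowercase "the" matches.
kept_capital <- removeWords(text, "the")

# Lowercasing first lets both occurrences match.
lowered_first <- removeWords(tolower(text), "the")

print(kept_capital)
print(lowered_first)
```

In a corpus pipeline, the corresponding step in recent tm versions would be something like tm_map(text.corp, content_transformer(tolower)) placed before the removeWords call.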

2011 Jun 04

Problem with Snowball & RWeka

...ll library(Snowball) source <- readLines(system.file("words", "porter","voc.txt",package = "Snowball")) result <- SnowballStemmer(source) #2) Using package tm library(tm) data("crude") stemDocument(crude[[1]]) In both instances I got a Java error "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException". After receiving this error once in the session, no further error messages are generated. However, SnowballStemmer() and s...

2012 Jan 27

tm package: handling contractions

...on address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) wordcloud(d$word,d$freq) I ended up with a large number of contractions that were split...

2011 Mar 24

Problem with Snowball & RWeka

Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException". It seems to have something to do with either Snowball or RWeka; however, I can't figure out what to do myself. If you could spend 5 minutes of your valuable time to help me or give me a...

2012 Dec 13

Size of the term matrix and memory. TM package

...urces/Stopwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") # remove generic stop words corpus <- tm_map(corpus, removeWords, stopwords("spanish")) # stemming corpus <- tm_map(corpus, stemDocument, language = "spanish") # create term matrix #a) terms as rows and documents as columns dtm <- DocumentTermMatrix(corpus) inspect(dtm[1000:1005,1000:1005]) # Terms with minimum frequency equ...

2014 Jun 17

It is not a tm problem; your doc.corpus is empty

...t(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(corpus, stripWhitespace) inspect(doc.corpus[1:8]) TDM <- TermDocumentMatrix(corpus) TDM Thanks very much in advance!!! ruben! ------------ next part ------------ An HTML attachment was scrubbed... URL: <https:/...

2011 Sep 05

Stemming functions only work on the last word of plain text documents

Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:/path/to/directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = TRUE)) > inspect(corp) A corpus with 2 tex...
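
A plausible explanation is that word-level stemmers treat each element of a character vector as a single word, so a whole document passed as one string only has its ending stemmed. The sketch below uses SnowballC::wordStem (a swap-in for the SnowballStemmer call in the post) together with a hypothetical helper, stem_text, that splits on whitespace before stemming. It is an illustration under those assumptions, not the fix given in the thread.

```r
library(SnowballC)

# Hypothetical helper: split a document into words, stem each word,
# then join the stems back into a single string.
stem_text <- function(text, language = "english") {
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]  # drop empty tokens from leading whitespace
  paste(wordStem(words, language = language), collapse = " ")
}

doc <- "running cats"  # illustrative input
stemmed <- stem_text(doc)
print(stemmed)
```

A helper like this could be applied per document in a corpus; how to wire it into tm_map depends on the tm version in use.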

2012 Feb 26

tm_map help

...pus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "available", "via") myCorpus <- tm_map(myCorpus, removeWords, myStopwords) dictCorpus <- myCorpus myCorpus <- tm_map(myCorpus, stemDocument) ################ERROR HAPPENS ON NEXT LINE################################## myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus) myDtm <- TermDocumentMatrix(myCorpus, control = list(minWordLength = 1)) m <- as.matrix(myDtm) v <- sort(rowSums(m), decreasing=TRUE) m...

2014 Jun 18

It is not a tm problem; your doc.corpus is empty

...tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(corpus, stripWhitespace) inspect(doc.corpus[1:8]) TDM <- TermDocumentMatrix(corpus) TDM Thanks very much in advance!!! ruben! ------------ next part ------------ ...

2014 Jun 18

It is not a tm problem; your doc.corpus is empty

...removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(corpus, stripWhitespace) inspect(doc.corpus[1:8]) TDM <- TermDocumentMatrix(corpus) TDM Thanks very much in advance!!! ruben! ...
