search for: removenumb

Displaying 20 results from an estimated 20 matches for "removenumb".

2011 Nov 10
2
Removing numbers from a list
I am using gsub to remove numbers from each element of a list. The code is given below. testList <- list("this contains a number 1000","this does not contain") removeNumbers <- function(X) { gsub("\\d","",X) } outputList <- lapply(testList,removeNumbers) However, when I try to find the number of words in outputList as follows outLength <- lapply(strsplit(outputList," "),length) it throws the following error:...
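A minimal sketch of the usual fix for the error described above: lapply() returns a list, while strsplit() expects a character vector, so the cleaned output needs to be unlisted first.
testList <- list("this contains a number 1000", "this does not contain")
removeNumbers <- function(x) gsub("\\d", "", x)        # strip digits from each string
outputList <- lapply(testList, removeNumbers)          # a list of cleaned strings
outLength <- lapply(strsplit(unlist(outputList), " "), length)  # word count per element
outLength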
2012 Feb 22
0
LSA package: problem with textmatrix()
I have a problem with the textmatrix() function of the LSA package whenever I specify 'removeNumbers=TRUE'. The data for the function are stored in a directory LSAwork, which consists of a series of files that house the text in column form. As long as removeNumbers = FALSE or it is not present, the textmatrix function works just fine. The error message I get seems to suggest it is finding...
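A minimal sketch of the call being described, assuming the lsa package's textmatrix() interface and a directory named LSAwork that holds one plain-text document per file:
library(lsa)
tm_lsa <- textmatrix("LSAwork", removeNumbers = TRUE)   # the poster reports this fails; removeNumbers = FALSE works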
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
...ain in Spain", "falls mainly on the plain", "jack and jill ran up the hill", "to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.m...
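A minimal sketch, not from the thread, of the fix usually suggested for this symptom: removeWords() matches exact tokens, so a capitalized "The" survives unless the corpus is lower-cased before the stop words are removed. In current tm versions tolower has to be wrapped in content_transformer(); older versions accepted it directly.
text.corp <- tm_map(text.corp, content_transformer(tolower))   # lower-case first
text.corp <- tm_map(text.corp, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(text.corp)
inspect(dtm)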
2015 Apr 10
5
Loop over many data frames
...talkin to u last time !!! starts at 4pm. come get sunny munny :) kind of scary to imagine what needs military wiping!!!! #.....more corpus content...... if I want to apply my own corpus-cleaning function to it, for example removing the numbers present in the corpus > tm_map(txt[[1]], removeNumbers) <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> it does nothing at all... Regards Oscar On 10 April 2015 at 1:15, Jorge I Velez <jorgeivanvelez en gmail.com> wrote: > Oscar, > > One way to work with this kind of file is to use lis...
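A minimal sketch, not from the thread, of why the call above can appear to do nothing: tm_map() returns a new corpus that has to be assigned, and printing a VCorpus only shows its metadata; the cleaned text is visible via as.character() or inspect().
txt[[1]] <- tm_map(txt[[1]], removeNumbers)   # keep the cleaned corpus
as.character(txt[[1]][[1]])                   # show the first cleaned document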
2014 Jun 17
2
It is not a tm problem, your doc.corpus is empty
...readLines(TEXTFILE)
length(inmortal)
head(inmortal)
tail(inmortal)
library(tm)
vec <- VectorSource(inmortal)
corpus <- Corpus(vec)
summary(corpus)
inspect(corpus[1:7])
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
inspect(doc.corpus[1:2])
library(SnowballC)
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)
inspect(doc.corpus[1:8])
TDM <- TermDocumentMatrix(corpus)
TDM
po...
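A minimal sketch of the slip the subject line points at: the pipeline above builds an object called corpus, but the inspect() calls refer to doc.corpus, an empty object left over from earlier work, rather than the corpus that was just cleaned.
inspect(corpus[1:2])                 # instead of inspect(doc.corpus[1:2])
TDM <- TermDocumentMatrix(corpus)
inspect(TDM)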
2015 Apr 12
2
Loop over many data frames
...y :) >> kind of scary to imagine what needs military wiping!!!! >> #.....more corpus content...... >> >> if I want to apply my own corpus-cleaning function to it, for example >> removing the numbers present in the corpus >> > tm_map(txt[[1]], removeNumbers) >> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> >> >> it does nothing at all... >> >> Regards >> >> Oscar >> >> >> >> >> >> On 10 April 2015 at 1:15, Jorge I Velez <jorgeivanvelez en gma...
2010 Feb 16
0
tm package
...(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm <- DocumentTermMatrix(reuters21578) that reuters21578.dtm does not include terms from the Heading (e.g. the Title). I'm wondering if anyone can confirm this and if so, is there an option to have the terms from the Heading included? Many thanks! Cheers, David
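A hedged sketch, not confirmed by the thread: DocumentTermMatrix() only indexes document content, so one workaround is to paste each document's "heading" metadata back into its text before building the matrix.
addHeading <- function(doc) {
  content(doc) <- c(as.character(meta(doc, "heading")), content(doc))  # prepend the title
  doc
}
reuters21578 <- tm_map(reuters21578, addHeading)
reuters21578.dtm <- DocumentTermMatrix(reuters21578)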
2011 Apr 18
0
Help with cleaning a corpus
Hi! I created a corpus and started to clean it with this piece of code: txt <-tm_map(txt,removeWords, stopwords("spanish")) txt <-tm_map(txt,stripWhitespace) txt <-tm_map(txt,tolower) txt <-tm_map(txt,removeNumbers) txt <-tm_map(txt,removePunctuation) But something happened: some of the documents in the corpus became empty, which is a problem when I try to build a document-term matrix with tf-idf. Is there any way to eliminate a document automatically if it becomes empty? Or manually, how could I ge...
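A minimal sketch, not from the thread, of one way to keep tf-idf happy: build the document-term matrix and then drop the rows of documents that ended up empty after cleaning.
dtm <- DocumentTermMatrix(txt)
rowTotals <- apply(as.matrix(dtm), 1, sum)   # number of terms left per document
dtm <- dtm[rowTotals > 0, ]                  # drop documents that became empty
dtm.tfidf <- weightTfIdf(dtm)                # tf-idf now sees no empty documents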
2012 Feb 26
2
tm_map help
...ind", lapply(tweets, as.data.frame)) myCorpus <- Corpus(VectorSource(df$text)) myCorpus <- tm_map(myCorpus, function(x) iconv(enc2utf8(x), sub = "byte")) myCorpus <- tm_map(myCorpus, tolower) myCorpus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "available", "via") myCorpus <- tm_map(myCorpus, removeWords, myStopwords) dictCorpus <- myCorpus myCorpus <- tm_map(myCorpus, stemDocument) ################ERROR HAPPENS ON NEXT LINE###############################...
2013 Sep 26
0
R hangs at NGramTokenizer
...b("&amp;", "", x)> myCorpus <- tm_map(myCorpus, removeAmp)> removeWWW <- function(x) gsub("www[[:alnum:]]*", "", x)> myCorpus <- tm_map(myCorpus, removeWWW)> myCorpus <- tm_map(myCorpus, tolower)> myCorpus <- tm_map(myCorpus, removeNumbers)> myCorpus <- tm_map(myCorpus, removePunctuation)> myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))> myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART"))> myCorpus <- tm_map(myCorpus, stripWhitespace)> myDtm <- DocumentTer...
2014 Jun 18
2
No es un problema de tm tienes doc.corpus vacío
...al)tail( > >> inmortal)library(tm)vec > >> <- VectorSource(inmortal)corpus <- > >> Corpus(vec)summary(corpus)inspect(corpus[1:7])corpus <- > >> tm_map(corpus, tolower)corpus <- tm_map(corpus, > >> removePunctuation)corpus <- tm_map(corpus, removeNumbers)corpus <- > >> tm_map(corpus, removeWords, > >> > stopwords("english"))inspect(doc.corpus[1:2])library(SnowballC)corpus > >> <- tm_map(corpus, stemDocument)corpus <- tm_map(corpus, > >> stripWhitespace)inspect(doc.corpus[1:8])TDM <- &g...
2011 Mar 24
2
Problem with Snowball & RWeka
Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException" It seems to have something to do with either Snowball or RWeka; however, I can't figure out what to do myself. If you could spend 5 minutes of your valuable time to help me or give me a
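A hedged sketch of a commonly suggested alternative rather than a fix for the Java error itself: the SnowballC package stems without the rJava/RWeka dependency and is what newer versions of tm's stemDocument() build on. Here corpus stands for whatever corpus is being stemmed.
library(SnowballC)
wordStem(c("running", "runs", "ran"), language = "english")   # quick check of the stemmer
library(tm)
corpus <- tm_map(corpus, stemDocument, language = "english")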
2014 Jun 18
3
It is not a tm problem, your doc.corpus is empty
...tm)
vec <- VectorSource(inmortal)
corpus <- Corpus(vec)
summary(corpus)
inspect(corpus[1:7])
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
inspect(doc.corpus[1:2])
library(SnowballC)
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, st...
2014 Jul 29
2
wordcloud and word table [Making progress]
...l informe 2005", "todo el informe 2013"), row.names=c("2005", "2013")) ds<- DataframeSource(tmpText) ds<- DataframeSource(tmpinformes) corp = Corpus(ds) corp = tm_map(corp,removePunctuation) corp = tm_map(corp,content_transformer(tolower)) corp = tm_map(corp,removeNumbers) corp = tm_map(corp, stripWhitespace) corp = tm_map(corp, removeWords, sw) corp = tm_map(corp, removeWords, stopwords("spanish")) term.matrix<- TermDocumentMatrix(corp) term.matrix<- as.matrix(term.matrix) colnames(term.matrix) <- c("Año2005","Año2013") png...
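A minimal sketch, not from the thread, of one next step with the two-column term matrix built above: the wordcloud package can plot the 2005 and 2013 columns side by side as a comparison cloud.
library(wordcloud)
comparison.cloud(term.matrix, max.words = 100, random.order = FALSE)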
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
...coding="UTF-8") reuters <- Corpus(source) reuters <- tm_map(reuters, as.PlainTextDocument) reuters <- tm_map(reuters, removePunctuation) reuters <- tm_map(reuters, tolower) reuters <- tm_map(reuters, removeWords, stopwords("english")) reuters <- tm_map(reuters, removeNumbers) reuters <- tm_map(reuters, stripWhitespace) reuters <- tm_map(reuters, stemDocument) ------ Thank you for your help, Julien
2017 Jun 12
0
count number of stop words in R
You can use regular expressions. ?regex and/or the stringr package are good places to start. Of course, you have to define "stop words." Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jun 12, 2017 at 5:40
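A minimal sketch of the kind of counting being suggested, using a regular-expression tokenizer and tm's English stop word list as the definition of "stop words" (the string is abbreviated from the question below):
library(tm)
str <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."
tokens <- unlist(strsplit(tolower(str), "[^a-z']+"))   # crude regex tokenizer
sum(tokens %in% stopwords("english"))                  # how many tokens are stop words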
2009 Jul 17
3
Help with the text mining (TM) package
Dear all, I am writing to ask about the following: I am doing a text mining project and need to import a series of texts in order to preprocess them, that is, remove stopwords, do stemming, remove punctuation marks, etc. That last part I can do with the datasets that ship with the TM library. What I cannot manage is to import text from an outside source, even though there are functions
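A minimal sketch, not from the thread, of the import step being asked about: tm reads a directory of plain-text files through DirSource(). The path below is a placeholder.
library(tm)
corpus <- Corpus(DirSource("path/to/texts", encoding = "UTF-8"),
                 readerControl = list(language = "es"))
corpus <- tm_map(corpus, removeWords, stopwords("spanish"))
corpus <- tm_map(corpus, removePunctuation)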
2017 Jun 12
3
count number of stop words in R
Hi all, Is there a way in R to count the number of stop words (English) of a string using tm package? str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the
2015 Apr 10
3
Loop over many data frames
Hello everyone! I am working on a text mining project and, because of the limited resources available to me, I had to split the project's input text files into many small files. After turning each of these files into a separate corpus, I can clean each corpus, look for n-grams, build each termDocumentMatrix, and finally gather everything into a single TDM. But
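A minimal sketch, not from the thread, of the loop-and-combine pattern described above. The directory name is a placeholder, and it relies on tm providing a c() method for TermDocumentMatrix objects.
library(tm)
files <- list.files("corpus_parts", full.names = TRUE)   # placeholder directory of small text files
tdms <- lapply(files, function(f) {
  corp <- Corpus(VectorSource(readLines(f, encoding = "UTF-8")))
  corp <- tm_map(corp, content_transformer(tolower))
  corp <- tm_map(corp, removeNumbers)
  TermDocumentMatrix(corp)
})
tdm <- do.call(c, tdms)   # combine the per-file matrices into one TDM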
2012 Oct 25
2
Text mining
Kind regards. I am currently writing a function to plot a word cloud; the code I have so far is the following:
library(twitteR)
library(tm)
library(wordcloud)
library(RXKCD)
library(RColorBrewer)
tweets = searchTwitter('@afflorezr', n=1500)
generateCorpus = function(tweets, my.stopwords=c(), min.freq){
  # Install the textmining library
  require(tm)
  require(wordcloud)
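A minimal sketch, not from the thread, of the kind of body the generateCorpus() function above appears to be heading toward. It relies on the packages already loaded above (tm, wordcloud, RColorBrewer), on the function arguments tweets, my.stopwords and min.freq, and assumes twitteR status objects with a getText() method.
texts <- sapply(tweets, function(t) t$getText())      # pull the text of each status
corp  <- Corpus(VectorSource(texts))
corp  <- tm_map(corp, content_transformer(tolower))
corp  <- tm_map(corp, removePunctuation)
corp  <- tm_map(corp, removeWords, c(stopwords("spanish"), my.stopwords))
tdm   <- TermDocumentMatrix(corp)
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freqs), freqs, min.freq = min.freq,
          colors = brewer.pal(8, "Dark2"), random.order = FALSE)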