thr3ads.net - search: "removewords"

Displaying 20 results from an estimated 29 matches for "removewords".

2012 Oct 25

Text mining

...and seems to do what it says on the can tw.corpus = Corpus(VectorSource(df)) tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte")) tw.corpus = tm_map(tw.corpus, tolower) tw.corpus = tm_map(tw.corpus, removePunctuation) tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt"))) tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords) tw.corpus = tm_map(tw.corpus, stripWhitespace) sw <- readLines("stopwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") tw.corpus = t...
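
In this snippet the corpus goes through `iconv(..., sub = "byte")` while the stop-word file is transliterated with `"ASCII//TRANSLIT"`. The two can end up in different forms, so accented stop words may stop matching. The sketch below is a minimal base-R illustration that assumes the fix is to apply one normalization to both sides. `strip_accents` is a hypothetical helper that uses `chartr` on a fixed set of Spanish letters instead of `//TRANSLIT`, whose output can differ between platforms.

```r
# Hypothetical helper: map common Spanish accented letters to plain ASCII
strip_accents <- function(x) {
  chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00dc\u00d1",
         "aeiouunAEIOUUN", x)
}

# Normalise the stop words and the text the same way before matching
sw  <- strip_accents(c("qu\u00e9", "m\u00e1s", "el"))
txt <- tolower(strip_accents("Qu\u00e9 m\u00e1s dijo el presidente"))

# Word-boundary removal, then collapse the leftover whitespace
pattern <- sprintf("\\b(%s)\\b", paste(sw, collapse = "|"))
result  <- trimws(gsub("\\s+", " ", gsub(pattern, "", txt, perl = TRUE)))
print(result)  # "dijo presidente"
```

With tm 0.6 or later, the same function could presumably be applied to a corpus through `content_transformer(strip_accents)`.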

package "tm" fails to remove "the" with remove stopwords

2009 Nov 12

...<- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.mat Terms Docs falls fetch hill jack jill mainly pail plain rain ran spain the water 1 0 0 0 0 0 0 0 0 1 0 1...
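
A likely explanation, though the snippet does not confirm it: `removeWords` is case-sensitive and this corpus is never lowercased, while `DocumentTermMatrix` lowercases terms by default. A sentence-initial "The" therefore survives the removal and reappears as "the" in the matrix. The base-R sketch below reproduces the effect with a word-boundary pattern similar to the `gsub(sprintf("\\b(%s)\\b", ...))` call shown in the error message from the 2011 "Reading stopwords" thread. `remove_words` is a hypothetical stand-in, not tm's implementation.

```r
# Hypothetical stand-in for a word-boundary stop-word filter
remove_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

# Collapse the runs of spaces left behind by the removal
squish <- function(x) trimws(gsub("\\s+", " ", x))

txt <- "The rain in Spain falls mainly on the plain"
sw  <- c("the", "in", "on")

raw     <- squish(remove_words(txt, sw))           # case-sensitive: "The" survives
lowered <- squish(remove_words(tolower(txt), sw))  # lowercase first: every match goes
print(raw)      # "The rain Spain falls mainly plain"
print(lowered)  # "rain spain falls mainly plain"
```

In tm itself, lowercasing before `removeWords` (for example `tm_map(text.corp, content_transformer(tolower))` in tm 0.6 or later) should have the same effect.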

Help: Error in `colnames<-`(`*tmp*`, value = c(

2014 Jul 22

...Source(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) > d<-tm_map(d, removePunctuation) > sw<-readLines("./StopWords.txt", encoding="UTF-8") > sw<-iconv(enc2utf8(sw), sub="byte") > d<-tm_map(d, removeWords, sw) > d<-tm_map(d, removeWords, stopwords("spanish")) > tdm<-TermDocumentMatrix(d) > m<-as.matrix(tdm) > colnames(m) = c("P05", "P13") Error in `colnames<-`(`*tmp*`, value = c("P05", "P13")) : length of 'dimnames'...
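
The error says that the two names supplied do not match the number of columns of `m`, which is the number of documents in the term-document matrix. A plausible cause, not confirmed in the thread, is that the corpus built from `df` does not contain exactly two documents. The base-R sketch below checks `ncol(m)` before assigning names. `set_doc_names` is a hypothetical helper and the small matrices stand in for `as.matrix(tdm)`.

```r
# Hypothetical helper: name the document columns only if the counts agree
set_doc_names <- function(m, nms) {
  if (ncol(m) != length(nms)) {
    stop(sprintf("matrix has %d document column(s) but %d names were given",
                 ncol(m), length(nms)))
  }
  colnames(m) <- nms
  m
}

# Stand-in for as.matrix(tdm) when the corpus ended up with a single document
m1  <- matrix(c(2, 0, 1), ncol = 1)
msg <- tryCatch(set_doc_names(m1, c("P05", "P13")), error = conditionMessage)
print(msg)

# With two document columns the assignment succeeds
m2 <- set_doc_names(matrix(c(2, 0, 1, 3, 0, 1), ncol = 2), c("P05", "P13"))
print(colnames(m2))
```

Checking `length(corpus)` right after building the corpus would show the same mismatch earlier.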

wordcloud and word table [Making progress]

2014 Jul 29

..."2005", "2013")) ds<- DataframeSource(tmpText) ds<- DataframeSource(tmpinformes) corp = Corpus(ds) corp = tm_map(corp,removePunctuation) corp = tm_map(corp,content_transformer(tolower)) corp = tm_map(corp,removeNumbers) corp = tm_map(corp, stripWhitespace) corp = tm_map(corp, removeWords, sw) corp = tm_map(corp, removeWords, stopwords("spanish")) term.matrix<- TermDocumentMatrix(corp) term.matrix<- as.matrix(term.matrix) colnames(term.matrix) <- c("Año2005","Año2013") png(file="Org2005vs2013.png",height=600,width=1200) par(mfrow=c(1,...

count number of stop words in R

2017 Jun 12

You can define stop words as below. data <- tm_map(data, removeWords, stopwords("english")) Patrick Casimir, PhD Health Analytics, Data Science, Big Data Expert & Independent Consultant C: 954.614.1178 ________________________________ From: R-help <r-help-bounces at r-project.org> on behalf of Bert Gunter <bgunter.4567 at gmail.com> Sen...

count number of stop words in R

2017 Jun 12

Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, but I don't know how to count the stop words in my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the pictur...
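
One way to answer the question, sketched in base R under the assumption that a token is a run of letters, digits and apostrophes: lowercase the string, split it into tokens and count how many of them appear in the stop-word list. `count_stopwords` is a hypothetical helper, and the short `sw` vector is illustrative only.

```r
# Count the tokens of a string that appear in a stop-word list
count_stopwords <- function(s, sw) {
  # Lowercase, then split on runs of anything that is not a letter, digit or apostrophe
  tokens <- unlist(strsplit(tolower(s), "[^[:alnum:]']+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by a leading separator
  sum(tokens %in% sw)
}

utterance <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."
# Small hypothetical list; with tm loaded, stopwords("english") could be used instead
sw <- c("a", "the", "there's", "that's", "and", "it", "in")
n  <- count_stopwords(utterance, sw)
print(n)  # 4
```

Comparing tokens with `%in%` avoids building a regular expression at all, so a long stop-word list costs nothing extra.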

count number of stop words in R

2017 Jun 12

...________________________________ From: Elahe chalabi <chalabi.elahe at yahoo.de> Sent: Monday, June 12, 2017 11:23:42 AM To: Patrick Casimir; Bert Gunter Cc: R-help Mailing List Subject: Re: [R] count number of stop words in R Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, but I don't know how to count the stop words in my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the pictur...

count number of stop words in R

2017 Jun 12

...t & Independent Consultant C: 954.614.1178 ________________________________ Sent: Monday, June 12, 2017 11:23:42 AM To: Patrick Casimir; Bert Gunter Cc: R-help Mailing List Subject: Re: [R] count number of stop words in R Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, but I don't know how to count the stop words in my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the pictur...

tm package - remove stopwords failing

2010 Mar 31

Hi, I just noticed, by inspecting the term matrix, that not all stopwords are removed. Does someone know how to fix that? library(tm) data("crude") d<-tm_map(crude, removeWords, stopwords(language='english')) dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2)) inspect( dt) I am using R version 2.10, tm package 0.5-3 cheers Welma

Reading stopwords from a csv file

2011 Oct 04

...ge to do text mining: I have a huge list of stopwords (2000+) that are in a csv file. I read it as follows: stopwordlist <- read.csv("stopwords to be Removed 10042011.csv") myStopwords <- as.character(stopwordlist$stopwords) When I try removing the stopwords using tr1=tm_map(tr1,removeWords,myStopwords) I get the following error: Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "", : internal error in compiling regexp However, this works fine when I define myStopwords = c(....) instead of reading from the csv file. Can som...
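
The error comes from the single `\\b(word1|word2|...)\\b` pattern built from all 2000+ words. Two plausible causes, neither confirmed in the thread, are a pattern too large for the regex engine and entries read from the CSV that are blank, `NA`, or contain regex metacharacters. The sketch below uses hypothetical helpers `escape_regex` and `remove_words_chunked` to clean the list, escape each word and remove the words in smaller batches.

```r
# Escape regex metacharacters so each stop word is matched literally
escape_regex <- function(w) gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", w, perl = TRUE)

# Remove stop words in batches so no single pattern gets too large
remove_words_chunked <- function(x, words, chunk_size = 500) {
  words <- unique(trimws(as.character(words)))
  words <- words[!is.na(words) & nzchar(words)]  # blank cells / NA from the CSV
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(escape_regex(chunk), collapse = "|"))
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x
}

my_stopwords <- c("the", "of", "", NA, "and")  # e.g. as.character() of a CSV column
out <- remove_words_chunked("the price of oil and gas", my_stopwords, chunk_size = 2)
print(trimws(gsub("\\s+", " ", out)))  # "price oil gas"
```

The default `chunk_size = 500` is an arbitrary choice; any size that keeps each pattern compilable should work.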

R hangs at NGramTokenizer

2013 Sep 26

...<- function(x) gsub("www[[:alnum:]]*", "", x) > myCorpus <- tm_map(myCorpus, removeWWW) > myCorpus <- tm_map(myCorpus, tolower) > myCorpus <- tm_map(myCorpus, removeNumbers) > myCorpus <- tm_map(myCorpus, removePunctuation) > myCorpus <- tm_map(myCorpus, removeWords, stopwords("english")) > myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART")) > myCorpus <- tm_map(myCorpus, stripWhitespace) > myDtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1,Inf))) Everything works fine up to this stage, if I do no...

tm package: handling contractions

2012 Jan 27

...ext sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) wordcloud(d$word,d$freq) I ended up with a large number of contractions that were split at the "?" character, e.g., "don?t"...
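
The "?" is most likely a typographic apostrophe (U+2019) that was not decoded correctly. This is an assumption, since the encoding of the input file isn't shown; if the file is UTF-8, reading it with `scan(..., encoding = "UTF-8")` may already help. The sketch below normalizes curly apostrophes to ASCII and then removes punctuation while keeping apostrophes, so contractions stay whole. Both helpers are hypothetical. Recent tm releases also document a `preserve_intra_word_contractions` argument to `removePunctuation`, which is worth checking in your installed version.

```r
# Replace typographic single quotes with the ASCII apostrophe
normalize_quotes <- function(x) {
  x <- gsub("\u2018", "'", x, fixed = TRUE)  # left single quotation mark
  gsub("\u2019", "'", x, fixed = TRUE)       # right single quotation mark
}

# Remove punctuation but keep apostrophes, so contractions stay whole
strip_punct_keep_apostrophes <- function(x) gsub("[^[:alnum:][:space:]']", "", x)

txt   <- "Don\u2019t stop, won\u2019t stop!"
clean <- strip_punct_keep_apostrophes(normalize_quotes(txt))
print(clean)  # "Don't stop won't stop"
```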

wordcloud and word table

2014 Jul 28

...l, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") sw<-iconv(enc2utf8(sw), sub = "byte") info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish")) info.tdm<-TermDocumentMatrix(info.cor.cl) result<-list(name = informes, tdm= info.tdm) } >tdm<-lapply(informes, TDM, path = pathname) Result: > tdm [[1]] >...

wordcloud and word table

2014 Jul 25

...wer)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") sw<-iconv(enc2utf8(sw), sub = "byte") info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish")) info.tdm<-TermDocumentMatrix(info.cor.cl) result<-list(name = informes, tdm= info.tdm) } >tdm<-lapply(informes, TDM, path = pathname) Result: > tdm [[1]] [[1]]$name [1] "2013" [[1]]$tdm <<TermDocumentMatrix (terms: 1540, docume...

Term matrix size and memory. TM package

2012 Dec 13

...convert to ASCII sw <- readLines("D:/Publico/Documents/TextMinigSpanishResources/Stopwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") # remove generic stop words corpus <- tm_map(corpus, removeWords, stopwords("spanish")) # stemming corpus <- tm_map(corpus, stemDocument, language = "spanish") # create the term matrix #a) terms as rows and documents as columns dtm <- DocumentTermMatrix(corpus)...

It's not a tm problem: your doc.corpus is empty

2014 Jun 17

...ad(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(corpus, stripWhitespace) inspect(doc.corpus[1:8]) TDM <- TermDocumentMatrix(corpus) TDM Thanks in advance!!! r...

Help with cleaning a corpus

2011 Apr 18

Hi! I created a corpus and started to clean it with this piece of code: txt <-tm_map(txt,removeWords, stopwords("spanish")) txt <-tm_map(txt,stripWhitespace) txt <-tm_map(txt,tolower) txt <-tm_map(txt,removeNumbers) txt <-tm_map(txt,removePunctuation) But something happened: some of the documents in the corpus became empty, which is a problem when I try to make a document...
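
In the posted code the stop words are removed before `tolower`, so capitalised stop words survive, and documents made up only of stop words, numbers or punctuation end up empty. The base-R sketch below assumes the text is held as a character vector with one string per document. It lowercases first, cleans, and then keeps only non-empty documents. `clean_docs` and the short `sw` list are hypothetical.

```r
# Stand-in for the corpus text: one string per document
docs <- c("El precio del cobre", "de la", "  ", "Mercado 2011")
sw   <- c("el", "la", "de", "del")  # small hypothetical Spanish stop-word list

clean_docs <- function(x, sw) {
  x <- tolower(x)                            # lowercase first so "El" matches "el"
  x <- gsub("[[:punct:][:digit:]]", " ", x)  # drop punctuation and numbers
  pattern <- sprintf("\\b(%s)\\b", paste(sw, collapse = "|"))
  x <- gsub(pattern, " ", x, perl = TRUE)    # remove stop words
  trimws(gsub("\\s+", " ", x))               # collapse whitespace
}

cleaned <- clean_docs(docs, sw)
keep    <- nzchar(cleaned)  # FALSE for documents that are now empty
print(cleaned[keep])        # "precio cobre" "mercado"
```

With a tm corpus, the same logical vector could be used to subset it (for example `txt[keep]`) before building the document-term matrix.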

tm_map help

2012 Feb 26

...(enc2utf8(x), sub = "byte")) myCorpus <- tm_map(myCorpus, tolower) myCorpus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "available", "via") myCorpus <- tm_map(myCorpus, removeWords, myStopwords) dictCorpus <- myCorpus myCorpus <- tm_map(myCorpus, stemDocument) ################ERROR HAPPENS ON NEXT LINE################################## myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus) myDtm <- TermDocumentMatrix(myCorpus, control = list(minWord...

It's not a tm problem: your doc.corpus is empty

2014 Jun 18

...<- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(corpus, stripWhitespace) inspect(doc.corpus[1:8]) TDM <- TermDocumentMatrix(corpus) TDM...

Removing strange characters /xax

2016 Sep 09

Good morning, I'm doing text analysis on Twitter data and I have a problem with some characters that I can't manage to remove. They are strings of letters that look like *xaexdfxdeaxoa*. I think they come from the encoding of emojis. I usually use roughly the following gsub code to clean text, but it doesn't work here: # remove rt x = gsub("rt", "", x) # remove at x =
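
Two things stand out in this snippet. First, `gsub("rt", "", x)` removes "rt" inside any word, not just the retweet marker. Second, strings like *xaexdfxdeaxoa* are plausibly emoji bytes printed as `\xae\xdf...` escapes whose backslashes were later stripped; this is an assumption. The sketch below converts to ASCII with `iconv(..., sub = "")`, which drops non-ASCII bytes before any other cleaning, and matches "rt" only as a whole word. `clean_tweet` is a hypothetical helper. Note that this conversion also deletes accented letters, so Spanish text may need transliteration first.

```r
clean_tweet <- function(x) {
  # Drop every byte that cannot be represented in ASCII (emoji, but also accents)
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  # Remove "RT" only as a whole word, in any case
  x <- gsub("\\brt\\b", "", x, ignore.case = TRUE, perl = TRUE)
  # Remove @mentions
  x <- gsub("@\\w+", "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

tweet <- "RT @usuario parte del partido \U0001F600"
print(clean_tweet(tweet))                   # "parte del partido"
print(gsub("rt", "", "parte del partido"))  # "pae del paido": the naive pattern hits words
```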
