Displaying 20 results from an estimated 29 matches for "removewords".
2012 Oct 25
2
Text mining
...and seems to do what it says on the can
tw.corpus = Corpus(VectorSource(df))
tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte"))
tw.corpus = tm_map(tw.corpus, tolower)
tw.corpus = tm_map(tw.corpus, removePunctuation)
tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt")))
tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
tw.corpus = tm_map(tw.corpus, stripWhitespace)
sw <- readLines("stopwords.es.txt",encoding="UTF-8")
sw = iconv(sw, to="ASCII//TRANSLIT")
tw.corpus = t...
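The last lines transliterate the stop-word list to ASCII with `iconv(sw, to="ASCII//TRANSLIT")`. A transliterated list only matches text that was normalized the same way, so accented words in the corpus survive if only the list is converted. A minimal base-R sketch of the effect; `strip_accents()` is a hypothetical helper built on `chartr()` rather than `iconv()`, because `//TRANSLIT` output can differ between platforms, and the word lists are made up:

```r
# Hypothetical helper: map lowercase Spanish accented letters to ASCII.
# chartr() maps character-for-character, so both strings have 7 characters.
# The \u escapes keep this file ASCII-only.
strip_accents <- function(x) {
  chartr("\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1", "aeiouun", x)
}

# Stop list transliterated, as in the snippet ("tambien", "mas").
stop_list <- strip_accents(c("tambi\u00e9n", "m\u00e1s"))
text <- "tambi\u00e9n hay m\u00e1s datos"

# Inconsistent: the text keeps its accents, so no stop word matches.
tokens_raw <- strsplit(text, " ")[[1]]
kept_raw <- tokens_raw[!tokens_raw %in% stop_list]

# Consistent: normalize the text the same way before filtering.
tokens_norm <- strsplit(strip_accents(text), " ")[[1]]
kept_norm <- tokens_norm[!tokens_norm %in% stop_list]

length(kept_raw)  # 4: every token survives
kept_norm         # "hay" "datos"
```

With tm, the same normalization would have to be applied to the corpus (for example through a `content_transformer()` wrapper) before `removeWords()` runs.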
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
... <- Corpus(VectorSource(myDocument))
#########################
text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentTermMatrix(text.corp)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat
> dtm.mat
Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
1 0 0 0 0 0 0 0 0 1 0 1...
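A likely cause, judging from the code shown: `removeWords()` matches case-sensitively, and no step lowercases the corpus before it runs, so a sentence-initial "The" survives. `DocumentTermMatrix()` lowercases terms by default (note that "spain", "jack" and "jill" appear in lowercase in the output), so the surviving "The" is counted as "the". Lowercasing before removing stop words avoids this. The base-R sketch below reproduces the effect with a hypothetical `remove_whole_words()` helper that, like `removeWords()`, deletes whole words with a `\b...\b` regular expression; it is an approximation, not tm's code, and the sentence is assembled from the terms in the output above:

```r
# Hypothetical approximation of whole-word removal (case-sensitive, like removeWords).
remove_whole_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

# Collapse repeated whitespace, similar in spirit to stripWhitespace.
squish <- function(x) trimws(gsub("\\s+", " ", x))

stops <- c("the", "in", "on")
doc <- "The rain in Spain falls mainly on the plain"

# Remove first, lowercase later: the capitalized "The" survives.
wrong_order <- squish(tolower(remove_whole_words(doc, stops)))

# Lowercase first: every "the" is removed.
right_order <- squish(remove_whole_words(tolower(doc), stops))

wrong_order  # "the rain spain falls mainly plain"
right_order  # "rain spain falls mainly plain"
```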
2014 Jul 22
2
Help: Error in `colnames<-`(`*tmp*`, value = c(
...Source(df))
> d<-tm_map(corpus, content_transformer(tolower))
> d<-tm_map(d, stripWhitespace)
> d<-tm_map(d, removePunctuation)
> sw<-readLines("./StopWords.txt", encoding="UTF-8")
> sw<-iconv(enc2utf8(sw), sub="byte")
> d<-tm_map(d, removeWords, sw)
> d<-tm_map(d, removeWords, stopwords("spanish"))
> tdm<-TermDocumentMatrix(d)
> m<-as.matrix(tdm)
> colnames(m) = c("P05", "P13")
Error in `colnames<-`(`*tmp*`, value = c("P05", "P13")) :
length of 'dimnames'...
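The `length of 'dimnames'` error means the matrix does not have exactly two columns. A TermDocumentMatrix has one column per document, so `colnames(m) = c("P05", "P13")` only works if the corpus holds exactly two documents; how many documents `VectorSource(df)` produces depends on what `df` contains, which the truncated snippet does not show. Checking `ncol(m)` first makes the mismatch explicit. A base-R sketch with a hypothetical `set_doc_names()` helper and a toy 3-column matrix standing in for `as.matrix(tdm)`:

```r
# Hypothetical helper: set document names only when the counts agree.
set_doc_names <- function(m, doc_names) {
  if (ncol(m) != length(doc_names)) {
    stop(sprintf("matrix has %d document columns but %d names were given",
                 ncol(m), length(doc_names)))
  }
  colnames(m) <- doc_names
  m
}

# Toy stand-in for as.matrix(tdm): 2 terms (rows) x 3 documents (columns).
m <- matrix(c(1, 0, 2, 1, 0, 3), nrow = 2,
            dimnames = list(c("agua", "tierra"), NULL))

ok <- set_doc_names(m, c("P05", "P13", "P20"))   # "P20" is made up
msg <- tryCatch(set_doc_names(m, c("P05", "P13")),
                error = function(e) conditionMessage(e))
msg  # "matrix has 3 document columns but 2 names were given"
```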
2014 Jul 29
2
wordcloud and word table [Making progress]
..."2005", "2013"))
ds<- DataframeSource(tmpText)
ds<- DataframeSource(tmpinformes)
corp = Corpus(ds)
corp = tm_map(corp,removePunctuation)
corp = tm_map(corp,content_transformer(tolower))
corp = tm_map(corp,removeNumbers)
corp = tm_map(corp, stripWhitespace)
corp = tm_map(corp, removeWords, sw)
corp = tm_map(corp, removeWords, stopwords("spanish"))
term.matrix<- TermDocumentMatrix(corp)
term.matrix<- as.matrix(term.matrix)
colnames(term.matrix) <- c("Año2005","Año2013")
png(file="Org2005vs2013.png",height=600,width=1200)
par(mfrow=c(1,...
2017 Jun 12
3
count number of stop words in R
You can define stop words as below.
data <- tm_map(data, removeWords, stopwords("english"))
Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
________________________________
From: R-help <r-help-bounces at r-project.org> on behalf of Bert Gunter <bgunter.4567 at gmail.com>
Sen...
2017 Jun 12
0
count number of stop words in R
Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string:
str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the pictur...
2017 Jun 12
3
count number of stop words in R
...________________________________
From: Elahe chalabi <chalabi.elahe at yahoo.de>
Sent: Monday, June 12, 2017 11:23:42 AM
To: Patrick Casimir; Bert Gunter
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R
Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string:
str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the pictur...
2017 Jun 12
0
count number of stop words in R
Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string:
str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the pictur...
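One way to count stop words without modifying the text is to split the string into tokens and count how many of them appear in the stop-word list. A base-R sketch; the short `stops` vector is a hypothetical stand-in for `stopwords("english")`, `count_stopwords()` is a hypothetical helper, and the tokenization (lowercase, split on anything that is not a letter or apostrophe) is an assumption about what counts as a word:

```r
# Hypothetical stand-in for stopwords("english").
stops <- c("a", "and", "the", "it", "in", "is", "that's", "there's", "he's")

# Hypothetical helper: total and per-word counts of stop words in one string.
count_stopwords <- function(text, stops) {
  # Lowercase, then split on runs of characters that are not letters or apostrophes.
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]   # drop empty strings from leading separators
  hits <- tokens[tokens %in% stops]
  list(total = length(hits), by_word = table(hits))
}

txt <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."
res <- count_stopwords(txt, stops)
res$total    # 4
res$by_word  # a: 2, that's: 1, there's: 1
```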
2010 Mar 31
1
tm package - remove stopwords failing
Hi,
I just noticed, by inspecting the term matrix, that not all stopwords are
removed. Does anyone know how to fix that?
library(tm)
data("crude")
d<-tm_map(crude, removeWords, stopwords(language='english'))
dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2))
inspect( dt)
I am using R version 2.10, tm package 0.5-3
cheers
Welma
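A quick way to see which stop words survived is to intersect the matrix vocabulary with the stop list; with tm, `Terms(dt)` returns the vocabulary of a DocumentTermMatrix. If the survivors are words that often start a sentence, case-sensitive matching in `removeWords()` is a likely cause. A base-R sketch with hypothetical stand-in vectors:

```r
# Hypothetical stand-ins: the vocabulary of a document-term matrix (with tm
# this could come from Terms(dt)) and a short stop-word list.
vocab <- c("oil", "prices", "the", "opec", "and", "market")
stops <- c("the", "and", "of", "a", "to")

# Stop words that made it into the vocabulary.
leftover <- intersect(vocab, stops)
leftover      # "the" "and"

# Vocabulary with the survivors filtered out.
clean_vocab <- setdiff(vocab, stops)
clean_vocab   # "oil" "prices" "opec" "market"
```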
2011 Oct 04
1
Reading stopwords from a csv file
...ge to do text mining:
I have a huge list of stopwords (2000+) that are in a csv file. I read it as
follows:
stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)
When I try removing the stopwords using
tr1=tm_map(tr1,removeWords,myStopwords)
I am getting the following error:
Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "", :
internal error in compiling regexp
However, this works fine when I define myStopwords = c(....) instead of
reading from the csv file.
Can som...
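The error comes from the single regular expression built from all the words (the `gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), ...)` call in the message). Two plausible causes are entries containing regex metacharacters, such as an unmatched parenthesis, and a very long alternation. A base-R workaround sketch with hypothetical helpers `escape_regex()` and `remove_words_chunked()`: escape metacharacters, drop empty or NA entries from the CSV, and remove the words in chunks (the chunk size of 500 is arbitrary):

```r
# Hypothetical helper: escape regex metacharacters so each word matches literally.
# The character class is written for PCRE, hence perl = TRUE.
escape_regex <- function(x) gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)

# Hypothetical helper: clean the list, then remove words chunk by chunk so no
# single alternation pattern becomes very large.
remove_words_chunked <- function(x, words, chunk_size = 500) {
  words <- unique(trimws(as.character(words)))
  words <- words[!is.na(words) & words != ""]   # empty/NA entries from the CSV
  words <- escape_regex(words)
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x
}

# Made-up list: 2000 filler words plus entries with special characters.
stop_words <- c(sprintf("filler%d", 1:2000), "the", "e.g", "(", NA, "")
txt <- "the e.g report and filler17 text"
out <- trimws(gsub("\\s+", " ", remove_words_chunked(txt, stop_words)))
out  # "report and text"
```

Without escaping, an entry such as `(` alone is enough to make the combined pattern fail to compile.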
2013 Sep 26
0
R hangs at NGramTokenizer
... <- function(x) gsub("www[[:alnum:]]*", "", x)
> myCorpus <- tm_map(myCorpus, removeWWW)
> myCorpus <- tm_map(myCorpus, tolower)
> myCorpus <- tm_map(myCorpus, removeNumbers)
> myCorpus <- tm_map(myCorpus, removePunctuation)
> myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
> myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART"))
> myCorpus <- tm_map(myCorpus, stripWhitespace)
> myDtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1,Inf)))
Everything works fine up to this stage. If I do no...
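If the RWeka-based NGramTokenizer is the step that hangs, one way to check whether the tokenizer is the problem is to build the n-grams without Java. This swaps in a different technique; it is not RWeka's tokenizer. A base-R sketch with a hypothetical `ngrams()` helper that splits on whitespace and joins consecutive tokens. tm's `DocumentTermMatrix()` accepts a custom tokenizer through `control = list(tokenize = ...)`, but how it is applied depends on the corpus type and tm version (see `?termFreq`):

```r
# Hypothetical helper: word n-grams from one string, using only base R.
ngrams <- function(text, n = 2) {
  tokens <- strsplit(trimws(text), "\\s+")[[1]]
  tokens <- tokens[tokens != ""]
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  # Join each window of n consecutive tokens with a single space.
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

ngrams("oil prices rose sharply", n = 2)
# "oil prices" "prices rose" "rose sharply"
```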
2012 Jan 27
2
tm package: handling contractions
...ext
sotu <- scan(file="c:/R/data/sotu2012.txt", what="character")
sotu <- tolower(sotu)
corp <-Corpus(VectorSource(paste(sotu, collapse=" ")))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stemDocument)
corp <- tm_map(corp, function(x)removeWords(x,stopwords()))
tdm <- TermDocumentMatrix(corp)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
wordcloud(d$word,d$freq)
I ended up with a large number of contractions that were split at the
"?" character, e.g., "don?t&...
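A plausible reading of the symptom: the file uses a typographic apostrophe (U+2019), which shows up as `?` when the file is read with the wrong encoding and is then treated as a separate character instead of part of the word. Reading the file with an explicit encoding (`scan()` has `encoding` and `fileEncoding` arguments) and mapping curly apostrophes to ASCII before removing punctuation keeps contractions intact. A base-R sketch; `normalize_apostrophes()` is a hypothetical helper, and keeping only letters, digits, whitespace and `'` is an assumption about what should count as punctuation:

```r
# Hypothetical helper: map typographic single quotes/apostrophes to ASCII "'".
# U+2018/U+2019 are left/right single quotes, U+02BC is the modifier apostrophe.
normalize_apostrophes <- function(x) gsub("[\u2018\u2019\u02bc]", "'", x)

txt <- "Don\u2019t stop; we\u2019re not done."
clean <- tolower(normalize_apostrophes(txt))

# Remove punctuation except the apostrophe, so contractions stay single tokens.
no_punct <- gsub("[^[:alnum:][:space:]']", "", clean)
no_punct  # "don't stop we're not done"
```

Recent tm releases also document a `preserve_intra_word_contractions` argument for `removePunctuation()`; check `?removePunctuation` for the installed version.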
2014 Jul 28
2
wordcloud and word table
...l, stripWhitespace)
> > info.cor.cl<-tm_map(info.cor.cl,removePunctuation)
> > sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
> > sw<-iconv(enc2utf8(sw), sub = "byte")
> > info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish"))
> > info.tdm<-TermDocumentMatrix(info.cor.cl)
> > result<-list(name = informes, tdm= info.tdm)
> > }
> >>tdm<-lapply(informes, TDM, path = pathname)
> >
> > Result:
> >
> >> tdm
> > [[1]]
>...
2014 Jul 25
3
wordcloud and word table
...wer))
info.cor.cl<-tm_map(info.cor.cl, stripWhitespace)
info.cor.cl<-tm_map(info.cor.cl,removePunctuation)
sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
sw<-iconv(enc2utf8(sw), sub = "byte")
info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish"))
info.tdm<-TermDocumentMatrix(info.cor.cl)
result<-list(name = informes, tdm= info.tdm)
}
>tdm<-lapply(informes, TDM, path = pathname)
Result:
> tdm
[[1]]
[[1]]$name
[1] "2013"
[[1]]$tdm
<<TermDocumentMatrix (terms: 1540, docume...
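Two notes on the posted `TDM()` function. In the lines shown, `sw` is read and re-encoded but never used: `removeWords()` is only called with `stopwords("spanish")`, so the custom list is not applied; passing `c(sw, stopwords("spanish"))` would apply both lists. For the word table itself, a frequency table can be sketched in base R. The `sw` and `builtin` vectors below are hypothetical stand-ins for StopWords.txt and `stopwords("spanish")`, and `word_table()` is a hypothetical helper:

```r
# Hypothetical stand-ins: sw as if read from StopWords.txt, builtin in place
# of stopwords("spanish").
sw <- c("informe", "anual")
builtin <- c("de", "la", "el", "y", "en")
all_stops <- unique(c(sw, builtin))

# Hypothetical helper: word frequencies after removing stop words.
word_table <- function(text, stops) {
  tokens <- strsplit(tolower(text), "[[:space:][:punct:]]+")[[1]]
  tokens <- tokens[tokens != "" & !tokens %in% stops]
  freq <- sort(table(tokens), decreasing = TRUE)
  data.frame(word = names(freq), freq = as.integer(freq))
}

res <- word_table("El informe anual de la empresa: la empresa crece y el mercado crece.",
                  all_stops)
res  # crece and empresa appear twice, mercado once
```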
2012 Dec 13
2
Term matrix size and memory. TM package
...convert to ASCII
sw <- readLines("D:/Publico/Documents/TextMinigSpanishResources/Stopwords.es.txt",encoding="UTF-8")
sw = iconv(sw, to="ASCII//TRANSLIT")
# remove generic stop words
corpus <- tm_map(corpus, removeWords, stopwords("spanish"))
# stemming
corpus <- tm_map(corpus, stemDocument, language = "spanish")
# create the term matrix
#a) terms as rows and documents as columns
dtm <- DocumentTermMatrix(corpus)...
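On the memory question: tm stores term matrices sparsely (as simple triplet matrices from the slam package), so the usual memory spike comes from converting them to dense form with `as.matrix()`, which stores every cell. Whether this code does that is an assumption, since the snippet is truncated. A dense double matrix needs about 8 bytes per cell, so the cost is roughly terms × documents × 8 bytes. A base-R sketch of that estimate with hypothetical corpus sizes and a hypothetical `dense_bytes()` helper:

```r
# Rough dense-size estimate: one 8-byte double per cell.
dense_bytes <- function(n_terms, n_docs) n_terms * n_docs * 8

# Hypothetical corpus size: 50,000 distinct terms x 20,000 documents.
est <- dense_bytes(50000, 20000)
sprintf("%.1f GB", est / 1024^3)   # "7.5 GB"

# Cross-check the 8-bytes-per-cell rule on a small dense matrix.
small <- matrix(0, nrow = 1000, ncol = 200)
actual <- as.numeric(object.size(small))
actual - dense_bytes(1000, 200)    # only a small fixed overhead remains
```

Dropping rare terms with tm's `removeSparseTerms()` before any dense conversion shrinks the term dimension, and with it this estimate.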
2014 Jun 17
2
Not a tm problem: your doc.corpus is empty
> ...ad(inmortal)
> tail(inmortal)
> library(tm)
> vec <- VectorSource(inmortal)
> corpus <- Corpus(vec)
> summary(corpus)
> inspect(corpus[1:7])
> corpus <- tm_map(corpus, tolower)
> corpus <- tm_map(corpus, removePunctuation)
> corpus <- tm_map(corpus, removeNumbers)
> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> inspect(doc.corpus[1:2])
> library(SnowballC)
> corpus <- tm_map(corpus, stemDocument)
> corpus <- tm_map(corpus, stripWhitespace)
> inspect(doc.corpus[1:8])
> TDM <- TermDocumentMatrix(corpus)
> TDM
>
> thanks in advance!!!
>
> r...
2011 Apr 18
0
Help with cleaning a corpus
Hi!
I created a corpus and started cleaning it with this piece of code:
txt <-tm_map(txt,removeWords, stopwords("spanish"))
txt <-tm_map(txt,stripWhitespace)
txt <-tm_map(txt,tolower)
txt <-tm_map(txt,removeNumbers)
txt <-tm_map(txt,removePunctuation)
But something happened: some of the documents in the corpus became empty,
which is a problem when I try to make a document...
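Documents become empty when every token in them is removed, for example a short document made only of stop words, numbers and punctuation. One option is to find those documents after cleaning and drop them before building the document-term matrix. A base-R sketch on a hypothetical character vector standing in for the cleaned documents; with a tm corpus the texts would first have to be extracted, and how to do that depends on the tm version:

```r
# Hypothetical cleaned documents: what is left after stop words, numbers and
# punctuation have been removed.
docs <- c(doc1 = "precio petroleo sube", doc2 = "   ", doc3 = "", doc4 = "mercado")

# A document is empty if nothing but whitespace remains.
is_empty <- nchar(trimws(docs)) == 0
names(docs)[is_empty]     # "doc2" "doc3"

# Keep only non-empty documents before building the matrix.
docs_kept <- docs[!is_empty]
```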
2012 Feb 26
2
tm_map help
...(enc2utf8(x), sub = "byte"))
myCorpus <- tm_map(myCorpus, tolower)
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
myStopwords <- c(stopwords('english'), "available", "via")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
dictCorpus <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
################ERROR HAPPENS ON NEXT LINE##################################
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus)
myDtm <- TermDocumentMatrix(myCorpus, control = list(minWord...
2014 Jun 18
2
Not a tm problem: your doc.corpus is empty
> >> ... <- VectorSource(inmortal)
> >> corpus <- Corpus(vec)
> >> summary(corpus)
> >> inspect(corpus[1:7])
> >> corpus <- tm_map(corpus, tolower)
> >> corpus <- tm_map(corpus, removePunctuation)
> >> corpus <- tm_map(corpus, removeNumbers)
> >> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> >> inspect(doc.corpus[1:2])
> >> library(SnowballC)
> >> corpus <- tm_map(corpus, stemDocument)
> >> corpus <- tm_map(corpus, stripWhitespace)
> >> inspect(doc.corpus[1:8])
> >> TDM <- TermDocumentMatrix(corpus)
> >> TDM
> >>...
2016 Sep 09
2
Removing strange characters /xax
Good morning,
I am doing text analysis on Twitter data and I have a problem with some
characters that I cannot remove. They are strings of letters that look like
*xaexdfxdeaxoa*. I think they come from the encoding of emojis.
I usually use roughly the following gsub code to clean
text, but it doesn't work here:
# remove rt
x = gsub("rt", "", x)
# remove at
x =
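Two points about that cleaning code. `gsub("rt", "", x)` deletes the letters "rt" wherever they occur, so ordinary words are damaged too ("parte" becomes "pae"); anchoring the pattern with word boundaries removes only the retweet marker. For the emoji debris, if the text still contains the raw non-ASCII characters at that point (an assumption; the `xaexdf` strings suggest the bytes may already have been turned into plain letters), converting to ASCII with `iconv(..., sub = "")` drops them, along with any accented letters. A base-R sketch with made-up tweets:

```r
x <- c("RT @ana: parte del informe", "rt esto es importante")

# Substring removal damages ordinary words: "parte" -> "pae", "importante" -> "impoante".
naive <- gsub("rt", "", x)

# Whole-word, case-insensitive removal touches only the retweet marker.
whole_word <- trimws(gsub("\\brt\\b", "", x, ignore.case = TRUE))

# Drop characters that cannot be represented in ASCII (an emoji here).
# Caution: this also deletes accented letters, not just emoji.
tweet <- "hola \U0001F600 mundo"
ascii_only <- iconv(tweet, from = "UTF-8", to = "ASCII", sub = "")

whole_word  # "@ana: parte del informe" "esto es importante"
ascii_only  # "hola  mundo"
```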