2012 Feb 26
tm_map help
Hi all,
I am trying to do some text mining with Twitter, and I get the following error when I use tm_map:
Error in structure(names(sapply(possibleCompletions, "[", 1)), names = x) :
'names' attribute [1] must be the same length as the vector [0]
Has anyone seen this error before? The code I have is shown below.
The error only occurs with #qantas; hashtags like #asx and #obama
work OK.
Appreciate any help.
hashTag<-function (hashTag, minFreq){
tweets<- searc...
2012 Oct 25
Text mining
...e(tm)
require(wordcloud)
tw.df=twListToDF(tweets)
RemoveAtPeople <- function(x){gsub("@\\w+", "",x)}
df<- as.vector(sapply(tw.df$text, RemoveAtPeople))
#The following is cribbed and seems to do what it says on the can
tw.corpus = Corpus(VectorSource(df))
tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte"))
tw.corpus = tm_map(tw.corpus, tolower)
tw.corpus = tm_map(tw.corpus, removePunctuation)
tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt")))
tw.corpus = tm_map(tw.corpus,...
2009 Nov 12
package "tm" fails to remove "the" with remove stopwords
...fo() below.
myDocument <- c("the rain in Spain", "falls mainly on the plain",
"jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentT...
2013 Sep 26
R hangs at NGramTokenizer
...; invisible(clusterEvalQ(cl, library(RTextTools)))
> myCorpus <-Corpus(DirSource("/home/neeph/Test/DMOZ_Business"), encoding="UTF-8", readerControl=list(reader=readPlain))
> removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
> myCorpus <- tm_map(myCorpus, removeURL)
> removeAmp <- function(x) gsub("&", "", x)
> myCorpus <- tm_map(myCorpus, removeAmp)
> removeWWW <- function(x) gsub("www[[:alnum:]]*", "", x)
> myCorpus <- tm_map(myCorpus, removeWWW)
> myCorpus <- tm_ma...
2014 Jul 29
wordcloud and word table [Making progress]
...n R: 3.1.1
tmpinformes<-data.frame(c("todo el informe 2005", "todo el informe 2013"),
row.names=c("2005", "2013"))
ds<- DataframeSource(tmpText)
ds<- DataframeSource(tmpinformes)
corp = Corpus(ds)
corp = tm_map(corp,removePunctuation)
corp = tm_map(corp,content_transformer(tolower))
corp = tm_map(corp,removeNumbers)
corp = tm_map(corp, stripWhitespace)
corp = tm_map(corp, removeWords, sw)
corp = tm_map(corp, removeWords, stopwords("spanish"))
term.matrix<- TermDocumentMatrix(corp)
2014 Jul 22
Help: Error in `colnames<-`(`*tmp*`, value = c(
> d1<-readLines(txt1, encoding="UTF-8")
> d1<-iconv(enc2utf8(d1), sub = "byte")
> d2<-readLines(txt2, encoding="UTF-8")
> d2<-iconv(enc2utf8(d2), sub = "byte")
> df<-c(d1,d2)
> corpus<-Corpus(VectorSource(df))
> d<-tm_map(corpus, content_transformer(tolower))
> d<-tm_map(d, stripWhitespace)
> d<-tm_map(d, removePunctuation)
> sw<-readLines("./StopWords.txt", encoding="UTF-8")
> sw<-iconv(enc2utf8(sw), sub="byte")
> d<-tm_map(d, removeWords, sw)
> d<-t...
2014 Nov 22
Problems with tm
Dear colleagues, I have a problem with the tm library, or with Windows
8.1, or with something I can't pin down.
Some time ago, with Windows 7 and an earlier version of R, I ran this code:
crude <- tm_map(crude, tolower)
and it created the tdm without problems. Now, if I run it, I get the following error:
Error: inherits(doc, "TextDocument") is not TRUE
But if I remove the line of code
crude <- tm_map(crude, tolower)
the tdm is created without problems.
What is happ...
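A likely explanation, offered as a sketch rather than a confirmed diagnosis: in tm 0.6 and later, tm_map expects the mapped function to return a text document, while plain tolower returns a bare character vector. Other 2014 posts in this listing wrap the call as content_transformer(tolower), and the sketch below does the same. It assumes tm 0.6 or later is installed; the names crude_lower and tdm are illustrative only.

```r
# Sketch only: runs when the tm package (0.6 or later) is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  data("crude", package = "tm")   # sample Reuters corpus shipped with tm

  # content_transformer() wraps a character-level function so that each
  # element of the corpus stays a TextDocument after the transformation.
  crude_lower <- tm::tm_map(crude, tm::content_transformer(tolower))

  # Building the term-document matrix should now work as before.
  tdm <- tm::TermDocumentMatrix(crude_lower)
  print(dim(tdm))
}
```

Transformations provided by tm itself, such as removePunctuation or stripWhitespace, can be passed to tm_map directly; the wrapper is only needed for base functions like tolower.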
2012 Jan 13
Troubles with stemming (tm + Snowball packages) under MacOS
...1 / R 2.14.1 (I have tried several versions)
I have installed all the needed packages (tm, rJava, RWeka, Snowball)
+ dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html)
with :
The command tm_map(reuters, stemDocument) gives the following errors:
- First time:
Error in .jnew(name) :
java.lang.InternalError: Can't start the AWT because Java was
started on the first thread. Make sure StartOnFirstThread is not
specified in your application's Info.plist or on the command line...
2012 Jan 27
tm package: handling contractions
...tried making a wordcloud of Obama's State of the Union address using
the tm package to process the text
sotu <- scan(file="c:/R/data/sotu2012.txt", what="character")
sotu <- tolower(sotu)
corp <-Corpus(VectorSource(paste(sotu, collapse=" ")))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, stemDocument)
corp <- tm_map(corp, function(x)removeWords(x,stopwords()))
tdm <- TermDocumentMatrix(corp)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
I en...
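The post is cut off here, so the exact problem is not known. One common pitfall with contractions, shown below in base R as an assumption about what the thread discusses, is that removing punctuation before removing stop words strips the apostrophe, so "don't" becomes "dont" and no longer matches the stop word list. The word list and variable names are hypothetical.

```r
# Hypothetical tokens and stop word list, for illustration only.
words      <- c("we've", "done", "it", "don't", "stop")
stop_words <- c("we've", "don't", "it")

# Order 1: punctuation first. The apostrophes vanish, so the
# contractions no longer match the stop word list and survive.
stripped    <- gsub("[[:punct:]]", "", words)
punct_first <- stripped[!stripped %in% stop_words]

# Order 2: stop words first, then punctuation. The contractions
# are matched and removed before the apostrophes are stripped.
kept       <- words[!words %in% stop_words]
stop_first <- gsub("[[:punct:]]", "", kept)

print(punct_first)
print(stop_first)
```

In the pipeline quoted above, this would correspond to calling removeWords before removePunctuation, though whether that resolves the poster's issue is not confirmed by the snippet.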
2012 Dec 13
Term matrix size and memory. tm package
...txt <- readLines("D:/Publico/Documents/texto1.txt",encoding="UTF-8")
txt = iconv(txt, to="ASCII//TRANSLIT")
# build a corpus
corpus <- Corpus(VectorSource(txt))
# convert to lower case
corpus <- tm_map(corpus, tolower)
# strip extra whitespace
corpus <- tm_map(corpus, stripWhitespace)
# remove punctuation
corpus <- tm_map(corpus, removePunctuation)
# load the custom Spanish stop word file and convert it to ASCII
sw &...
2014 Jun 17
Not a tm problem: your doc.corpus is empty
...ciologia/Soc Musica/Black metal/Analisis texto/Inmortal"
> inmortal = readLines(TEXTFILE)
> inmortal = readLines(TEXTFILE)
> length(inmortal)
> head(inmortal)
> tail(inmortal)
> library(tm)
> vec <- VectorSource(inmortal)
> corpus <- Corpus(vec)
> summary(corpus)
> inspect(corpus[1:7])
> corpus <- tm_map(corpus, tolower)
> corpus <- tm_map(corpus, removePunctuation)
> corpus <- tm_map(corpus, removeNumbers)
> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> inspect(doc.corpus[1:2])
> library(SnowballC)
> corpus <- tm_map(corpus, stemDocument)
> corpus <- tm_map(...
2014 Jun 18
...rtal"
> >> inmortal = readLines(TEXTFILE)
> >> inmortal = readLines(TEXTFILE)
> >> length(inmortal)
> >> head(inmortal)
> >> tail(inmortal)
> >> library(tm)
> >> vec <- VectorSource(inmortal)
> >> corpus <- Corpus(vec)
> >> summary(corpus)
> >> inspect(corpus[1:7])
> >> corpus <- tm_map(corpus, tolower)
> >> corpus <- tm_map(corpus, removePunctuation)
> >> corpus <- tm_map(corpus, removeNumbers)
> >> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> >> inspect(doc.corpus[1:2])
> >> library(SnowballC)
> >> corpus <- tm_map(...
2014 Jul 25
wordcloud and word table
>pathname<-"C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/"
>TDM<-function(informes, pathname) {
info.dir<-sprintf("%s/%s", pathname, informes)
info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
info.cor.cl<-tm_map(info.cor, content_transformer(tolower))
info.cor.cl<-tm_map(info.cor.cl, stripWhitespace)
sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
sw<-iconv(enc2utf8(sw), sub = "byte")
2011 Apr 18
Help with cleaning a corpus
I created a corpus and I started to clean through this piece of code:
txt <-tm_map(txt,removeWords, stopwords("spanish"))
txt <-tm_map(txt,stripWhitespace)
txt <-tm_map(txt,tolower)
txt <-tm_map(txt,removeNumbers)
txt <-tm_map(txt,removePunctuation)
But something happened: some of the documents in the corpus became empty,
which is a problem when I try to...
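As a rough illustration of how documents end up empty, the base R sketch below mimics the cleaning steps on a plain character vector and then flags the entries with nothing left. The documents, the stop word list, and the helper remove_words are all hypothetical; a real tm corpus would need its own check on the document contents.

```r
# Hypothetical documents and stop words, for illustration only.
docs       <- c("de la", "informe de anual", "123", "el plan")
stop_words <- c("de", "la", "el")

# Remove whole words only, using word boundaries.
remove_words <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, "", x, perl = TRUE)
}

cleaned <- remove_words(docs, stop_words)
cleaned <- gsub("[[:digit:]]+", "", cleaned)      # drop numbers
cleaned <- trimws(gsub("\\s+", " ", cleaned))     # collapse whitespace

# Documents made up only of stop words or numbers are now empty strings.
empty <- !nzchar(cleaned)
print(which(empty))

# Keep only the documents that still have content.
kept <- cleaned[!empty]
print(kept)
```

Filtering out the empty entries before building a term matrix is one way around the problem; whether it suits the poster's analysis is not clear from the snippet.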
2014 Jul 28
wordcloud and word table
...omision/PLAN de
> >
> >>TDM<-function(informes, pathname) {
> > info.dir<-sprintf("%s/%s", pathname, informes)
> > info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
> > info.cor.cl<-tm_map(info.cor, content_transformer(tolower))
> > info.cor.cl<-tm_map(info.cor.cl, stripWhitespace)
> > info.cor.cl<-tm_map(info.cor.cl,removePunctuation)
> > sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
> > sw<-iconv(...
2017 Jun 12
count number of stop words in R
Defining data as you mentioned in your response causes the following error:
Error in UseMethod("tm_map", x) :
no applicable method for 'tm_map' applied to an object of class "character"
I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string!
On Monday, June 12, 2017 8:36 AM, Patrick...
2017 Jun 12
count number of stop words in R
From: Elahe chalabi <chalabi.elahe at yahoo.de>
Sent: Monday, June 12, 2017 11:23:42 AM
To: Patrick Casimir; Bert Gunter
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R
Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string:
str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over ....
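One way to count stop words without tm_map, sketched in base R: split the string into lower-case tokens and test each against a stop word list. The text below is a shortened adaptation of the string in the post, and the stop word list is a small hypothetical stand-in; with tm loaded, stopwords("english") could presumably be used in its place.

```r
# Shortened adaptation of the string from the post.
text <- "There's a young boy that's getting a cookie jar and the thing is falling over"

# Hypothetical stop word list, for illustration only.
stop_words <- c("a", "and", "the", "is", "that's", "there's", "over")

# Tokenize: lower-case, then split on anything that is not a letter
# or an apostrophe, so contractions stay in one piece.
tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Flag and count the tokens that appear in the stop word list.
is_stop <- tokens %in% stop_words
n_stop  <- sum(is_stop)

print(n_stop)
print(table(tokens[is_stop]))   # per-word counts
```

Because tm_map(data, removeWords, ...) deletes the words rather than reporting them, counting on the raw tokens first is the simpler route.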
2014 Jun 18
...> > > >> readLines(TEXTFILE)
> > > >> length(inmortal)
> > > >> head(inmortal)
> > > >> tail(inmortal)
> > > >> library(tm)
> > > >> vec <- VectorSource(inmortal)
> > > >> corpus <- Corpus(vec)
> > > >> summary(corpus)
> > > >> inspect(corpus[1:7])
> > > >> corpus <- tm_map(corpus, tolower)
> > > >> corpus <- tm_map(corpus, removePunctuation)
> > > >> corpus <- tm_map(corpus, removeNumbers)
> > > >> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> > > >> inspect(doc.corpus[1:2])
> > > >> library(Snow...
2010 Feb 16
tm package
I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me
as if after the following
reuters21578 <- Corpus(DirSource(corpusDir), readerControl =
list(reader = readReut21578XMLasPlain))
reuters21578 <- tm_map(reuters21578, stripWhitespace)
reuters21578 <- tm_map(reuters21578, tolower)
reuters21578 <- tm_map(reuters21578, removePunctuation)
reuters21578 <- tm_map(reuters21578, removeNumbers)
reuters21578.dtm <- DocumentTermMatrix(reuters21578)
that reuters21578.dtm does not i...
2011 Mar 24
Problem with Snowball & RWeka
Dear Forum,
when I try to use SnowballStemmer() I get the following error message:
"Could not initialize the GenericPropertiesCreator. This exception was
produced: java.lang.NullPointerException"
It seems to have something to do with either Snowball or RWeka, but I
can't figure out what to do myself. If you could spend 5 minutes of your
valuable time to help me or give me a