thr3ads.net - search: "tm

Displaying 20 results from an estimated 35 matches for "tm_map".


2012 Feb 26

tm_map help

Hi all, I am trying to do some text mining with Twitter and I get the following error when I use tm_map: Error in structure(names(sapply(possibleCompletions, "[", 1)), names = x) : 'names' attribute [1] must be the same length as the vector [0] Has anyone seen this error before? The code I have is shown below, and the error only occurs with #qantas; hashtags like #asx and #obama work OK. Appreciate any help. Thanks, Sachin library(twitteR) library(tm) library(wordcloud) hashTag<-function (hashTag, minFreq){ tweets<- searc...

Text mining

2012 Oct 25

...e(tm) require(wordcloud) tw.df=twListToDF(tweets) RemoveAtPeople <- function(x){gsub("@\\w+", "",x)} df<- as.vector(sapply(tw.df$text, RemoveAtPeople)) #The following is cribbed and seems to do what it says on the can tw.corpus = Corpus(VectorSource(df)) tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte")) tw.corpus = tm_map(tw.corpus, tolower) tw.corpus = tm_map(tw.corpus, removePunctuation) tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt"))) tw.corpus = tm_map(tw.corpus,...

package "tm" fails to remove "the" with remove stopwords

2009 Nov 12

...fo() below. Thanks! Mark require(tm) myDocument <- c("the rain in Spain", "falls mainly on the plain", "jack and jill ran up the hill", "to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentT...

R hangs at NGramTokenizer

2013 Sep 26

...; invisible(clusterEvalQ(cl, library(RTextTools))) > myCorpus <- Corpus(DirSource("/home/neeph/Test/DMOZ_Business"), encoding="UTF-8", readerControl=list(reader=readPlain)) > removeURL <- function(x) gsub("http[[:alnum:]]*", "", x) > myCorpus <- tm_map(myCorpus, removeURL) > removeAmp <- function(x) gsub("&", "", x) > myCorpus <- tm_map(myCorpus, removeAmp) > removeWWW <- function(x) gsub("www[[:alnum:]]*", "", x) > myCorpus <- tm_map(myCorpus, removeWWW) > myCorpus <- tm_ma...

wordcloud and word table [Making progress]

2014 Jul 29

...n R: 3.1.1 require(tm) require(wordcloud) require(Rcpp) tmpinformes<-data.frame(c("todo el informe 2005", "todo el informe 2013"), row.names=c("2005", "2013")) ds<- DataframeSource(tmpText) ds<- DataframeSource(tmpinformes) corp = Corpus(ds) corp = tm_map(corp,removePunctuation) corp = tm_map(corp,content_transformer(tolower)) corp = tm_map(corp,removeNumbers) corp = tm_map(corp, stripWhitespace) corp = tm_map(corp, removeWords, sw) corp = tm_map(corp, removeWords, stopwords("spanish")) term.matrix<- TermDocumentMatrix(corp) term.matrix...

Help: Error in `colnames<-`(`*tmp*`, value = c(

2014 Jul 22

...2) > d1<-readLines(txt1, encoding="UTF-8") > d1<-iconv(enc2utf8(d1), sub = "byte") > d2<-readLines(txt2, encoding="UTF-8") > d2<-iconv(enc2utf8(d2), sub = "byte") > df<-c(d1,d2) > corpus<-Corpus(VectorSource(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) > d<-tm_map(d, removePunctuation) > sw<-readLines("./StopWords.txt", encoding="UTF-8") > sw<-iconv(enc2utf8(sw), sub="byte") > d<-tm_map(d, removeWords, sw) > d<-t...

Problems with tm

2014 Nov 22

Dear colleagues, I have a problem with the tm library, or with Windows 8.1, or with something I have no control over. Some time ago, with Windows 7 and an earlier version of R, I ran this code: library(tm) data("crude") crude <- tm_map(crude, tolower) tdm<-TermDocumentMatrix(crude) and it created tdm without any problem. Now, if I run it, I get the following error: Error: inherits(doc, "TextDocument") is not TRUE But if I remove the line of code crude <- tm_map(crude, tolower) it creates tdm without any problem. What is hap...
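
One possible explanation, offered as an assumption rather than something the thread confirms: from tm 0.6 onward, a base function such as tolower() passed directly to tm_map() returns plain character vectors, so the corpus elements stop being TextDocument objects. A minimal sketch of the usual workaround, assuming tm 0.6 or later is installed, wraps the function in content_transformer(). The name crude_lower is only illustrative.

```r
library(tm)

# "crude" is the 20-document Reuters sample corpus shipped with tm.
data("crude")

# Assumption: tm >= 0.6. Wrapping tolower() in content_transformer()
# changes only the text content and keeps each element a TextDocument.
crude_lower <- tm_map(crude, content_transformer(tolower))

# Building the matrix should now pass the TextDocument check.
tdm <- TermDocumentMatrix(crude_lower)
print(dim(tdm))  # terms x documents
```

Transformations that tm itself provides (removePunctuation, stripWhitespace, removeWords and so on) do not need the wrapper; only plain character functions do.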

Troubles with stemming (tm + Snowball packages) under MacOS

2012 Jan 13

...1 / R 2.14.1 (I have tried several versions) I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with: Sys.setenv(NOAWT=TRUE) The command tm_map(reuters, stemDocument) gives the following errors: - First time: Error in .jnew(name) : java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line...

tm package: handling contractions

2012 Jan 27

...tried making a wordcloud of Obama's State of the Union address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) wordcloud(d$word,d$freq) I en...

Term matrix size and memory. TM package

2012 Dec 13

...txt <- readLines("D:/Publico/Documents/texto1.txt",encoding="UTF-8") txt = iconv(txt, to="ASCII//TRANSLIT") # build a corpus corpus <- Corpus(VectorSource(txt)) # convert to lowercase corpus <- tm_map(corpus, tolower) # strip whitespace corpus <- tm_map(corpus, stripWhitespace) # remove punctuation corpus <- tm_map(corpus, removePunctuation) # load the custom Spanish stop word file and convert it to ASCII sw &...

It is not a tm problem; your doc.corpus is empty

2014 Jun 17

...ciologia/Soc Musica/Black metal/Analisis texto/Inmortal" inmortal = readLines(TEXTFILE) inmortal = readLines(TEXTFILE) length(inmortal) head(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(...

It is not a tm problem; your doc.corpus is empty

2014 Jun 18

...rtal" inmortal = readLines(TEXTFILE) inmortal = readLines(TEXTFILE) length(inmortal) head(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(...

wordcloud and word table

2014 Jul 25

...uot;) >pathname<-"C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/" >TDM<-function(informes, pathname) { info.dir<-sprintf("%s/%s", pathname, informes) info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8")) info.cor.cl<-tm_map(info.cor, content_transformer(tolower)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") sw<-iconv(enc2utf8(sw), sub = "byte") i...

Help with cleaning a corpus

2011 Apr 18

Hi! I created a corpus and started to clean it with this piece of code: txt <-tm_map(txt,removeWords, stopwords("spanish")) txt <-tm_map(txt,stripWhitespace) txt <-tm_map(txt,tolower) txt <-tm_map(txt,removeNumbers) txt <-tm_map(txt,removePunctuation) But something happened: some of the documents in the corpus became empty, which is a problem when I try to...
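
To illustrate how cleaning can leave documents empty, here is a base-R sketch that works on a plain character vector rather than a tm corpus. The sample texts and the tiny stop word list stop_es are made up for illustration, and the gsub() calls only approximate what the tm transformations do. It also lowercases before removing stop words, so that capitalized stop words are matched.

```r
# Hypothetical sample documents; not from the original post.
docs <- c("El 2011", "la casa verde", "y de la")

# Illustrative stop word list, much shorter than tm's Spanish list.
stop_es <- c("el", "la", "y", "de")

clean <- tolower(docs)                      # lowercase first
clean <- gsub("[[:digit:]]+", "", clean)    # drop numbers
clean <- gsub("[[:punct:]]+", "", clean)    # drop punctuation

# Remove whole-word stop words only ("de" inside "verde" is kept).
pattern <- paste0("\\b(", paste(stop_es, collapse = "|"), ")\\b")
clean <- gsub(pattern, "", clean, perl = TRUE)

# Collapse repeated whitespace and trim the ends.
clean <- trimws(gsub("[[:space:]]+", " ", clean))

# Flag documents with nothing left, then keep the rest.
is_empty <- !nzchar(clean)
kept <- clean[!is_empty]
print(is_empty)
print(kept)
```

Documents made up only of stop words, numbers, or punctuation end up as empty strings, so flagging them before building a term matrix is one way to avoid the problem.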

wordcloud and word table

2014 Jul 28

...omision/PLAN de INSPECCIONES/Informes/" > TDM<-function(informes, pathname) { info.dir<-sprintf("%s/%s", pathname, informes) info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8")) info.cor.cl<-tm_map(info.cor, content_transformer(tolower)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") sw<-iconv(...

count number of stop words in R

2017 Jun 12

Defining data as you mentioned in your response causes the following error: Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character" I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string! On Monday, June 12, 2017 8:36 AM, Patrick...

count number of stop words in R

2017 Jun 12

...54.614.1178 ________________________________ From: Elahe chalabi <chalabi.elahe at yahoo.de> Sent: Monday, June 12, 2017 11:23:42 AM To: Patrick Casimir; Bert Gunter Cc: R-help Mailing List Subject: Re: [R] count number of stop words in R Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, but I don't know how I should count the stop words in my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over ....
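
Counting, as opposed to removing, can be sketched in base R by tokenizing the string and testing each token against a stop word list. This is only one possible approach, not the answer given in the thread. The short stop_words vector below is a made-up stand-in, and only the first two sentences of the quoted string are used.

```r
# Beginning of the string quoted in the post.
str <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."

# Hypothetical, deliberately tiny stop word list for illustration;
# tm::stopwords("english") could be used instead.
stop_words <- c("a", "and", "the", "is", "in", "it", "there's", "that's")

# Lowercase, then extract runs of letters and apostrophes as tokens.
lower <- tolower(str)
tokens <- regmatches(lower, gregexpr("[a-z']+", lower))[[1]]

# Count tokens that appear in the stop word list.
is_stop <- tokens %in% stop_words
n_stop <- sum(is_stop)

# Per-word breakdown of the stop words that were found.
stop_counts <- table(tokens[is_stop])
print(n_stop)
print(stop_counts)
```

With this toy list the count is 4 ("there's", "that's", and "a" twice); a fuller list such as tm's would normally match more tokens.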

It is not a tm problem; your doc.corpus is empty

2014 Jun 18

...readLines(TEXTFILE) length(inmortal) head(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(Snow...

tm package

2010 Feb 16

Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm <- DocumentTermMatrix(reuters21578) that reuters21578.dtm does not i...

Problem with Snowball & RWeka

2011 Mar 24

Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException" It seems to have something to do with either Snowball or RWeka; however, I can't figure out what to do myself. If you could spend 5 minutes of your valuable time to help me or give me a
