search for: tm_map

Displaying 20 results from an estimated 31 matches for "tm_map".

2012 Feb 26
2
tm_map help
Hi all, I am trying to do some text mining with twitter and I am getting the error: Error in structure(names(sapply(possibleCompletions, "[", 1)), names = x) : 'names' attribute [1] must be the same length as the vector [0] when I use tm_map. Has anyone had/seen this error before? The code I have is shown below, and this error only occurs with #qantas; hashtags like #asx, #obama work ok. Appreciate any help. Thanks, Sachin library(twitteR) library(tm) library(wordcloud) hashTag<-function (hashTag, minFreq){ tweets<- s...
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
...fo() below. Thanks! Mark require(tm) myDocument <- c("the rain in Spain", "falls mainly on the plain", "jack and jill ran up the hill", "to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentT...
2012 Oct 25
2
Text mining
...e(tm) require(wordcloud) tw.df=twListToDF(tweets) RemoveAtPeople <- function(x){gsub("@\\w+", "",x)} df<- as.vector(sapply(tw.df$text, RemoveAtPeople)) #The following is cribbed and seems to do what it says on the can tw.corpus = Corpus(VectorSource(df)) tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte")) tw.corpus = tm_map(tw.corpus, tolower) tw.corpus = tm_map(tw.corpus, removePunctuation) tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt"))) tw.corpus = tm_map(tw.corpus,...
2013 Sep 26
0
R hangs at NGramTokenizer
...terEvalQ(cl, library(RTextTools)))> myCorpus <-Corpus(DirSource("/home/neeph/Test/DMOZ_Business"), encoding="UTF-8", readerControl=list(reader=readPlain))> removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)> myCorpus <- tm_map(myCorpus, removeURL)> removeAmp <- function(x) gsub("&amp;", "", x)> myCorpus <- tm_map(myCorpus, removeAmp)> removeWWW <- function(x) gsub("www[[:alnum:]]*", "", x)> myCorpus <- tm_map(myCorpus, removeWWW)> myCorpus <- tm_ma...
2014 Jul 29
2
wordcloud and word table [Making progress]
...n R: 3.1.1 require(tm) require(wordcloud) require(Rcpp) tmpinformes<-data.frame(c("todo el informe 2005", "todo el informe 2013"), row.names=c("2005", "2013")) ds<- DataframeSource(tmpText) ds<- DataframeSource(tmpinformes) corp = Corpus(ds) corp = tm_map(corp,removePunctuation) corp = tm_map(corp,content_transformer(tolower)) corp = tm_map(corp,removeNumbers) corp = tm_map(corp, stripWhitespace) corp = tm_map(corp, removeWords, sw) corp = tm_map(corp, removeWords, stopwords("spanish")) term.matrix<- TermDocumentMatrix(corp) term.matrix...
2014 Jul 22
2
Help: Error in `colnames<-`(`*tmp*`, value = c(
...2) > d1<-readLines(txt1, encoding="UTF-8") > d1<-iconv(enc2utf8(d1), sub = "byte") > d2<-readLines(txt2, encoding="UTF-8") > d2<-iconv(enc2utf8(d2), sub = "byte") > df<-c(d1,d2) > corpus<-Corpus(VectorSource(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) > d<-tm_map(d, removePunctuation) > sw<-readLines("./StopWords.txt", encoding="UTF-8") > sw<-iconv(enc2utf8(sw), sub="byte") > d<-tm_map(d, removeWords, sw) > d<...
2014 Nov 22
2
Problems with tm
Dear colleagues, I have a problem with the tm library, or with Windows 8.1, or with something I cannot pin down. Some time ago, with Windows 7 and an earlier version of R, I ran this code: library(tm) data("crude") crude <- tm_map(crude, tolower) tdm<-TermDocumentMatrix(crude) and it created tdm without problems. Now if I run it I get the following error: Error: inherits(doc, "TextDocument") is not TRUE But if I remove the line of code crude <- tm_map(crude, tolower) it creates tdm without any problem. What is happ...
2014 Jun 17
2
It is not a tm problem; your doc.corpus is empty
...Musica/Black metal/Analisis texto/Inmortal" inmortal = readLines(TEXTFILE) inmortal = readLines(TEXTFILE) length(inmortal) head(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(corpus, stemDocument) corpus <- tm_map(...
2012 Dec 13
2
Term matrix size and memory. TM package
...<- readLines("D:/Publico/Documents/texto1.txt",encoding="UTF-8") txt = iconv(txt, to="ASCII//TRANSLIT") # build a corpus corpus <- Corpus(VectorSource(txt)) # convert to lowercase corpus <- tm_map(corpus, tolower) # strip whitespace corpus <- tm_map(corpus, stripWhitespace) # remove punctuation corpus <- tm_map(corpus, removePunctuation) # load the custom Spanish stop word file and convert it to ASCII sw &...
2014 Jun 18
2
It is not a tm problem; your doc.corpus is empty
...rtal" inmortal = readLines(TEXTFILE) inmortal = readLines(TEXTFILE) length(inmortal) head(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(SnowballC) corpus <- tm_map(...
2014 Jul 25
3
wordcloud and word table
.../Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/" TDM<-function(informes, pathname) { info.dir<-sprintf("%s/%s", pathname, informes) info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8")) info.cor.cl<-tm_map(info.cor, content_transformer(tolower)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") sw<-iconv(enc2utf8(sw), sub = "...
2014 Jul 28
2
wordcloud and word table
...de INSPECCIONES/Informes/" TDM<-function(informes, pathname) { info.dir<-sprintf("%s/%s", pathname, informes) info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8")) info.cor.cl<-tm_map(info.cor, content_transformer(tolower)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") ...
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
...(I have tried several versions) I have installed all the needed packages (tm, rJava, rWeka, Snowball) + dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with: Sys.setenv(NOAWT=TRUE) The command tm_map(reuters, stemDocument) gives the following errors: - First time: Error in .jnew(name) : java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line...
2017 Jun 12
0
count number of stop words in R
Defining data as you mentioned in your response causes the following error: Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character" I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string! On Monday, June 12, 2017 8:36 AM, Patrick...
2011 Apr 18
0
Help with cleaning a corpus
Hi! I created a corpus and started to clean it with this piece of code: txt <-tm_map(txt,removeWords, stopwords("spanish")) txt <-tm_map(txt,stripWhitespace) txt <-tm_map(txt,tolower) txt <-tm_map(txt,removeNumbers) txt <-tm_map(txt,removePunctuation) But something happened: some of the documents in the corpus became empty, which is a problem when I try to...
2017 Jun 12
3
count number of stop words in R
...54.614.1178 ________________________________ From: Elahe chalabi <chalabi.elahe at yahoo.de> Sent: Monday, June 12, 2017 11:23:42 AM To: Patrick Casimir; Bert Gunter Cc: R-help Mailing List Subject: Re: [R] count number of stop words in R Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, but I don't know how I should count the stop words in my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over ....
2014 Jun 18
3
It is not a tm problem; your doc.corpus is empty
...readLines(TEXTFILE) length(inmortal) head(inmortal) tail(inmortal) library(tm) vec <- VectorSource(inmortal) corpus <- Corpus(vec) summary(corpus) inspect(corpus[1:7]) corpus <- tm_map(corpus, tolower) corpus <- tm_map(corpus, removePunctuation) corpus <- tm_map(corpus, removeNumbers) corpus <- tm_map(corpus, removeWords, stopwords("english")) inspect(doc.corpus[1:2]) library(Snow...
2010 Feb 16
0
tm package
Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm <- DocumentTermMatrix(reuters21578) that reuters21578.dtm does not i...
2017 Jun 12
3
count number of stop words in R
You can define stop words as below. data <- tm_map(data, removeWords, stopwords("english")) Patrick Casimir, PhD Health Analytics, Data Science, Big Data Expert & Independent Consultant C: 954.614.1178 ________________________________ From: R-help <r-help-bounces at r-project.org> on behalf of Bert Gunter <bgunter.4567 at...
2017 Jun 12
0
count number of stop words in R
Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, but I don't know how I should count the stop words in my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over ....