search for: vectorsourc

Displaying 20 results from an estimated 22 matches for "vectorsourc".

2015 Apr 10
3
Loop over many data frames
...n-grams, build each TermDocumentMatrix and finally combine everything into a single TDM. But I am stuck at the step of turning each of the files into a corpus with a loop. That is, instead of doing this endlessly: #Required libraries library(tm) corpus_001<-Corpus(VectorSource(qBlog001)) corpus_002<-Corpus(VectorSource(qBlog002)) corpus_003<-Corpus(VectorSource(qBlog003)) ......... corpus_150<-Corpus(VectorSource(qBlog150)) ........ I would like to put together a loop that does the work, for example #list with the names I want for each corpus c_names...
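A loop along the lines the poster asks for might look like this (a sketch; the object names qBlog001 ... qBlog150 are taken from the excerpt, everything else is an assumption):

```r
library(tm)

# Hypothetical: the per-file character vectors are named qBlog001 ... qBlog150
c_names <- sprintf("qBlog%03d", 1:150)

# Build one corpus per object and collect them in a named list,
# instead of creating corpus_001, corpus_002, ... by hand
corpora <- lapply(c_names, function(nm) Corpus(VectorSource(get(nm))))
names(corpora) <- c_names
```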
2015 Apr 10
5
Loop over many data frames
... Apparently I am not applying it correctly, because the object I get does not contain what I want. Let me explain: when running txt <- vector('list', length = length(names)) #names is the vector where I had already stored the list of txt's for(i in seq_along(txt)){ txt[[i]] <- Corpus(VectorSource(names[i])) } I get the object txt: > class(txt) [1] "list" if I extract just the first object from that list: > txt[[1]] <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> if I want to see the content of the first corpus > inspect(txt[[1]]) <<VCorpus...
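One possible reading of the problem (not confirmed in the excerpt): `names[i]` holds a file name, so `VectorSource(names[i])` builds a one-document corpus whose text is the name itself rather than the file's contents. A hedged sketch that reads each file first (the paths are illustrative):

```r
library(tm)

files <- c("qBlog001.txt", "qBlog002.txt")  # hypothetical file paths
txt <- vector("list", length = length(files))
for (i in seq_along(files)) {
  # read the file's lines, then wrap the text (not the name) in a source
  txt[[i]] <- Corpus(VectorSource(readLines(files[i], encoding = "UTF-8")))
}
```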
2015 Apr 12
2
Loop over many data frames
...object I get does not contain what I want. >> Let me explain: when running >> >> txt <- vector('list', length = length(names)) #names is the vector where >> I had already stored the list of txt's >> for(i in seq_along(txt)){ >> txt[[i]] <- Corpus(VectorSource(names[i])) >> } >> >> I get the object txt: >> > class(txt) >> [1] "list" >> >> if I extract just the first object from that list: >> > txt[[1]] >> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> >...
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
...blank.lines.skip=TRUE, fileEncoding='CP1251', encoding='CP1251') (also tried with UTF-8 here on a correspondingly encoded file) I currently only test with a corpus based on the contents of just one variable, and I construct the corpus from a VectorSource. When I run inspect, all seems fine and I can see the text properly, with unicode characters present: data.corpus<-Corpus(VectorSource(data$variable,encoding='UTF-8'), readerControl=list(language='bulgarian')) However, no matter what I do - like which enc...
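A minimal sketch of one way to remove Cyrillic stop words, assuming both the text and the stop-word list are normalized to UTF-8 first (the sample string and the tiny stop-word vector are illustrative, not from the thread):

```r
library(tm)

# Force both text and stop words to UTF-8 so removeWords() matches exactly
texts <- enc2utf8("това е тест и това е пример")
bg_stopwords <- enc2utf8(c("и", "е", "това"))

corp <- Corpus(VectorSource(texts))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, bg_stopwords)
```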
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
...sing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm) myDocument <- c("the rain in Spain", "falls mainly on the plain", "jack and jill ran up the hill", "to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("...
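The excerpt cuts off before the removeWords() call, so the cause is not visible here; one common culprit is matching against the stop-word list before lower-casing. A sketch of the usual ordering (not necessarily the poster's actual fix):

```r
library(tm)

docs <- c("the rain in Spain", "falls mainly on the plain",
          "jack and jill ran up the hill", "to fetch a pail of water")

corp <- Corpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))       # lower-case first
corp <- tm_map(corp, removeWords, stopwords("english"))  # then drop "the", "on", ...
corp <- tm_map(corp, stripWhitespace)
```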
2013 Oct 08
1
how to check the accuracy for maxent?
...g through this example of maxent use: http://cran.r-project.org/web/packages/maxent/maxent.pdf # LOAD LIBRARY library(maxent) # READ THE DATA, PREPARE THE CORPUS, and CREATE THE MATRIX data <- read.csv(system.file("data/NYTimes.csv.gz",package="maxent")) corpus <- Corpus(VectorSource(data$Title[1:150])) matrix <- DocumentTermMatrix(corpus) # TRAIN/PREDICT USING SPARSEM REPRESENTATION sparse <- as.compressed.matrix(matrix) model <- maxent(sparse[1:100,],data$Topic.Code[1:100]) results <- predict(model,sparse[101:150,]) Any idea how I can check the accuracy wrt th...
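One way to check accuracy is to compare the predicted labels with the held-out Topic.Code values. This sketch assumes that predict() returns a matrix with a "labels" column holding the predicted class (an assumption about the maxent package's output, not confirmed by the excerpt):

```r
# Sketch: accuracy on the 50 held-out rows (model, sparse, data as in the excerpt)
results <- predict(model, sparse[101:150, ])
predicted <- results[, "labels"]        # assumed column name
actual <- data$Topic.Code[101:150]
accuracy <- mean(predicted == actual)   # fraction of correct predictions
```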
2012 Jan 27
2
tm package: handling contractions
I tried making a wordcloud of Obama's State of the Union address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.fra...
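removePunctuation() turns "don't" into "dont", which then slips past the stop-word list; a hedged sketch of expanding a few contractions before stripping punctuation (the mapping is illustrative and far from complete):

```r
# Expand common contractions before removePunctuation() runs
fix_contractions <- function(x) {
  x <- gsub("won't", "will not", x, fixed = TRUE)
  x <- gsub("can't", "cannot",   x, fixed = TRUE)
  x <- gsub("n't",   " not",     x, fixed = TRUE)
  x <- gsub("'re",   " are",     x, fixed = TRUE)
  x <- gsub("'s",    "",         x, fixed = TRUE)  # drops possessives and "it's"-style forms
  x
}
sotu <- fix_contractions(tolower(sotu))
```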
2012 Oct 25
2
Text mining
...the textmining library require(tm) require(wordcloud) tw.df=twListToDF(tweets) RemoveAtPeople <- function(x){gsub("@\\w+", "",x)} df<- as.vector(sapply(tw.df$text, RemoveAtPeople)) #The following is cribbed and seems to do what it says on the can tw.corpus = Corpus(VectorSource(df)) tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte")) tw.corpus = tm_map(tw.corpus, tolower) tw.corpus = tm_map(tw.corpus, removePunctuation) tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt"))) tw.c...
2014 Jul 22
2
Help: Error in `colnames<-`(`*tmp*`, value = c(
...", ".txt", pdf2) > d1<-readLines(txt1, encoding="UTF-8") > d1<-iconv(enc2utf8(d1), sub = "byte") > d2<-readLines(txt2, encoding="UTF-8") > d2<-iconv(enc2utf8(d2), sub = "byte") > df<-c(d1,d2) > corpus<-Corpus(VectorSource(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) > d<-tm_map(d, removePunctuation) > sw<-readLines("./StopWords.txt", encoding="UTF-8") > sw<-iconv(enc2utf8(sw), sub="byte") > d<-tm_map(d, remov...
2012 Dec 13
2
Term matrix size and memory. TM package
...library(Snowball) # read the UTF-8 document and convert it to ASCII txt <- readLines("D:/Publico/Documents/texto1.txt",encoding="UTF-8") txt = iconv(txt, to="ASCII//TRANSLIT") # build a corpus corpus <- Corpus(VectorSource(txt)) # convert to lower case corpus <- tm_map(corpus, tolower) # strip whitespace corpus <- tm_map(corpus, stripWhitespace) # remove punctuation corpus <- tm_map(corpus, removePunctuation) # load the stopword file ...
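For the memory question in this thread, one common lever is dropping very sparse terms after building the matrix; a sketch using the corpus built above (the 0.99 threshold is illustrative):

```r
tdm <- TermDocumentMatrix(corpus)
# Keep only terms that appear in at least ~1% of documents
tdm_small <- removeSparseTerms(tdm, sparse = 0.99)
```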
2012 Apr 13
4
Help with stemDocument
Hi, All: I am new to R and the tm package. I'm trying to do stemming using tm_map() and it doesn't seem to work: *I used:* > stemDocument(t_cmts[[100]]) *Where t_cmts is the corpus object, the result is:* bottle loose box abt airpak sections top plastic bottle squashed nearly flush neck previous shipments bottle wrapped securely bubble wrap wno bottle damage packaging poor
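A minimal sketch of stemming through tm_map() rather than calling stemDocument() on a single document (the sample text is illustrative; SnowballC supplies the stemmer):

```r
library(tm)
library(SnowballC)  # Porter stemmer backing stemDocument()

corp <- Corpus(VectorSource(c("bottles were squashed in shipping",
                              "securely wrapped bottles arrived")))
corp <- tm_map(corp, stemDocument)  # e.g. "bottles" -> "bottl"
```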
2017 Jun 12
0
count number of stop words in R
Defining data as you mentioned in your response causes the following error: Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character" I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string! On Monday, June 12, 2017 8:36 AM, Patrick Casimir <patrcasi at nova.edu> wrote: define your string as whatever object you want: data <- "Mhm . Alright . There's um a young boy that...
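Counting stop words does not actually require a corpus; a sketch that tokenizes the string and tallies matches against tm's English stop-word list (the sample string is abbreviated from the thread):

```r
library(tm)

s <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."
tokens <- tolower(unlist(strsplit(s, "\\s+")))
n_stop <- sum(tokens %in% stopwords("english"))  # how many tokens are stop words
```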
2014 Jun 17
2
It is not a tm problem, your doc.corpus is empty
...> TEXTFILE = "/home/rubent/Documentos/Sociologia/Soc Musica/Black metal/Analisis texto/Inmortal"; inmortal = readLines(TEXTFILE); inmortal = readLines(TEXTFILE); length(inmortal); head(inmortal); tail(inmortal); library(tm); vec <- VectorSource(inmortal); corpus <- Corpus(vec); summary(corpus); inspect(corpus[1:7]); corpus <- tm_map(corpus, tolower); corpus <- tm_map(corpus, removePunctuation); corpus <- tm_map(corpus, removeNumbers); corpus <- tm_map(corpus, removeWords, stopwords("english")); inspect(doc.co...
2012 Feb 26
2
tm_map help
...shtags like #asx, #obama work ok. Appreciate any help. Thanks, Sachin library(twitteR) library(tm) library(wordcloud) hashTag<-function (hashTag, minFreq){ tweets<- searchTwitter(hashTag, n=200) df <- do.call("rbind", lapply(tweets, as.data.frame)) myCorpus <- Corpus(VectorSource(df$text)) myCorpus <- tm_map(myCorpus, function(x) iconv(enc2utf8(x), sub = "byte")) myCorpus <- tm_map(myCorpus, tolower) myCorpus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "avail...
2017 Jun 12
3
count number of stop words in R
define your string as whatever object you want: data <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell
2014 Jun 18
2
It is not a tm problem, your doc.corpus is empty
...> TEXTFILE = "/home/rubent/Documentos/Sociologia/Soc Musica/Black metal/Analisis texto/Inmortal"; inmortal = readLines(TEXTFILE); inmortal = readLines(TEXTFILE); length(inmortal); head(inmortal); tail(inmortal); library(tm); vec <- VectorSource(inmortal); corpus <- Corpus(vec); summary(corpus); inspect(corpus[1:7]); corpus <- tm_map(corpus, tolower); corpus <- tm_map(corpus, removePunctuation); corpus <- tm_map(corpus, removeNumbers); corpus <- tm_map(corpus, removeWords, ...
2014 Jun 18
3
It is not a tm problem, your doc.corpus is empty
...a/Soc Musica/Black metal/Analisis texto/Inmortal"; inmortal = readLines(TEXTFILE); inmortal = readLines(TEXTFILE); length(inmortal); head(inmortal); tail(inmortal); library(tm); vec <- VectorSource(inmortal); corpus <- Corpus(vec); summary(corpus); inspect(corpus[1:7]); corpus <- tm_map(corpus, tolower); corpus <- tm_map(corpus, removePunctuation); corpus <- tm_map(corpus, removeNumbers); ...
2014 Jul 29
2
wordcloud and word table [Making progress]
...Informes/2013/2013_21SeguridadCiudadana.txt", encoding="UTF-8") >>> info.05<-iconv(enc2utf8(info.05), sub="byte") >>> info.13<-iconv(enc2utf8(info.13), sub="byte") >>> informes<-c(info.05, info.13) >>> corpus<-Corpus(VectorSource(informes)) >>> inspect(corpus[1:2]) >> <<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>> >> >> [[1]] >> <<PlainTextDocument (metadata: 7)>> >> Derecho a la seguridad ciudadana. Toda persona tiene derecho a la >> protecci...
2017 Jun 12
0
count number of stop words in R
You can use regular expressions. ?regex and/or the stringr package are good places to start. Of course, you have to define "stop words." Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jun 12, 2017 at 5:40