search for: vectorsourc

Displaying 20 results from an estimated 22 matches for "vectorsourc".

2015 Apr 10
3
Loop over many data frames
...n-grams, build each TermDocumentMatrix and finally combine everything into a single TDM. But I am stuck at the step of turning each of the files into a corpus with a loop. That is, instead of doing this endlessly: #Required libraries library(tm) corpus_001<-Corpus(VectorSource(qBlog001)) corpus_002<-Corpus(VectorSource(qBlog002)) corpus_003<-Corpus(VectorSource(qBlog003)) ......... corpus_150<-Corpus(VectorSource(qBlog150)) ........ I would like to put together a loop that does the work, for example #list with the names I want for each corpus c_names...
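A loop along the lines the poster asks for might look like this (a sketch; the object names qBlog001 ... qBlog150 are taken from the excerpt, everything else is an assumption):

```r
library(tm)

# Hypothetical: the per-file character vectors are named qBlog001 ... qBlog150
c_names <- sprintf("qBlog%03d", 1:150)

# Build one corpus per object and collect them in a named list,
# instead of creating corpus_001, corpus_002, ... by hand
corpora <- lapply(c_names, function(nm) Corpus(VectorSource(get(nm))))
names(corpora) <- c_names
```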
2015 Apr 10
5
Loop over many data frames
... Apparently I am not applying it correctly, because the object I get does not contain what I want. Let me explain: when running txt <- vector('list', length = length(names)) #names is the vector where I had already stored the list of txt's for(i in seq_along(txt)){ txt[[i]] <- Corpus(VectorSource(names[i])) } I get the object txt: > class(txt) [1] "list" if I extract just the first object from that list: > txt[[1]] <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> if I want to see the content of the first corpus > inspect(txt[[1]]) <<VCorpus...
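One possible reading of the problem (not confirmed in the excerpt): `names[i]` holds a file name, so `VectorSource(names[i])` builds a one-document corpus whose text is the name itself rather than the file's contents. A hedged sketch that reads each file first (the paths are illustrative):

```r
library(tm)

files <- c("qBlog001.txt", "qBlog002.txt")  # hypothetical file paths
txt <- vector("list", length = length(files))
for (i in seq_along(files)) {
  # read the file's lines, then wrap the text (not the name) in a source
  txt[[i]] <- Corpus(VectorSource(readLines(files[i], encoding = "UTF-8")))
}
```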
2015 Apr 12
2
Loop over many data frames
...object I get does not contain what I want. >> Let me explain: when running >> >> txt <- vector('list', length = length(names)) #names is the vector where >> I had already stored the list of txt's >> for(i in seq_along(txt)){ >> txt[[i]] <- Corpus(VectorSource(names[i])) >> } >> >> I get the object txt: >> > class(txt) >> [1] "list" >> >> if I extract just the first object from that list: >> > txt[[1]] >> <<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>> >...
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
...blank.lines.skip=TRUE, fileEncoding='CP1251', encoding='CP1251') (also tried with UTF-8 here on a correspondingly encoded file) I currently only test with a corpus based on the contents of just one variable, and I construct the corpus from a VectorSource. When I run inspect, all seems fine and I can see the text properly, with unicode characters present: data.corpus<-Corpus(VectorSource(data$variable,encoding='UTF-8'), readerControl=list(language='bulgarian')) However, no matter what I do - like which enc...
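A minimal sketch of one way to remove Cyrillic stop words, assuming both the text and the stop-word list are normalized to UTF-8 first (the sample string and the tiny stop-word vector are illustrative, not from the thread):

```r
library(tm)

# Force both text and stop words to UTF-8 so removeWords() matches exactly
texts <- enc2utf8("това е тест и това е пример")
bg_stopwords <- enc2utf8(c("и", "е", "това"))

corp <- Corpus(VectorSource(texts))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, bg_stopwords)
```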
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
...sing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm) myDocument <- c("the rain in Spain", "falls mainly on the plain", "jack and jill ran up the hill", "to fetch a pail of water") text.corp <- Corpus(VectorSource(myDocument)) ######################### text.corp <- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("...
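The excerpt cuts off before the removeWords() call, so the cause is not visible here; one common culprit is matching against the stop-word list before lower-casing. A sketch of the usual ordering (not necessarily the poster's actual fix):

```r
library(tm)

docs <- c("the rain in Spain", "falls mainly on the plain",
          "jack and jill ran up the hill", "to fetch a pail of water")

corp <- Corpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))       # lower-case first
corp <- tm_map(corp, removeWords, stopwords("english"))  # then drop "the", "on", ...
corp <- tm_map(corp, stripWhitespace)
```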
2013 Oct 08
1
how to check the accuracy for maxent?
...g through this example of maxent use: http://cran.r-project.org/web/packages/maxent/maxent.pdf # LOAD LIBRARY library(maxent) # READ THE DATA, PREPARE THE CORPUS, and CREATE THE MATRIX data <- read.csv(system.file("data/NYTimes.csv.gz",package="maxent")) corpus <- Corpus(VectorSource(data$Title[1:150])) matrix <- DocumentTermMatrix(corpus) # TRAIN/PREDICT USING SPARSEM REPRESENTATION sparse <- as.compressed.matrix(matrix) model <- maxent(sparse[1:100,],data$Topic.Code[1:100]) results <- predict(model,sparse[101:150,]) Any idea how I can check the accuracy wrt th...
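One way to check accuracy is to compare the predicted labels with the held-out Topic.Code values. This sketch assumes that predict() returns a matrix with a "labels" column holding the predicted class (an assumption about the maxent package's output, not confirmed by the excerpt):

```r
# Sketch: accuracy on the 50 held-out rows (model, sparse, data as in the excerpt)
results <- predict(model, sparse[101:150, ])
predicted <- results[, "labels"]        # assumed column name
actual <- data$Topic.Code[101:150]
accuracy <- mean(predicted == actual)   # fraction of correct predictions
```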
2012 Jan 27
2
tm package: handling contractions
I tried making a wordcloud of Obama's State of the Union address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.fra...
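removePunctuation() turns "don't" into "dont", which then slips past the stop-word list; a hedged sketch of expanding a few contractions before stripping punctuation (the mapping is illustrative and far from complete):

```r
# Expand common contractions before removePunctuation() runs
fix_contractions <- function(x) {
  x <- gsub("won't", "will not", x, fixed = TRUE)
  x <- gsub("can't", "cannot",   x, fixed = TRUE)
  x <- gsub("n't",   " not",     x, fixed = TRUE)
  x <- gsub("'re",   " are",     x, fixed = TRUE)
  x <- gsub("'s",    "",         x, fixed = TRUE)  # drops possessives and "it's"-style forms
  x
}
sotu <- fix_contractions(tolower(sotu))
```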
2012 Oct 25
2
Text mining
...the textmining library require(tm) require(wordcloud) tw.df=twListToDF(tweets) RemoveAtPeople <- function(x){gsub("@\\w+", "",x)} df<- as.vector(sapply(tw.df$text, RemoveAtPeople)) #The following is cribbed and seems to do what it says on the can tw.corpus = Corpus(VectorSource(df)) tw.corpus = tm_map(tw.corpus, function(x) iconv(enc2utf8(x), sub = "byte")) tw.corpus = tm_map(tw.corpus, tolower) tw.corpus = tm_map(tw.corpus, removePunctuation) tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt"))) tw.c...
2014 Jul 22
2
Help: Error in `colnames<-`(`*tmp*`, value = c(
...", ".txt", pdf2) > d1<-readLines(txt1, encoding="UTF-8") > d1<-iconv(enc2utf8(d1), sub = "byte") > d2<-readLines(txt2, encoding="UTF-8") > d2<-iconv(enc2utf8(d2), sub = "byte") > df<-c(d1,d2) > corpus<-Corpus(VectorSource(df)) > d<-tm_map(corpus, content_transformer(tolower)) > d<-tm_map(d, stripWhitespace) > d<-tm_map(d, removePunctuation) > sw<-readLines("./StopWords.txt", encoding="UTF-8") > sw<-iconv(enc2utf8(sw), sub="byte") > d<-tm_map(d, remov...
2012 Dec 13
2
Term matrix size and memory. TM package
...library(Snowball) # read the UTF-8 document and convert it to ASCII txt <- readLines("D:/Publico/Documents/texto1.txt",encoding="UTF-8") txt = iconv(txt, to="ASCII//TRANSLIT") # build a corpus corpus <- Corpus(VectorSource(txt)) # convert to lower case corpus <- tm_map(corpus, tolower) # strip whitespace corpus <- tm_map(corpus, stripWhitespace) # remove punctuation corpus <- tm_map(corpus, removePunctuation) # load the stopword file ...
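For the memory question in this thread, one common lever is dropping very sparse terms after building the matrix; a sketch using the corpus built above (the 0.99 threshold is illustrative):

```r
tdm <- TermDocumentMatrix(corpus)
# Keep only terms that appear in at least ~1% of documents
tdm_small <- removeSparseTerms(tdm, sparse = 0.99)
```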
2012 Apr 13
4
Help with stemDocument
Hi, All: I am new to R and the tm package. I'm trying to do stemming using tm_map() and it doesn't seem to work: *I used:* > stemDocument(t_cmts[[100]]) *Where t_cmts is the corpus object, the result is:* bottle loose box abt airpak sections top plastic bottle squashed nearly flush neck previous shipments bottle wrapped securely bubble wrap wno bottle damage packaging poor
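A minimal sketch of stemming through tm_map() rather than calling stemDocument() on a single document (the sample text is illustrative; SnowballC supplies the stemmer):

```r
library(tm)
library(SnowballC)  # Porter stemmer backing stemDocument()

corp <- Corpus(VectorSource(c("bottles were squashed in shipping",
                              "securely wrapped bottles arrived")))
corp <- tm_map(corp, stemDocument)  # e.g. "bottles" -> "bottl"
```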
2017 Jun 12
0
count number of stop words in R
Defining data as you mentioned in your response causes the following error: Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character" I can solve this error by using Corpus(VectorSource(my string)) and then using your command, but I cannot see the number of stop words in my string! On Monday, June 12, 2017 8:36 AM, Patrick Casimir <patrcasi at nova.edu> wrote: define your string as whatever object you want: data <- "Mhm . Alright . There's um a young boy that...
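Counting stop words does not actually require a corpus; a sketch that tokenizes the string and tallies matches against tm's English stop-word list (the sample string is abbreviated from the thread):

```r
library(tm)

s <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."
tokens <- tolower(unlist(strsplit(s, "\\s+")))
n_stop <- sum(tokens %in% stopwords("english"))  # how many tokens are stop words
```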
2014 Jun 17
2
It is not a tm problem, your doc.corpus is empty
...> TEXTFILE = "/home/rubent/Documentos/Sociologia/Soc Musica/Black metal/Analisis texto/Inmortal"; inmortal = readLines(TEXTFILE); inmortal = readLines(TEXTFILE); length(inmortal); head(inmortal); tail(inmortal); library(tm); vec <- VectorSource(inmortal); corpus <- Corpus(vec); summary(corpus); inspect(corpus[1:7]); corpus <- tm_map(corpus, tolower); corpus <- tm_map(corpus, removePunctuation); corpus <- tm_map(corpus, removeNumbers); corpus <- tm_map(corpus, removeWords, stopwords("english")); inspect(doc.co...
2012 Feb 26
2
tm_map help
...shtags like #asx, #obama work ok. Appreciate any help. Thanks, Sachin library(twitteR) library(tm) library(wordcloud) hashTag<-function (hashTag, minFreq){ tweets<- searchTwitter(hashTag, n=200) df <- do.call("rbind", lapply(tweets, as.data.frame)) myCorpus <- Corpus(VectorSource(df$text)) myCorpus <- tm_map(myCorpus, function(x) iconv(enc2utf8(x), sub = "byte")) myCorpus <- tm_map(myCorpus, tolower) myCorpus <- tm_map(myCorpus, removePunctuation) myCorpus <- tm_map(myCorpus, removeNumbers) myStopwords <- c(stopwords('english'), "avail...
2017 Jun 12
3
count number of stop words in R
define your string as whatever object you want: data <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell
2014 Jun 18
2
It is not a tm problem, your doc.corpus is empty
...> TEXTFILE = "/home/rubent/Documentos/Sociologia/Soc Musica/Black metal/Analisis texto/Inmortal"; inmortal = readLines(TEXTFILE); inmortal = readLines(TEXTFILE); length(inmortal); head(inmortal); tail(inmortal); library(tm); vec <- VectorSource(inmortal); corpus <- Corpus(vec); summary(corpus); inspect(corpus[1:7]); corpus <- tm_map(corpus, tolower); corpus <- tm_map(corpus, removePunctuation); corpus <- tm_map(corpus, removeNumbers); corpus <- tm_map(corpus, removeWords, ...
2014 Jun 18
3
It is not a tm problem, your doc.corpus is empty
...a/Soc Musica/Black metal/Analisis texto/Inmortal"; inmortal = readLines(TEXTFILE); inmortal = readLines(TEXTFILE); length(inmortal); head(inmortal); tail(inmortal); library(tm); vec <- VectorSource(inmortal); corpus <- Corpus(vec); summary(corpus); inspect(corpus[1:7]); corpus <- tm_map(corpus, tolower); corpus <- tm_map(corpus, removePunctuation); corpus <- tm_map(corpus, removeNumbers); ...
2014 Jul 29
2
wordcloud and word table [Making progress]
...Informes/2013/2013_21SeguridadCiudadana.txt", encoding="UTF-8") >>> info.05<-iconv(enc2utf8(info.05), sub="byte") >>> info.13<-iconv(enc2utf8(info.13), sub="byte") >>> informes<-c(info.05, info.13) >>> corpus<-Corpus(VectorSource(informes)) >>> inspect(corpus[1:2]) >> <<VCorpus (documents: 2, metadata (corpus/indexed): 0/0)>> >> >> [[1]] >> <<PlainTextDocument (metadata: 7)>> >> Derecho a la seguridad ciudadana. Toda persona tiene derecho a la >> protecci...
2017 Jun 12
0
count number of stop words in R
You can use regular expressions. ?regex and/or the stringr package are good places to start. Of course, you have to define "stop words." Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jun 12, 2017 at 5:40