thr3ads.net - search: "termdocumentmatrix"

Displaying 20 results from an estimated 35 matches for "termdocumentmatrix".

topicmodels error

2010 Oct 11

I am trying to fit an LDA model to a TermDocumentMatrix with the topicmodels package, but R says: > Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED, : > x is of class "TermDocumentMatrix" "simple_triplet_matrix" > class(TDM) > [1] "TermDocumentMatrix" "simple_triplet_matrix" I try t...
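A minimal sketch of the usual fix, assuming current tm and topicmodels installs: LDA() expects documents in rows (a DocumentTermMatrix), and tm's t() method turns a TermDocumentMatrix into one. The crude sample corpus, k = 2, the seed and the reduced iter value are stand-ins for the poster's own TDM, k and SEED.

```r
library(tm)
library(topicmodels)

data("crude")                      # sample corpus shipped with tm (stand-in data)
TDM <- TermDocumentMatrix(crude)   # terms in rows, documents in columns

# LDA() expects documents in rows; t() on a TermDocumentMatrix
# returns a DocumentTermMatrix holding the same counts
DTM <- t(TDM)
print(class(DTM))

# k, seed and iter are illustrative values, not recommendations
fit <- LDA(DTM, k = 2, method = "Gibbs",
           control = list(seed = 2010, iter = 200))
print(terms(fit, 5))
```

Building the matrix with DocumentTermMatrix(corpus) in the first place avoids the transpose altogether.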

wordcloud and word table

2014 Jul 25

...info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8") sw<-iconv(enc2utf8(sw), sub = "byte") info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish")) info.tdm<-TermDocumentMatrix(info.cor.cl) result<-list(name = informes, tdm= info.tdm) } >tdm<-lapply(informes, TDM, path = pathname) Result: > tdm [[1]] [[1]]$name [1] "2013" [[1]]$tdm <<TermDocumentMatrix (terms: 1540, documents: 1)>> Non-/sparse entries: 1540/0 Sparsity : 0...

wordcloud and word table

2014 Jul 28

...nte, I followed the wordclouds example and, as before, I managed to build the word cloud, but only for each of the texts separately. > I have the two "clean corpora", one for each of the reports I am considering: years 2005 and 2013. > > tdm05<-TermDocumentMatrix(cor.05.cl) > tdm13<-TermDocumentMatrix(cor.13.cl) > m05<-as.matrix(tdm05) > m13<-as.matrix(tdm13) > v05 <- sort(rowSums(m05),decreasing=TRUE) > v13 <- sort(rowSums(m13),decreasing=TRUE) > df05<-data.frame(word = names(v05), freq=v05)...
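One way to get both reports into a single term-by-year matrix without building two dense matrices, sketched under assumptions: the two halves of tm's crude corpus stand in for cor.05.cl and cor.13.cl, and the column names y2005/y2013 are illustrative. slam::row_sums() (slam is installed as a tm dependency) sums the counts directly on the sparse form.

```r
library(tm)

data("crude")
# Hypothetical stand-ins for the two cleaned corpora (2005 and 2013 reports)
cor.05.cl <- crude[1:10]
cor.13.cl <- crude[11:20]

# Total frequency of each term in a corpus, computed on the sparse matrix
term_freq <- function(corp) slam::row_sums(TermDocumentMatrix(corp))
v05 <- term_freq(cor.05.cl)
v13 <- term_freq(cor.13.cl)

# Align both vectors on the union of terms; a term missing from one year gets 0
all_terms <- union(names(v05), names(v13))
m <- cbind(y2005 = unname(v05[all_terms]), y2013 = unname(v13[all_terms]))
m[is.na(m)] <- 0
rownames(m) <- all_terms

print(head(m[order(-rowSums(m)), ], 10))
```

The result has terms in rows and one column per report, which is the layout comparison.cloud() from the wordcloud package takes.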

Loop over many data frames

2015 Apr 12

Jorge, dear R-help contributors: I was trying to use a script for one of the steps in my analysis, which is turning each of the corpora in my workspace into a TermDocumentMatrix object. I have a vector called bNames that lists all the corpora I want to convert to TDMs, and I wrote the following commands: tdm.n1 <- vector('list', length = length(bNames)) for(i in seq_along(tdm.n1)){ tdm.n1.[[i]] <- TermDocumentMatrix(bNames[[i]],control=list(tokenize=nGram1Tok)) }...
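In the loop above, bNames holds the names of the corpora, so TermDocumentMatrix() receives a character string rather than a corpus, and tdm.n1.[[i]] (note the extra dot) refers to an object tdm.n1. that does not exist. A minimal loop-free sketch, assuming each entry of bNames names a corpus in the workspace; the two toy corpora are hypothetical, and the poster's nGram1Tok tokenizer would be passed as control = list(tokenize = nGram1Tok).

```r
library(tm)

# Hypothetical corpora standing in for the ones named in bNames
corp_a <- VCorpus(VectorSource(c("text mining in r", "small text files")))
corp_b <- VCorpus(VectorSource(c("another small corpus", "more text here")))
bNames <- c("corp_a", "corp_b")

# mget() looks up the objects whose names are in bNames;
# lapply() builds one term-document matrix per corpus.
# (A custom tokenizer could be added via control = list(tokenize = ...).)
tdm.n1 <- lapply(mget(bNames), TermDocumentMatrix)

# tm's c() method merges several term-document matrices into one
tdm_all <- do.call(c, unname(tdm.n1))
print(tdm_all)
```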

wordcloud and word table [Making progress]

2014 Jul 29

...mes) corp = Corpus(ds) corp = tm_map(corp,removePunctuation) corp = tm_map(corp,content_transformer(tolower)) corp = tm_map(corp,removeNumbers) corp = tm_map(corp, stripWhitespace) corp = tm_map(corp, removeWords, sw) corp = tm_map(corp, removeWords, stopwords("spanish")) term.matrix<- TermDocumentMatrix(corp) term.matrix<- as.matrix(term.matrix) colnames(term.matrix) <- c("Año2005","Año2013") png(file="Org2005vs2013.png",height=600,width=1200) par(mfrow=c(1,2)) comparison.cloud(term.matrix,max.words=300,random.order=FALSE,colors=c("#1F497D","#C050...

Cannot allocate a vector of size...

2020 Feb 10

...., 7 Feb 2020 17:26, <miriam.alzate en unavarra.es> wrote: >> Good afternoon, >> I am doing a content analysis with the tm package. When I run this code: >> tdm<-TermDocumentMatrix(corpus,control=list(weighting =weightTf)) >> tdm.reviews.m<-as.matrix(tdm) >> The first line runs fine, but the second one gives me this error: >> Error: cannot allocate vector of size 14.0 Gb >> ...

Cannot allocate a vector of size...

2020 Feb 07

Good afternoon, I am doing a content analysis with the tm package. When I run this code: tdm<-TermDocumentMatrix(corpus,control=list(weighting =weightTf)) tdm.reviews.m<-as.matrix(tdm) The first line runs fine, but the second one gives me this error: Error: cannot allocate vector of size 14.0 Gb How can I fix it? I am using the 64-bit version of R. Regards, Miriam

SVD Memory Issue

2011 Sep 13

...terms (the DTM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am using a 12 GB dual-core machine with Windows XP and don't think I can increase the memory any further. Are there any other memory-efficient methods to find the SVD? The term-document matrix is obtained using: tdm2 <- TermDocumentMatrix(tr1,control=list(weighting=weightTf,minWordLength=3)) str(tdm2) List of 6 $ i : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ... $ j : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ... $ v : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ... $ nrow : int 771 $ ncol : int 5677 $ dimnam...
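A full svd() needs the dense matrix plus dense factors; when only the leading singular vectors are needed (as in LSA), a truncated SVD on a sparse matrix is far cheaper. A sketch assuming the irlba and Matrix packages are available (neither is mentioned in the thread); crude stands in for tr1, and nv = 5 is an arbitrary illustrative rank.

```r
library(tm)
library(Matrix)
library(irlba)

data("crude")  # stand-in for the poster's corpus tr1
tdm2 <- TermDocumentMatrix(crude, control = list(weighting = weightTf))

# Copy tm's triplet representation (i, j, v) into a Matrix sparse matrix,
# so no dense terms x documents matrix is ever allocated
A <- sparseMatrix(i = tdm2$i, j = tdm2$j, x = tdm2$v,
                  dims = c(tdm2$nrow, tdm2$ncol),
                  dimnames = dimnames(tdm2))

# Truncated SVD: compute only the 5 largest singular values and vectors
s <- irlba(A, nv = 5)
print(s$d)
```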

Cannot allocate a vector of size...

2020 Feb 10

...when you work with a matrix that is mostly 0s, you can represent it as a sparse matrix, which takes up much less space because you don't store every value, only the non-zero ones and their positions. > > You are building the sparse matrix with this: > tdm<-TermDocumentMatrix(corpus,control=list(weighting =weightTf)) > > You can see the documentation here: > <https://www.rdocumentation.org/packages/tm/versions/0.7-7/topics/TermDocumentMatrix> > > By doing this, you convert the sparse matrix into a regular matrix and load all the 0s into memory, which a...
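The advice above, sketched in code: keep the matrix sparse for whatever is actually needed (here, total term frequencies via slam::row_sums(); slam is installed with tm), and if a dense matrix is truly required, shrink it first with removeSparseTerms(). The crude corpus stands in for the poster's data, and sparse = 0.95 is an illustrative threshold.

```r
library(tm)

data("crude")  # stand-in for the poster's corpus
tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTf))

# Term frequencies computed on the sparse triplet form; no dense copy is made
freq <- sort(slam::row_sums(tdm), decreasing = TRUE)
print(head(freq, 10))

# If a dense matrix is unavoidable, first drop terms that are absent from
# more than 95% of documents, then convert the much smaller matrix
tdm_small <- removeSparseTerms(tdm, sparse = 0.95)
m <- as.matrix(tdm_small)
print(dim(m))
```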

filtering out unwanted words in a Term Document Matrix

2011 May 11

...iolent","litigation", "prisoner", "corporate", "lockout", "disposition", "discharge", "reason")) I get the following error: "no applicable method for 'tm_intersect' applied to an object of class "c('TermDocumentMatrix', 'simple_triplet_matrix')" " What am I doing wrong? I'd greatly appreciate any ideas or thoughts on this!!!! Thank you!! Thomas Heiman, PhD Info Systems Eng, Sr The MITRE Corporation | Center for Enterprise Modernization Office: 703-983-2951 | theiman@mitre.org ...
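tm_intersect() works on text documents, not on term-document matrices, which is what the error is saying. Two ways to keep only a given word list, sketched with tm's crude corpus and a hypothetical keep vector: subset the rows of an existing matrix by term, or pass the list as a dictionary when building the matrix.

```r
library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude)
keep <- c("oil", "prices", "opec")  # hypothetical word list

# Option 1: keep only the rows of an existing matrix whose term is in the list
tdm_sub <- tdm[Terms(tdm) %in% keep, ]

# Option 2: restrict the vocabulary while building the matrix
tdm_dict <- TermDocumentMatrix(crude, control = list(dictionary = keep))

inspect(tdm_sub)
```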

findFreqTerms vs minDocFreq in Package 'tm'

2011 Sep 12

...with both. I have given the results from both commands below: findFreqTerms identifies 3140 words that appear more than 5 times, but minDocFreq identifies only 659 terms. Can someone please explain the reason for the difference, or whether I have misunderstood their definitions? >tdm1 <- TermDocumentMatrix(tr1,control=list(weighting=weightBin)) > freq_terms <- findFreqTerms(tdm1, lowfreq =5, highfreq = Inf) > str(freq_terms) chr [1:3140] "abc" "abil" "abl" "abnorm" "abort" "absenc" ... > tdm2 <- TermDocumentMatrix(tr1,cont...
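Part of the confusion may be what findFreqTerms() counts: it filters on each term's row sum, so on a matrix built with weightBin (as tdm1 is) that sum is the number of documents containing the term, not its number of occurrences, and lowfreq is inclusive. The sketch below, with tm's crude corpus as stand-in data, computes both quantities explicitly; it does not try to reproduce minDocFreq, a control option from older tm releases.

```r
library(tm)

data("crude")
tdm_tf  <- TermDocumentMatrix(crude)   # raw counts (weightTf)
tdm_bin <- weightBin(tdm_tf)           # 1 if the term occurs in the document

total_freq <- slam::row_sums(tdm_tf)   # occurrences across the corpus
doc_freq   <- slam::row_sums(tdm_bin)  # number of documents containing the term

# findFreqTerms() filters on these row sums (lowfreq is inclusive)
by_count <- findFreqTerms(tdm_tf, lowfreq = 5)
by_docs  <- findFreqTerms(tdm_bin, lowfreq = 5)
cat(length(by_count), "terms occur >= 5 times;",
    length(by_docs), "terms occur in >= 5 documents\n")
```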

Loop over many data frames

2015 Apr 10

...due to the resources available to me, I had to split the project's input text files into many small files. After turning each of these files into a separate corpus, I can clean each corpus, look for n-grams, build each termDocumentMatrix and finally combine everything into a single TDM. But I am stuck on the step of turning each of the files into a corpus with a loop. That is, instead of doing this over and over: # Required libraries library(tm)...

Cannot allocate a vector of size...

2020 Feb 07

...Do you have any alternatives for splitting the matrix? On Fri, 7 Feb 2020 17:26, <miriam.alzate en unavarra.es> wrote: > Good afternoon, > I am doing a content analysis with the tm package. When I run this code: > tdm<-TermDocumentMatrix(corpus,control=list(weighting =weightTf)) > tdm.reviews.m<-as.matrix(tdm) > The first line runs fine, but the second one gives me this error: > Error: cannot allocate vector of size 14.0 Gb > How can I fix it? I am using...

Problems with tm

2014 Nov 22

Dear colleagues, I have a problem with the tm library, or with Windows 8.1, or with something beyond my control. A while ago, with Windows 7 and an older version of R, I ran this code: library(tm) data("crude") crude <- tm_map(crude, tolower) tdm<-TermDocumentMatrix(crude) and it created tdm without problems. Now when I run it I get the following error: Error: inherits(doc, "TextDocument") is not TRUE But if I remove the line crude <- tm_map(crude, tolower), it creates tdm without problems. What is going on? Many thanks, Juan -- Juan Ant...
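In tm 0.6 and later (as far as I know, this is the change behind the error), tm_map() expects each transformation to return a TextDocument; a base function such as tolower returns a bare character vector, so it has to be wrapped in content_transformer(). A minimal sketch with the same crude corpus:

```r
library(tm)

data("crude")
# Wrap the base R function so tm_map() gets documents back, not character vectors
crude <- tm_map(crude, content_transformer(tolower))
tdm <- TermDocumentMatrix(crude)
print(tdm)
```

Transformations that tm itself provides (removePunctuation, removeNumbers, stripWhitespace, ...) already return documents and need no wrapper.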

Loop over many data frames

2015 Apr 10

...text mining project, and because of the resources available to me I had to split the project's input text files into many small files. After turning each of these files into a separate corpus, I can clean each corpus, look for n-grams, build each termDocumentMatrix and finally combine everything into a single TDM. But I am stuck on the step of turning each of the files into a corpus with a loop. That is, instead of doing this over and over: # Required libraries library(tm) corpus_001<-Corpus(VectorSource(qBlog001)) corpus_002<-Corpus(V...
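A minimal sketch of the loop the poster describes, assuming the text vectors are named qBlog001, qBlog002, ... in the workspace (the two short vectors below are hypothetical stand-ins). mget() collects the objects by name and lapply() turns each one into a corpus, so the corpora live in one named list instead of in separate corpus_001, corpus_002, ... variables.

```r
library(tm)

# Hypothetical stand-ins for the poster's qBlog001, qBlog002, ... vectors
qBlog001 <- c("first small input file", "with two lines of text")
qBlog002 <- c("second small input file", "more text here")

# Names of all the objects to convert, e.g. "qBlog001", "qBlog002"
blog_names <- ls(pattern = "^qBlog[0-9]+$")

# One corpus per input vector, kept together in a named list
corpora <- lapply(mget(blog_names), function(txt) Corpus(VectorSource(txt)))

# Cleaning can then be applied to every corpus in the same way
corpora <- lapply(corpora, tm_map, removePunctuation)
print(names(corpora))
```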

Library (tm) Error: could not find function "TermDocMatrix".

2010 Apr 23

Hi list, I have the following code and error. I have tried other code and I get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl =
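The error means the installed tm has no TermDocMatrix(); current releases provide TermDocumentMatrix() (and DocumentTermMatrix() for the transposed form). A sketch of the same example against a recent tm, where VCorpus() and the binary DirSource mode follow tm's own Reuters example; this is a hedged reconstruction, since the poster's full code is cut off.

```r
library(tm)

reut21578 <- system.file("texts", "crude", package = "tm")
r <- VCorpus(DirSource(reut21578, mode = "binary"),
             readerControl = list(reader = readReut21578XMLasPlain))

# TermDocumentMatrix() is the constructor's name in current tm
tdm <- TermDocumentMatrix(r)
print(tdm)
```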

DocumentTermMatrix error

2011 May 21

..."Heading", "local") <- c("test") meta(tekst[[1]]) >Available meta data pairs are: Author : DateTimeStamp: 2011-05-21 11:25:21 Description : Heading : test ID : test.txt Language : en Origin : test <- TermDocumentMatrix(tekst) > Error in tolower(txt) : > invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs' Attached is a small sample (test.txt) that I worked on. Any help would be appreciated, m
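The utf8towcs error usually means the text contains bytes that are not valid UTF-8, so tolower() inside TermDocumentMatrix() cannot decode it. A sketch assuming the file is in Windows-1250 (only a guess for Slovenian text; the real encoding of test.txt is unknown): re-encode to UTF-8 before building the matrix. When reading from disk, the same iconv() call can be applied to the lines returned by readLines().

```r
library(tm)

# Hypothetical line with one Windows-1250 byte (0xE8, "č"); not valid UTF-8
raw_txt <- "PROD Z LAHKO GNETNO MELJNO GLINO \xe8"
print(validUTF8(raw_txt))  # FALSE: a lone 0xE8 byte is not valid UTF-8

# Assumed source encoding: CP1250; change "from" to the file's real encoding
fixed_txt <- iconv(raw_txt, from = "CP1250", to = "UTF-8")

tekst <- VCorpus(VectorSource(fixed_txt))
test <- TermDocumentMatrix(tekst)
print(Terms(test))
```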

TM reader with text

2012 Feb 29

... "<U+FB01>nanciers" "<U+FB01>xe" Some French words are not read correctly by tm with the readPlain reader. I tried reader = readPDF, but it doesn't work, so I had to convert the PDF text to plain text. Some words are not recognized, so when I use TermDocumentMatrix a word like "inflation" disappears. It's a big problem for me. I have spent a lot of time on this problem; any ideas? Thanks for your time. Best regards, Mickaël
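<U+FB01> is the Unicode "fi" ligature that PDF text extraction often leaves behind; its sibling U+FB02 ("fl") would plausibly explain why "inflation" cannot be found, since the extracted token would be "in" + ligature + "ation". A sketch that maps both ligatures back to plain letters before building the matrix; the sample sentences are hypothetical, and the helper fix_ligatures is not part of tm.

```r
library(tm)

# Hypothetical extracted text containing the "fi" (U+FB01) and "fl" (U+FB02) ligatures
txt <- c("les \ufb01nanciers", "un taux \ufb01xe", "une forte in\ufb02ation")

# Hypothetical helper: replace each ligature with its plain two-letter equivalent
fix_ligatures <- function(x) {
  x <- gsub("\ufb01", "fi", x, fixed = TRUE)
  gsub("\ufb02", "fl", x, fixed = TRUE)
}

corp <- VCorpus(VectorSource(txt))
corp <- tm_map(corp, content_transformer(fix_ligatures))
tdm <- TermDocumentMatrix(corp)
print(Terms(tdm))
```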

tm package: handling contractions

2012 Jan 27

...ata/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp, function(x)removeWords(x,stopwords())) tdm <- TermDocumentMatrix(corp) m <- as.matrix(tdm) v <- sort(rowSums(m),decreasing=TRUE) d <- data.frame(word = names(v),freq=v) wordcloud(d$word,d$freq) I ended up with a large number of contractions that were split at the "?" character, e.g., "don?t" --> "don'" e.g., ...
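The "?" in "don?t" is most likely a typographic apostrophe (U+2019) that did not survive reading or display. A sketch assuming a recent tm in which removePunctuation() accepts preserve_intra_word_contractions: normalise curly apostrophes to the ASCII one first, then remove punctuation while keeping apostrophes inside words. The sample sentence is hypothetical.

```r
library(tm)

# Hypothetical text using the typographic apostrophe U+2019
txt <- "We don\u2019t stop, and they can\u2019t wait."

corp <- VCorpus(VectorSource(txt))
# Map curly apostrophes to the ASCII one so contractions are spelled consistently
corp <- tm_map(corp, content_transformer(function(x) gsub("\u2019", "'", x, fixed = TRUE)))
corp <- tm_map(corp, content_transformer(tolower))
# Drop punctuation but keep apostrophes between letters (don't, can't)
corp <- tm_map(corp, removePunctuation, preserve_intra_word_contractions = TRUE)

tdm <- TermDocumentMatrix(corp)
print(Terms(tdm))
```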

Text mining

2012 Oct 25

...rt"))) tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords) tw.corpus = tm_map(tw.corpus, stripWhitespace) sw <- readLines("stopwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") tw.corpus = tm_map(tw.corpus, removeWords, sw) doc.m = TermDocumentMatrix(tw.corpus, control = list(minWordLength = 2)) dm = as.matrix(doc.m) # calculate the frequency of words v = sort(rowSums(dm), decreasing=TRUE) d = data.frame(word=names(v), freq=v) #Generate the wordcloud pal2 <- brewer.pal(8,"Dark2") wc=wordcloud(d$word, d$freq, min.freq=min.fre...
