thr3ads.net - similar to: "Min Frequency in findFreqTerms"

Displaying 20 results from an estimated 5000 matches similar to: "Min Frequency in findFreqTerms"

findFreqTerms vs minDocFreq in Package 'tm'

2011 Sep 12

I am using the 'tm' package for text mining and am having trouble finding the frequently occurring terms. From the definition it appears that findFreqTerms and minDocFreq are equivalent commands and both try to identify the documents with terms appearing more than a specified threshold. However, I am getting drastically different results with both. I have given the results from both the

Problems with tm

2014 Nov 22

Dear colleagues, I have a problem with the tm library, or with Windows 8.1, or with something I am not in control of. Some time ago, with Windows 7 and an earlier version of R, I ran this code: library(tm) data("crude") crude <- tm_map(crude, tolower) tdm<-TermDocumentMatrix(crude) and it created tdm without problems. Now, if I run it, it gives me the following error: Error: inherits(doc,

TM reader with text

2012 Feb 29

Hello everybody, I am trying to work with TM, but I have a problem with some special words in French. I think this is due to the way the PDF was converted to text, but I'm not entirely sure. Here is an example: findFreqTerms(tdm1,30) [33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement" "<U+FB01>nancier"

wordcloud and word table

2014 Jul 25

Good evening, group. Kind regards. I have kept looking for a way to compare two documents from the years 2005 and 2013, and to represent the result with wordcloud and with a table in which the columns are the years of each report, "2005" and "2013", and the rows are the words with the frequency of each of them

Size of the term matrix and memory. TM package

2012 Dec 13

Hello everyone! I have some problems with the size of the term matrix I obtain. The commands I use are the following: # load libraries library(tm) library(wordcloud) library(Rstem) library(Snowball) # read the UTF-8 document and convert it to ASCII txt <-

wordcloud and word table [Making progress]

2014 Jul 29

Good afternoon, group. Kind regards. Carlos J., thank you very much for your guidance. Indeed, I had realized that the reason colnames was not being applied was that there were no columns. The thing is that I cannot fully or clearly see at which point in the corpus creation process this can be done. However, following the example of

topicmodels error

2010 Oct 11

I am trying to fit an LDA model to a TermDocumentMatrix with the topicmodels package, but R says: > Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED, : > x is of class “TermDocumentMatrix”“simple_triplet_matrix” > class(TDM) > [1] "TermDocumentMatrix" "simple_triplet_matrix" I tried to use a matrix, but that doesn't work: > MAT

wordcloud and word table

2014 Jul 28

Hello, The reference you included (thanks for providing it) is quite clear and easy to follow. Have you been able to apply the same logic to your two speeches? To clear up doubts, a good start would be to attach the code you are using, to see whether there is any obvious error. The proper way for us to help you, though, is with a reproducible example: code + data.

SVD Memory Issue

2011 Sep 13

I am trying to perform Singular Value Decomposition (SVD) on a Term Document Matrix I created using the 'tm' package. Eventually I want to do a Latent Semantic Analysis (LSA). There are 5677 documents with 771 terms (the TDM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am using a 12GB Dual core Machine with Windows XP and don't think I can increase the memory

Loop over many data frames

2015 Apr 12

Jorge, dear R-help contributors, I have been trying to use a script for one of the steps in my analysis, which is to transform each of the corpora in my workspace into a TermDocumentMatrix object. I have a vector called bNames that lists all the corpora I want to convert to TDMs, and I built the following commands: tdm.n1 <- vector('list', length = length(bNames))

Cannot allocate a vector of size...

2020 Feb 10

Thank you very much, Xabier. I have tried to work with the sparse matrix, but when converting tdm to a matrix it also tells me "cannot allocate a vector of size 12 gb". I did tdm<-as.matrix(tdm). Is that the right way to work with the sparse matrix? Thanks! On Mon, 10 February 2020, 16:15, Xavier-Andoni Tibau Alberdi wrote: > I think Carlos's answer is much more

Cannot allocate a vector of size...

2020 Feb 07

Good afternoon, I am doing a content analysis with the tm package. When I run this code: tdm<-TermDocumentMatrix(corpus,control=list(weighting =weightTf)) tdm.reviews.m<-as.matrix(tdm) the first line runs fine, but on the second I get this error: Error: cannot allocate vector of size 14.0 Gb How can I fix it? I am using the 64-bit version of

filtering out unwanted words in a Term Document Matrix

2011 May 11

Hi Y'all, I am using the text mining package (tm). I am trying to filter out all of the words in a Term Document Matrix that are not in a list of words that I am interested in. I am using the following code: z<-tm_intersect(txt.dtm, c("communications", "safety", "climate", "blood", "surface", "cleanliness",

Cannot allocate a vector of size...

2020 Feb 07

This is the first time I have worked with this kind of data... I don't know whether that matrix can be split. How could I do it? Thank you very much! On Fri, 7 February 2020, 17:55, Xavier-Andoni Tibau Alberdi wrote: > It means that your data are very large and cannot be stored in RAM. > Do you have alternatives for splitting the matrix? > > On Fri, 7 Feb 2020 17:26, <miriam.alzate at

Cannot allocate a vector of size...

2020 Feb 10

Hello, The R file takes up 33 MB. The matrix I want to build takes up 14 GB. On the local disk (C) I have 400 GB available out of 670. I am not very experienced in working with this kind of data. What difference does working with a data.frame make? Thanks! On Fri, 7 February 2020, 18:07, Xavier-Andoni Tibau Alberdi wrote: > It depends on the operation you want to perform on the matrix. If you remove rows

Library (tm) Error: could not find function "TermDocMatrix".

2010 Apr 23

Hi List, I have the following code and error. I have tried other code and get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = > readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl =

tm package: handling contractions

2012 Jan 27

I tried making a wordcloud of Obama's State of the Union address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp,

Problems with rJava and tm packages

2009 Oct 15

I am looking to do some text analysis using R and have run into some issues with some of the packages. I'm not sure if it's my goofy Vista OS or what, but using R 2.8.1 is relatively successful at loading the text, while the rJava package was messed up somehow: library(tm) > library(rJava) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the

error while using "tm" package

2010 Mar 18

I have recently started using the "tm" package by Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs) I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) *Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions* This error

Help using "tm" text mining package - preprocessing

2011 Feb 10

Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader =readPlain)) I can pre-process each individual document using tolower(textd[[1]]) however, when I try to run tmTolower() I get a no such command
