thr3ads.net - similar to: "findFreqTerms vs minDocFreq in Package 'tm'"

Displaying 20 results from an estimated 100 matches similar to: "findFreqTerms vs minDocFreq in Package 'tm'"

2011 Nov 09

Min Frequency in findFreqTerms

I am using the 'tm' package for text mining. I use the function findFreqTerms to obtain the frequent words based on their frequency in the term-document matrix. The following is the example given in the help page of this function: library("tm") data("crude") tdm <- TermDocumentMatrix(crude) findFreqTerms(tdm, 2, 3) The first three columns of the document term matrix

2006 Oct 03

new to R: don't understand errors

Hello all, I'm brand new to R, and I'm trying to quickly learn the rudiments for a couple of projects here at work. I'm working with the lsa package and trying to generate various semantic spaces. I seem to do well with small collections of clean text files, but now that I am trying to work with larger collections of less-than-perfect files, I'm getting errors

2007 Aug 18

Problem with lsa package (data.frame) on Windows XP

Dear R team, The following piece of code (to use the lsa package) works fine on my mac os x, but when I run the same code on Windows XP, it doesn't work any more. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,

2008 Oct 18

sorting matrix output alphabetically

Hello, I have been using the tm package to create a TermDocMatrix, which I have saved as a matrix so that I can view word frequencies. Below is a section of the code that I have used and an excerpt of the output: What I want to do is view the output alphabetically, so that an alphabetical list is generated rather than the results being sorted by frequency as below. This

2012 Feb 29

TM reader with text

Hello everybody, I am trying to work with TM, but I have a problem with some special characters in French words. I think this is due to the way the PDF was converted to text, but I'm not entirely sure. Here is an example: findFreqTerms(tdm1,30) [33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement" "<U+FB01>nancier"

2010 Mar 31

tm package - remove stopwords failing

Hi, I just noticed, by inspecting the term matrix, that not all stopwords are removed. Does someone know how to fix that? library(tm) data("crude") d<-tm_map(crude, removeWords, stopwords(language='english')) dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2)) inspect( dt) I am using R version 2.10, tm package 0.5-3 cheers Welma

2011 Sep 13

SVD Memory Issue

I am trying to perform Singular Value Decomposition (SVD) on a Term Document Matrix I created using the 'tm' package. Eventually I want to do a Latent Semantic Analysis (LSA). There are 5677 documents with 771 terms (the DTM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am using a 12GB Dual core Machine with Windows XP and don't think I can increase the memory

2011 Sep 01

betareg question - keeping the mean fixed?

Hello, I have a dataset with proportions that vary around a fixed mean, is it possible to use betareg to look at variance in the dispersion parameter while keeping the mean fixed? I am very new to R but have tried the following: svec<-c(qlogis(mean(data1$scaled)),0,0,0) f<-betareg(scaled~-1 | expt_label + grouped_hpi, data=data1, link.phi="log",

2012 Dec 13

Size of the term matrix and memory. TM package

Hello everyone! I have some problems with the size of the term matrix that I obtain. The commands I use are the following: # load libraries library(tm) library(wordcloud) library(Rstem) library(Snowball) # read the UTF-8 document and convert it to ASCII txt <-

2011 Mar 01

Problem on flexmix when trying to apply signature developed in one model to a new sample

Dear R Users, R Core Team, I have a problem when trying to obtain the classification of the tested cases using two variables with the flexmix function. After importing the database and creating a matrix: BM<-cbind(Data$var1,Data$var2) I see that the best model has 2 groups and use: ex2

2009 Nov 01

[LLVMdev] Issue compiling LLVM 2.6 on Windows with MinGW

Hello, I downloaded LLVM 2.6 and was attempting to compile it with TDM-GCC 4.4.1-tdm2-sjlj + cmake 2.6.4 and this happened: =============Console=================== C:\projects\game-editor\LLVM\build-root>mingw32-make [ 2%] Built target LLVMSystem [ 5%] Built target LLVMSupport [ 7%] Built target tblgen [ 7%] Built target intrinsics_gen [ 10%] Built target LLVMCore [ 12%] Built target

2008 Mar 25

Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package

In case someone else runs into this, I found the problem, it was related to having some zero-length text files. Make sure you have valid (non-empty) data files for loading into the document-term matrix. Alex ---------- Forwarded message ---------- From: Alex McKenzie <ahmckenzie@gmail.com> Date: Mar 25, 2008 2:07 AM Subject: Error "... x must be atomic" when using lsa (latent

2010 Mar 18

error while using "tm" package

I have recently started using the "tm" package by Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs) I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) *Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions* This error

2006 Oct 04

FW: new to R: don't understand errors

Hello Jerad, > It was suggested I contact you for possible help with this issue. Well, > as you can see for the emails below, that is what I was told at R-help. > Any insight to my lsa problems (also listed below) would be of great > help. from what I see, the problem probably indeed lies within the textfiles: for performance reasons, it was not possible to include any

2008 Mar 25

Error "... x must be atomic" when using lsa (latent semantic analysis) package

Hello, I'm trying to use the "lsa" (latent semantic analysis) package, and running into a problem that seems to be related to the number of documents being processed. Here's the code I'm running (after loading the lsa and rstem packages), and the error message: > SnippetsPath <- "c:\\OED\\AuditExplain\\" # path where to find text snippets >

2014 Jul 29

wordcloud and word table [Making progress]

Good afternoon, group. Kind regards, Carlos J., and thank you very much for your guidance. Indeed, I had realized that the reason colnames was not being applied was that there were no columns. The thing is that I cannot fully or clearly see at which point in the corpus creation process this can be done. However, following the example of

2009 Nov 12

package "tm" fails to remove "the" with remove stopwords

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)

2011 Feb 11

extracting p-values from the Manova function (car library)

hi, i am not able to extract the p-values from the Manova function in the car library. I need to use this function in a high-throughput setting and somehow need the p-values produced. Any ideas? Best regards Bettina Kulle Andreassen -- Bettina Kulle Andreassen University of Oslo Department of Biostatistics and Institute for Epi-Gen (Faculty Division Ahus) tel: +47 22851193 +47 67963923

2011 Mar 02

*** caught segfault *** when using impute.knn (impute package)

hi, i am getting an error when calling the impute.knn function (see the screenshot below). what is the problem here and how can it be solved? screenshot: ################## *** caught segfault *** address 0x513c7b84, cause 'memory not mapped' Traceback: 1: .Fortran("knnimp", x, ximp = x, p, n, imiss = imiss, irmiss, as.integer(k), double(p), double(n), integer(p),

2005 Mar 03

calculating of linkage-disequilibrium measures?

Hi, is it possible to calculate the LD measures D, D', r and perhaps corresponding p-values with R IF THE PHASE IS KNOWN? The genetics package provides the LD function only for ambiguous phase. Thank you very much Bettina Kulle
