similar to: findFreqTerms vs minDocFreq in Package 'tm'

Displaying 20 results from an estimated 100 matches similar to: "findFreqTerms vs minDocFreq in Package 'tm'"

2011 Nov 09
Min Frequency in findFreqTerms
I am using the 'tm' package for text mining. I use the function findFreqTerms to obtain the frequent words based on their frequency in the term-document matrix. The following is the example given in the help page of this function: library("tm"); data("crude"); tdm <- TermDocumentMatrix(crude); findFreqTerms(tdm, 2, 3) The first three columns of the document term matrix
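The search topic turns on two different thresholds. In tm, findFreqTerms(x, lowfreq, highfreq) keeps terms whose total count across the whole matrix lies between the bounds, whereas the older minDocFreq control option (and lsa's textmatrix() argument of the same name) discarded terms that occur in fewer than that many documents. The base-R sketch below uses a hypothetical toy matrix as a stand-in for a real TermDocumentMatrix, to show how the two filters can disagree.

```r
# Hypothetical toy term-document matrix (rows = terms, columns = documents),
# standing in for a tm TermDocumentMatrix.
tdm <- matrix(c(5, 0, 0,    # "oil":   5 occurrences, all in d1
                1, 1, 0,    # "price": 2 occurrences spread over d1 and d2
                1, 1, 1),   # "crude": 3 occurrences, one in each document
              nrow = 3, byrow = TRUE,
              dimnames = list(c("oil", "price", "crude"), c("d1", "d2", "d3")))

# Total frequency per term: the quantity findFreqTerms() compares with lowfreq/highfreq.
total_freq <- rowSums(tdm)

# Document frequency per term: the number of documents containing it at least once,
# which is what a minDocFreq-style filter compares against.
doc_freq <- rowSums(tdm > 0)

frequent_terms <- names(total_freq[total_freq >= 3])  # "oil" "crude"
common_terms   <- names(doc_freq[doc_freq >= 2])      # "price" "crude"
print(frequent_terms)
print(common_terms)
```

"oil" passes the total-frequency filter but not the document-frequency one, and "price" the reverse. In more recent tm releases the document-frequency filter is expressed through the bounds control option, e.g. TermDocumentMatrix(crude, control = list(bounds = list(global = c(2, Inf)))).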
2006 Oct 03
new to R: don't understand errors
Hello all, I'm brand new to R, and I'm trying to quickly learn the rudiments for a couple of projects here at work. I'm working with the lsa package and trying to generate various semantic spaces. I seem to do well with small collections of clean text files, but now that I am trying to work with larger collections of less-than-perfect files, I'm getting errors
2007 Aug 18
Problem with lsa package (data.frame) on Windows XP
Dear R team, The following piece of code (which uses the lsa package) works fine on my Mac OS X machine, but when I run the same code on Windows XP, it doesn't work anymore. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
2008 Oct 18
sorting matrix output alphabetically
Hello, I have been using the TM package to create a TermDocMatrix, which I have saved as a matrix so that I can view word frequencies. Below is a section of the code that I have used and an excerpt of the output: What I want is to view the output alphabetically: rather than having the results sorted by frequency, as below, I would like an alphabetical list to be generated. This
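Once the term-document matrix has been converted to an ordinary matrix, sorting is plain base R. The sketch below uses a small hypothetical count matrix in place of the poster's data and shows both an alphabetical list of frequencies and an alphabetically reordered matrix.

```r
# Hypothetical term-by-document count matrix, as obtained from as.matrix() on a tm matrix.
m <- matrix(c(3, 1,
              7, 2,
              1, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "barrel", "crude"), c("doc1", "doc2")))

# Word frequencies sorted by count (descending), the usual presentation.
freq <- sort(rowSums(m), decreasing = TRUE)

# The same frequencies listed alphabetically by term instead.
freq_alpha <- freq[order(names(freq))]

# Or reorder the rows of the matrix itself; drop = FALSE keeps it a matrix.
m_alpha <- m[order(rownames(m)), , drop = FALSE]

print(freq_alpha)
print(m_alpha)
```

order() on character vectors follows the current locale's collation, so mixed-case or accented terms may sort differently across systems.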
2012 Feb 29
TM reader with text
Hello everybody, I am working (or at least trying to work) with TM, but I have a problem with some special words in French. I think this is due to the way the PDFs were converted to text, but I'm not completely sure. Let's look at the example: findFreqTerms(tdm1,30) [33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement" "<U+FB01>nancier"
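In the output above, <U+FB01> is R's way of printing the Unicode "fi" ligature, so "<U+FB01>nancement" is "financement" with the two letters fused into one character. A minimal base-R sketch, assuming the ligatures are the only problem, replaces them with plain letters; the helper name fix_ligatures() is hypothetical. Code points such as U+F0A3 lie in the Unicode Private Use Area, so their intended glyph depends on the PDF's font and cannot be recovered generically.

```r
# Hypothetical helper: replace the Unicode "fi" (U+FB01) and "fl" (U+FB02)
# ligatures, which some PDF-to-text converters emit, with plain letters.
fix_ligatures <- function(x) {
  x <- gsub("\uFB01", "fi", x, fixed = TRUE)
  x <- gsub("\uFB02", "fl", x, fixed = TRUE)
  x
}

words <- c("\uFB01nancement", "\uFB01nancier")
fixed_words <- fix_ligatures(words)
print(fixed_words)
```

For a tm corpus (tm 0.6 or later), the same function can be applied before building the matrix with tm_map(corpus, content_transformer(fix_ligatures)).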
2010 Mar 31
tm package - removing stopwords failing
Hi, I just noticed, by inspecting the term matrix, that not all stopwords are removed. Does someone know how to fix that? library(tm); data("crude"); d <- tm_map(crude, removeWords, stopwords(language='english')); dt <- DocumentTermMatrix(d, control=list(minWordLength=3, minDocFreq=2)); inspect(dt) I am using R version 2.10, tm package 0.5-3. Cheers, Welma
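One frequent cause of stopwords surviving (not confirmed as the resolution of this particular thread) is letter case: tm's stopwords() lists are lower-case and removeWords() matches case-sensitively, so a sentence-initial "The" is left in place. The base-R sketch below imitates whole-word removal with a hypothetical helper, remove_words(), which is not tm's implementation, to show the effect of lower-casing first.

```r
# Hypothetical helper mimicking whole-word removal: deletes each listed word
# wherever it appears between word boundaries. Matching is case-sensitive.
remove_words <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, "", x, perl = TRUE)
}

stop_list <- c("the", "of", "and")   # small stand-in for stopwords("english")
text <- "The price of oil and the dollar"

kept    <- remove_words(text, stop_list)           # "The" survives: case mismatch
cleaned <- remove_words(tolower(text), stop_list)  # lower-case first, then remove

# Collapse the leftover runs of spaces for display.
print(gsub("\\s+", " ", trimws(kept)))     # "The price oil dollar"
print(gsub("\\s+", " ", trimws(cleaned)))  # "price oil dollar"
```

In tm 0.6 and later the lower-casing step is usually written tm_map(corpus, content_transformer(tolower)), applied before tm_map(corpus, removeWords, stopwords("english")).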
2011 Sep 13
SVD Memory Issue
I am trying to perform Singular Value Decomposition (SVD) on a Term Document Matrix I created using the 'tm' package. Eventually I want to do a Latent Semantic Analysis (LSA). There are 5677 documents with 771 terms (the DTM is 771 x 5677). When I try to do the SVD, it runs out of memory. I am using a 12 GB dual-core machine with Windows XP and don't think I can increase the memory
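As a rough check on scale (an illustration only, not a diagnosis of this thread), a dense 771 x 5677 matrix of doubles occupies 771 x 5677 x 8 bytes. The sketch computes that figure and compares it with what base R's object.size() reports for a matrix of the same shape.

```r
# Size of a dense 771 x 5677 matrix of doubles (8 bytes each).
n_terms <- 771
n_docs  <- 5677
bytes <- n_terms * n_docs * 8
mib <- bytes / 2^20
cat(sprintf("%.0f bytes (about %.1f MiB)\n", bytes, mib))

# object.size() reports the actual footprint, including a small header.
m <- matrix(0, nrow = n_terms, ncol = n_docs)
print(object.size(m), units = "Mb")
```

The matrix itself comes to roughly 35 MB, so any memory pressure would have to come from other copies or intermediate objects rather than from the input alone.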
2011 Sep 01
betareg question - keeping the mean fixed?
Hello, I have a dataset with proportions that vary around a fixed mean, is it possible to use betareg to look at variance in the dispersion parameter while keeping the mean fixed? I am very new to R but have tried the following: svec<-c(qlogis(mean(data1$scaled)),0,0,0) f<-betareg(scaled~-1 | expt_label + grouped_hpi, data=data1, link.phi="log",
2012 Dec 13
Term matrix size and memory. TM package
Hello everyone! I'm having some problems with the size of the term matrix I get. The commands I'm using are the following: # load libraries library(tm) library(wordcloud) library(Rstem) library(Snowball) # read the UTF-8 document and convert it to ASCII txt <-
2011 Mar 01
Problem with flexmix when trying to apply a signature developed in one model to a new sample
Dear R Users, R Core Team, I have a problem when trying to determine the classification of the tested cases using two variables with the flexmix function: After importing the database and creating a matrix: BM<-cbind(Data$var1,Data$var2) I see that the best model has 2 groups and use: ex2
2009 Nov 01
[LLVMdev] Issue compiling LLVM 2.6 on Windows with MinGW
Hello, I downloaded LLVM 2.6 and was attempting to compile it with TDM-GCC 4.4.1-tdm2-sjlj + cmake 2.6.4 and this happened: =============Console=================== C:\projects\game-editor\LLVM\build-root>mingw32-make [ 2%] Built target LLVMSystem [ 5%] Built target LLVMSupport [ 7%] Built target tblgen [ 7%] Built target intrinsics_gen [ 10%] Built target LLVMCore [ 12%] Built target
2008 Mar 25
Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package
In case someone else runs into this: I found the problem. It was caused by some zero-length text files. Make sure you have valid (non-empty) data files for loading into the document-term matrix. Alex ---------- Forwarded message ---------- From: Alex McKenzie <ahmckenzie@gmail.com> Date: Mar 25, 2008 2:07 AM Subject: Error "... x must be atomic" when using lsa (latent
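Following the advice above to check for empty input files, a short base-R sketch can list the zero-byte files in a corpus directory before it is handed to textmatrix() or a similar loader. The helper name find_empty_files() and the demonstration directory are hypothetical.

```r
# Hypothetical check: list the files in a corpus directory that are empty
# (zero bytes) so they can be fixed or removed before building the matrix.
find_empty_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!file.info(files)$isdir]   # ignore subdirectories
  files[file.info(files)$size == 0]
}

# Demonstration on a temporary directory with one empty and one non-empty file.
demo_dir <- file.path(tempdir(), "corpus_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("some text", file.path(demo_dir, "doc1.txt"))
invisible(file.create(file.path(demo_dir, "doc2.txt")))   # zero-length file

empty <- find_empty_files(demo_dir)
print(basename(empty))
```

A file containing only whitespace is not zero-length and would pass this check, so a stricter test may be needed depending on the loader.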
2010 Mar 18
error while using "tm" package
I have recently started using the "tm" package by I. Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs), I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) *Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions* This error
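The quoted message is base R's rowSums() error for input that has fewer than two dimensions. Whether that is what happened inside tm here is not established by the excerpt; the sketch below, on a hypothetical 2 x 2 matrix, only shows one generic way a matrix silently becomes a vector and how drop = FALSE prevents it.

```r
# rowSums() requires a matrix (or higher-dimensional array); a plain vector
# triggers exactly the error quoted above.
m <- matrix(c(0, 2, 1, 0), nrow = 2,
            dimnames = list(c("term1", "term2"), c("doc1", "doc2")))

v <- m[, 1]                   # single-column subset silently drops to a vector
msg <- tryCatch(rowSums(v > 0), error = function(e) conditionMessage(e))
print(msg)

kept <- m[, 1, drop = FALSE]  # drop = FALSE keeps the matrix shape
print(rowSums(kept > 0))      # term1 0, term2 1
```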
2006 Oct 04
FW: new to R: don't understand errors
Hello Jerad, you wrote: "It was suggested I contact you for possible help with this issue. Well, as you can see from the emails below, that is what I was told at R-help. Any insight into my lsa problems (also listed below) would be of great help." From what I see, the problem probably does lie within the text files: for performance reasons, it was not possible to include any
2008 Mar 25
Error "... x must be atomic" when using lsa (latent semantic analysis) package
Hello, I'm trying to use the "lsa" (latent semantic analysis) package, and running into a problem that seems to be related to the number of documents being processed. Here's the code I'm running (after loading the lsa and rstem packages), and the error message: > SnippetsPath <- "c:\\OED\\AuditExplain\\" # path where to find text snippets >
2014 Jul 29
wordcloud and word table [Making progress]
Good afternoon, group. Warm regards, Carlos J., and many thanks for your guidance. Indeed, I had realized that the reason colnames was not being applied was that it had no columns. The problem is that I cannot fully or clearly see at which point in the corpus-creation process this can be done. However, following the example of
2009 Nov 12
package "tm" fails to remove "the" with remove stopwords
I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)
2011 Feb 11
extracting p-values from the Manova function (car library)
Hi, I am not able to extract the p-values from the Manova function in the car library. I need to use this function in a high-throughput setting and therefore need the p-values it produces. Any ideas? Best regards, Bettina Kulle Andreassen -- University of Oslo, Department of Biostatistics and Institute for Epi-Gen (Faculty Division Ahus)
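The question concerns car's Manova(). As a swapped-in illustration using base R's stats::manova() rather than car's interface, the p-values from a MANOVA fit sit in the "Pr(>F)" column of the stats matrix returned by summary(), which makes them easy to collect in a loop over many models.

```r
# Base R MANOVA on the built-in iris data (illustration; not car::Manova).
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
s <- summary(fit)   # Pillai's trace by default

# The "Pr(>F)" column of s$stats holds the p-values; the Residuals row is NA.
p_values <- s$stats[, "Pr(>F)"]
names(p_values) <- trimws(names(p_values))   # guard against padded row names
p_species <- p_values[["Species"]]
print(p_species)
```

Other test statistics can be requested with summary(fit, test = "Wilks"), and the same column is then read off.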
2011 Mar 02
*** caught segfault *** when using impute.knn (impute package)
Hi, I am getting an error when calling the impute.knn function (see the screenshot below). What is the problem here, and how can it be solved? screenshot: ################## *** caught segfault *** address 0x513c7b84, cause 'memory not mapped' Traceback: 1: .Fortran("knnimp", x, ximp = x, p, n, imiss = imiss, irmiss, as.integer(k), double(p), double(n), integer(p),
2005 Mar 03
calculating linkage-disequilibrium measures?
Hi, is it possible to calculate the LD measures D, D', r, and perhaps the corresponding p-values with R IF THE PHASE IS KNOWN? The genetics package provides the LD function only for ambiguous phase. Thank you very much, Bettina Kulle
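With known phase, haplotypes can be counted directly and the standard pairwise definitions apply: D = p_AB - p_A p_B, Lewontin's D' = D / D_max, and r = D / sqrt(p_A (1 - p_A) p_B (1 - p_B)). The sketch below is a plain base-R function taking the four haplotype counts; its name, ld_known_phase(), is hypothetical and it is not part of the genetics package. The p-value uses the usual approximation that n r^2 follows a chi-squared distribution with 1 degree of freedom under no association.

```r
# Pairwise LD measures for two biallelic loci when phase is known, i.e. when
# the haplotype counts (AB, Ab, aB, ab) can be observed directly.
ld_known_phase <- function(n_AB, n_Ab, n_aB, n_ab) {
  n <- n_AB + n_Ab + n_aB + n_ab
  p_AB <- n_AB / n
  p_A <- (n_AB + n_Ab) / n   # frequency of allele A at locus 1
  p_B <- (n_AB + n_aB) / n   # frequency of allele B at locus 2
  D <- p_AB - p_A * p_B
  # Lewontin's normalisation: the largest |D| the allele frequencies allow.
  D_max <- if (D >= 0) {
    min(p_A * (1 - p_B), (1 - p_A) * p_B)
  } else {
    min(p_A * p_B, (1 - p_A) * (1 - p_B))
  }
  D_prime <- D / D_max
  r <- D / sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
  # n * r^2 is approximately chi-squared with 1 df under no association.
  p_value <- pchisq(n * r^2, df = 1, lower.tail = FALSE)
  c(D = D, D_prime = D_prime, r = r, p_value = p_value)
}

res <- ld_known_phase(40, 10, 10, 40)
print(res)
```

For these counts p_A = p_B = 0.5 and p_AB = 0.4, so D = 0.15, D_max = 0.25, and both D' and r equal 0.6.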