Displaying 20 results from an estimated 100 matches similar to: "findFreqTerms vs minDocFreq in Package 'tm'"
2011 Nov 09
Min Frequency in findFreqTerms
I am using the 'tm' package for text mining. I use the function findFreqTerms to
obtain the frequent words based on their frequency in the term-document matrix.
The following is the example given in the help page of this function:
tdm <- TermDocumentMatrix(crude)
findFreqTerms(tdm, 2, 3)
The first three columns of the document term matrix
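The subject line contrasts findFreqTerms with minDocFreq. Below is a minimal base-R sketch of the difference. It assumes that findFreqTerms keeps terms whose total count, summed over all documents, lies between lowfreq and highfreq. It also assumes that minDocFreq keeps terms appearing in at least that many documents; the rowSums(m > 0) call in an error quoted further down this page points to a document count. The toy matrix and its terms are hypothetical, and this is not tm's own code.

```r
# Hypothetical toy term-document matrix: rows are terms, columns are documents
m <- matrix(c(3, 0, 0,
              1, 1, 0,
              1, 1, 2,
              0, 2, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("oil", "price", "market", "opec"),
                            c("d1", "d2", "d3")))

# Total number of occurrences of each term across all documents
total_freq <- rowSums(m)
# Number of documents in which each term occurs at least once
doc_freq <- rowSums(m > 0)

# Total-frequency filter in [2, 3], analogous to findFreqTerms(tdm, 2, 3)
freq_terms <- names(total_freq)[total_freq >= 2 & total_freq <= 3]

# Document-frequency filter, the kind of cut minDocFreq = 2 describes
doc_terms <- names(doc_freq)[doc_freq >= 2]

print(freq_terms)
print(doc_terms)
```

The two filters disagree: "oil" occurs three times but only in one document, so it passes the total-frequency filter and fails the document-frequency one.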
2006 Oct 03
new to R: don't understand errors
Hello all,
I'm brand new to R, and I'm trying to quickly learn the
rudiments for a couple of projects here at work. I'm working with the
lsa package and trying to generate various semantic spaces. I seem to do
well with small collections of clean text files, but now that I am
trying to work with larger collections of less-than-perfect files,
I'm getting errors
2007 Aug 18
Problem with lsa package (data.frame) on Windows XP
Dear R team,
The following piece of code (to use the lsa package) works fine on my
Mac OS X, but when I run the same code on Windows XP, it doesn't work
anymore.
### code:
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\",
stemming=TRUE, language="spanish",
minWordLength=2, minDocFreq=1,
2008 Oct 18
sorting matrix output alphabetically
I have been using the TM package to create a TermDocMatrix, which I
have saved as a matrix so that I can view word frequencies. Below is
a section of the code that I have used and an excerpt of the output:
What I wanted to be able to do is view the output alphabetically:
rather than having the results sorted by frequency as below, I would like
an alphabetical list to be generated. This
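A minimal sketch of the alphabetical ordering being asked for. It assumes the term-document matrix has already been converted to an ordinary matrix (for example with as.matrix) and that the terms are its row names; the counts below are made up for illustration.

```r
# Hypothetical term counts standing in for a converted term-document matrix
freq <- matrix(c(5, 2, 9, 1), ncol = 1,
               dimnames = list(c("price", "barrel", "oil", "crude"), "count"))

# Sort rows alphabetically by term; drop = FALSE keeps a one-column matrix
by_term <- freq[order(rownames(freq)), , drop = FALSE]

# For comparison: sort rows by descending count
by_count <- freq[order(freq[, "count"], decreasing = TRUE), , drop = FALSE]

print(by_term)
print(by_count)
```

Using drop = FALSE matters here: without it, subsetting a one-column matrix returns a plain vector.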
2012 Feb 29
TM reader with text
Hello everybody,
I am working (or at least trying to work) with TM, but I have a problem with some special words in
French. I think this is due to the way the PDF is transformed to text, but I'm
not entirely sure.
Here is an example:
[33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement"
2010 Mar 31
tm package: stopword removal failing
I just noticed, by inspecting the term matrix, that not all stopwords are
removed. Does anyone know how to fix that?
d<-tm_map(crude, removeWords, stopwords(language='english'))
dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2))
inspect( dt)
I am using R version 2.10, tm package 0.5-3
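One common reason stopwords survive, assumed here rather than confirmed from the truncated post, is letter case. Stopword lists are usually lowercase and word matching is case-sensitive, so "The" is left in place while "the" is removed. The hypothetical helper remove_words() below only imitates word-boundary removal in base R to show the effect; it is not tm's implementation. In tm the usual remedy is to lowercase the corpus before calling removeWords.

```r
# Hypothetical helper imitating word-boundary stopword removal (not tm's code)
remove_words <- function(text, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  stripped <- gsub(pattern, "", text, perl = TRUE)
  # Collapse the runs of spaces left where words were removed
  gsub("\\s+", " ", trimws(stripped))
}

stops <- c("the", "of", "and")   # small lowercase stopword list (assumed)
txt <- "The price of oil and the market"

# Case-sensitive matching: the capitalised "The" is not removed
as_is <- remove_words(txt, stops)
# Lowercasing first lets every "the" match
lowered <- remove_words(tolower(txt), stops)

print(as_is)
print(lowered)
```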
2011 Sep 13
SVD Memory Issue
I am trying to perform Singular Value Decomposition (SVD) on a Term Document
Matrix I created using the 'tm' package. Eventually I want to do a Latent
Semantic Analysis (LSA).
There are 5677 documents with 771 terms (the DTM is 771 x 5677). When I try
to do the SVD, it runs out of memory. I am using a 12 GB dual-core machine
with Windows XP, and I don't think I can increase the memory
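For scale, here is a back-of-the-envelope estimate of the dense matrix alone. It assumes the matrix is stored as an ordinary double-precision matrix (8 bytes per cell). It ignores the extra workspace svd() needs and any copies made while converting from tm's sparse representation.

```r
# Dimensions quoted in the post
n_terms <- 771
n_docs  <- 5677

# Dense double-precision storage: 8 bytes per cell
bytes <- n_terms * n_docs * 8
mib   <- bytes / 2^20

cat(sprintf("%.0f bytes (about %.1f MiB)\n", bytes, mib))
```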
2011 Sep 01
betareg question - keeping the mean fixed?
I have a dataset with proportions that vary around a fixed mean. Is it
possible to use betareg to look at variance in the dispersion parameter
while keeping the mean fixed?
I am very new to R but have tried the following:
f<-betareg(scaled~-1 | expt_label + grouped_hpi, data=data1, link.phi="log",
2012 Dec 13
Term matrix size and memory. TM package
Hi everyone!
I have some problems with the size of the term matrix I get. The commands I use are the following:
# load libraries
# read the UTF-8 document and convert it to ASCII
txt <-
2011 Mar 01
Problem with flexmix when trying to apply a signature developed in one model to a new sample
R Users, R Core Team,
I have a problem when trying to obtain the
classification of the tested cases using two variables with the flexmix function.
After importing the database and creating a matrix,
I see that the best model has 2 groups and
2009 Nov 01
[LLVMdev] Issue compiling LLVM 2.6 on Windows with MinGW
I downloaded LLVM 2.6 and was attempting to compile it with TDM-GCC
4.4.1-tdm2-sjlj + cmake 2.6.4 and this happened:
[ 2%] Built target LLVMSystem
[ 5%] Built target LLVMSupport
[ 7%] Built target tblgen
[ 7%] Built target intrinsics_gen
[ 10%] Built target LLVMCore
[ 12%] Built target
2008 Mar 25
Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package
In case someone else runs into this: I found the problem. It was related to
having some zero-length text files. Make sure you have valid (non-empty)
data files for loading into the document-term matrix.
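A minimal base-R sketch for finding such zero-length files before building the matrix. It assumes all the text files sit directly in one folder. find_empty_files() is a hypothetical helper, and demo_dir is a throwaway example folder created only for the demonstration.

```r
# Hypothetical helper: return the paths of zero-byte regular files in a folder
find_empty_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  info <- file.info(files)
  # Skip subdirectories and anything whose size could not be read
  files[!info$isdir & !is.na(info$size) & info$size == 0]
}

# Throwaway demo folder with one non-empty and one empty file
demo_dir <- file.path(tempdir(), "lsa_demo")
invisible(dir.create(demo_dir, showWarnings = FALSE))
writeLines("some text", file.path(demo_dir, "a.txt"))
invisible(file.create(file.path(demo_dir, "empty.txt")))

empty <- find_empty_files(demo_dir)
print(basename(empty))
```

Any files it reports can be moved out of the folder (or filled) before the folder is handed to textmatrix().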
---------- Forwarded message ----------
From: Alex McKenzie <ahmckenzie@gmail.com>
Date: Mar 25, 2008 2:07 AM
Subject: Error "... x must be atomic" when using lsa (latent
2010 Mar 18
Error while using the "tm" package
I have recently started using the "tm" package by Feinerer, K. Hornik, and D.
While trying to create a term-document matrix from a corpus (approximately 440
I get the following error:
tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf,
minDocFreq=2, minWordLength=3))
Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions
This error
2006 Oct 04
FW: new to R: don't understand errors
Hello Jerad,
> It was suggested I contact you for possible help with this issue. Well,
> as you can see from the emails below, that is what I was told at R-help.
> Any insight to my lsa problems (also listed below) would be of great
> help.
From what I see, the problem probably does lie within the
text files: for performance reasons, it was not possible to
include any
2008 Mar 25
Error "... x must be atomic" when using lsa (latent semantic analysis) package
I'm trying to use the "lsa" (latent semantic analysis) package, and running
into a problem that seems to be related to the number of documents being
processed. Here's the code I'm running (after loading the lsa and rstem
packages), and the error message:
> SnippetsPath <- "c:\\OED\\AuditExplain\\" # path where to find text
2014 Jul 29
wordcloud and word table [Making progress]
Good afternoon, group. Best regards, Carlos J., and many thanks for
your guidance. Indeed, I had realized that the reason
colnames was not being applied was that there were no columns. The
problem is that I cannot fully or clearly see at which point
in the corpus-creation process this can be done.
However, following the example of
2009 Nov 12
package "tm" fails to remove "the" when removing stopwords
I am using code that previously worked to remove stopwords using package
"tm". Even manually adding "the" to the list does not work to remove "the".
This package has undergone extensive redevelopment with changes to the
function syntax, so perhaps I am just missing something.
Please see my simple example, output, and sessionInfo() below.
2011 Feb 11
extracting p-values from the Manova function (car library)
I am not able to extract the p-values from the
Manova function in the car library. I need
to use this function in a high-throughput setting
and somehow need the p-values produced.
Any ideas?
Best regards
Bettina Kulle Andreassen
University of Oslo
Department of Biostatistics
Institute for Epi-Gen (Faculty Division Ahus)
+47 22851193
+47 67963923
2011 Mar 02
*** caught segfault *** when using impute.knn (impute package)
I am getting an error when calling the impute.knn
function (see the screenshot below).
What is the problem here, and how can it be solved?
*** caught segfault ***
address 0x513c7b84, cause 'memory not mapped'
1: .Fortran("knnimp", x, ximp = x, p, n, imiss = imiss, irmiss,
as.integer(k), double(p), double(n), integer(p),
2005 Mar 03
Calculating linkage-disequilibrium measures?
Hi,
is it possible to calculate the LD measures D, D', r and
perhaps corresponding p-values with R IF THE
The genetics package provides the LD function
only for ambiguous phase.
Thank you very much
Bettina Kulle