thr3ads.net - search: "mindocfreq"

Displaying 10 results from an estimated 10 matches for "mindocfreq".

findFreqTerms vs minDocFreq in Package 'tm'

2011 Sep 12

I am using the 'tm' package for text mining and am having trouble finding the frequently occurring terms. From the definition it appears that findFreqTerms and minDocFreq are equivalent commands and both try to identify the documents with terms appearing more than a specified threshold. However, I am getting drastically different results from the two. I have given the results from both commands below: findFreqTerms identifies 3140 words that appear more than 5 t...

new to R: don't understand errors

2006 Oct 03

...is some sort of limit on the number of files. Even when I only use the same number as previous working collections, I still get the errors. So I am wondering if it might be something in the files themselves... At any rate I routinely get these two errors. The first is generated when I include a minDocFreq=x, and it looks a little like this when I run it: > data(stopwords_en) > CCauto = textmatrix( "CultureMineTXT" , minWordLength=3, minDocFreq=50, stopwords=stopwords_en) > Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, : >...

Problem with lsa package (data.frame) on Windows XP

2007 Aug 18

...ne on my mac os x, but when I run the same code on Windows XP, it doesn't work any more. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL) print(matrix1,bag_lines = 3, bag_cols = 3) matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) space = lsa(matrix1, dims = dimcalc_share()) as.textmatrix(space) ### the following line fails on windows XP matrix2 = textmatrix("C:\\Documents and Settings\\tine stal...

tm package- remove stowords failling

2010 Mar 31

Hi, I just noticed by inspecting the term matrix that not all stopwords are removed. Does someone know how to fix that? library(tm) data("crude") d<-tm_map(crude, removeWords, stopwords(language='english')) dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2)) inspect( dt) I am using R version 2.10, tm package 0.5-3 cheers Welma

Error "... x must be atomic" when using lsa (latent semantic analysis) package

2008 Mar 25

...") 9: sort(unique.default(x), na.last = TRUE) 8: factor(a, exclude = exclude) 7: table(txt) 6: inherits(x, "factor") 5: is.factor(x) 4: sort(table(txt), decreasing = TRUE) 3: FUN(X[[238]], ...) 2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language, minWordLength, minDocFreq, stopwords, vocabulary) 1: textmatrix(SnippetsPath, stopwords = stopwords_en) Alex

Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package

2008 Mar 25

FW: new to R: don't understand errors

2006 Oct 04

...d you the alpha-release of the updated lsa package in a separate message, which also includes a parameter called minGlobFreq that filters out terms that appear fewer than x times in the whole document collection. I guess that is what you were looking for. Considering the sanitizing: if you set minDocFreq to 1 and set minWordLength to 1, you should not get an error with your document collection, as you are then basically taking everything (even a single character appearing only once). It is probably not so problematic, as the LSA step will group these low-frequency terms in a lower-order factor anyway....

error while usig "tm" package

2010 Mar 18

I have recently started using the "tm" package by Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs) I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) *Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions* This error appears for the option weighting=weightTfIdf and not for weighting=weightTf. As Idf would need division by df, does this have anything to do with the nature of my data? Maybe I am doing somethin...

package "tm" fails to remove "the" with remove stopwords

2009 Nov 12

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)

Partial comparison in string vector

2007 Aug 21

...the same code on Windows XP, it doesn't work > any more. > > ### code: > library("lsa") > matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. > 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", > minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL) > print(matrix1,bag_lines = 3, bag_cols = 3) > matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) > space = lsa(matrix1, dims = dimcalc_share()) > as.textmatrix(space) > > ### the following line fails on windows XP > matrix2 = textmatrix("C:...
