search for: minwordlength

Displaying 14 results from an estimated 14 matches for "minwordlength".

2007 Aug 18
2
Problem with lsa package (data.frame) on Windows XP
...package) works fine on my Mac OS X, but when I run the same code on Windows XP, it doesn't work anymore. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL) print(matrix1,bag_lines = 3, bag_cols = 3) matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) space = lsa(matrix1, dims = dimcalc_share()) as.textmatrix(space) ### the following line fails on windows XP matrix2 = textmatrix("C:\\Documents and Setti...
2006 Oct 03
1
new to R: don't understand errors
...ng if it might be something in the files themselves... At any rate, I routinely get these two errors. The first is generated when I include minDocFreq=x, and it looks something like this when I run it: > data(stopwords_en) > CCauto = textmatrix("CultureMineTXT", minWordLength=3, minDocFreq=50, stopwords=stopwords_en) > Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, : > arguments imply differing number of rows: 1, 0 If I remove the minDocFreq, I get a different error: > data(stopwords_en) &...
2010 Mar 31
1
tm package: removing stopwords fails
Hi, I just noticed, by inspecting the term matrix, that not all stopwords are removed. Does anyone know how to fix that? library(tm) data("crude") d<-tm_map(crude, removeWords, stopwords(language='english')) dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2)) inspect(dt) I am using R version 2.10, tm package 0.5-3. Cheers, Welma
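One plausible cause, offered as an assumption rather than a confirmed diagnosis of this thread: tm's removeWords() matches whole words case-sensitively, while stopwords() returns lowercase words. If that is right, capitalised forms such as "The" survive removal. The base-R sketch below uses a hypothetical helper, remove_words(), that imitates whole-word, case-sensitive removal. It shows why lowercasing the text first matters.

```r
# Hypothetical helper (NOT tm's source): remove whole-word matches of `words`
# from `x`. The regex is case-sensitive, like a plain gsub() call.
remove_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

stops <- c("the", "of", "and")   # lowercase, like stopwords("english")
txt   <- "The price of oil and The market"

# Capitalised "The" is kept because the match is case-sensitive.
kept_case <- remove_words(txt, stops)

# Lowercasing first lets every stopword match.
lowered <- remove_words(tolower(txt), stops)
```

With tm itself, the corresponding fix would be to lowercase the corpus before calling removeWords. In the 0.5 series that is tm_map(crude, tolower). Later releases use tm_map(crude, content_transformer(tolower)).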
2011 Sep 13
1
SVD Memory Issue
...D, it runs out of memory. I am using a 12 GB dual-core machine with Windows XP and don't think I can increase the memory any further. Are there any other memory-efficient methods to find the SVD? The term-document matrix is obtained using: tdm2 <- TermDocumentMatrix(tr1,control=list(weighting=weightTf,minWordLength=3)) str(tdm2) List of 6 $ i : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ... $ j : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ... $ v : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ... $ nrow : int 771 $ ncol : int 5677 $ dimnames:List of 2 ..$ Terms: chr [1:771] "access...
2012 Oct 25
2
Text mining (Minería de texto)
...moveWords, my.stopwords) tw.corpus = tm_map(tw.corpus, stripWhitespace) sw <- readLines("stopwords.es.txt",encoding="UTF-8") sw = iconv(sw, to="ASCII//TRANSLIT") tw.corpus = tm_map(tw.corpus, removeWords, sw) doc.m = TermDocumentMatrix(tw.corpus, control = list(minWordLength = 2)) dm = as.matrix(doc.m) # calculate the frequency of words v = sort(rowSums(dm), decreasing=TRUE) d = data.frame(word=names(v), freq=v) #Generate the wordcloud pal2 <- brewer.pal(8,"Dark2") wc=wordcloud(d$word, d$freq, min.freq=min.freq, scale=c(8,.2), max.word...
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
Dear all, I have some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my configuration: MacOS 10.5, R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions). I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as described in
2008 Mar 25
0
Error "... x must be atomic" when using lsa (latent semantic analysis) package
...t be atomic") 9: sort(unique.default(x), na.last = TRUE) 8: factor(a, exclude = exclude) 7: table(txt) 6: inherits(x, "factor") 5: is.factor(x) 4: sort(table(txt), decreasing = TRUE) 3: FUN(X[[238]], ...) 2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language, minWordLength, minDocFreq, stopwords, vocabulary) 1: textmatrix(SnippetsPath, stopwords = stopwords_en) Alex
2008 Mar 25
0
Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package
...t be atomic") 9: sort(unique.default(x), na.last = TRUE) 8: factor(a, exclude = exclude) 7: table(txt) 6: inherits(x, "factor") 5: is.factor(x) 4: sort(table(txt), decreasing = TRUE) 3: FUN(X[[238]], ...) 2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language, minWordLength, minDocFreq, stopwords, vocabulary) 1: textmatrix(SnippetsPath, stopwords = stopwords_en) Alex
2006 Oct 04
0
FW: new to R: don't understand errors
...of the updated lsa package in a separate message, which also includes a parameter called minGlobFreq that filters out terms appearing fewer than x times in the whole document collection. I guess that is what you were looking for. Regarding the sanitizing: if you set minDocFreq to 1 and minWordLength to 1, you should not get an error with your document collection, because you are then basically taking everything (even a single character appearing only once). This is probably not very problematic, as the LSA step will group these low-frequency terms into a lower-order factor anyway. Of course you will still g...
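The "differing number of rows: 1, 0" error quoted in the original thread is raised by data.frame(docs = basename(file), terms = names(tab), Freq = tab, ...). That call fails if a document's term table is empty. The sketch below assumes, without checking lsa's source, that each document is filtered by word length and within-document frequency before that data frame is built. filter_doc_terms() is a hypothetical stand-in, not an lsa function.

```r
# Hypothetical stand-in for per-document filtering (NOT lsa's source code):
# count the terms in one document, then drop short and rare-in-document terms.
filter_doc_terms <- function(tokens, minWordLength = 2, minDocFreq = 1) {
  tab <- table(tokens)                            # term counts for this document
  tab <- tab[nchar(names(tab)) >= minWordLength]  # drop terms that are too short
  tab[tab >= minDocFreq]                          # drop terms that are too rare
}

doc <- c("a", "cat", "sat", "on", "a", "mat")

# Strict thresholds leave no terms. Combining one document name with zero
# terms then reproduces "arguments imply differing number of rows: 1, 0".
strict <- filter_doc_terms(doc, minWordLength = 3, minDocFreq = 2)
err <- tryCatch(
  data.frame(docs = "doc1", terms = names(strict), Freq = as.vector(strict)),
  error = function(e) conditionMessage(e)
)

# minWordLength = 1 and minDocFreq = 1 keep every token, so nothing is empty.
loose <- filter_doc_terms(doc, minWordLength = 1, minDocFreq = 1)
```

This matches the advice above: with both thresholds at 1, every document keeps at least one term, so the data frame always has matching row counts.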
2010 Mar 18
0
error while using "tm" package
I have recently started using the "tm" package by I. Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs), I get the following error: tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3)) *Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions* This error appears with the option weighting=weightTfIdf but not with weighting=weightTf. As Idf needs a division by df, does this have anything to do with the nature of my data? Maybe I am doing something silly. Can any...
2012 Feb 22
0
LSA package: problem with textmatrix()
...n between maternal and fetal plasma levels of glucose and free fatty acids . correlation coefficients have been determined between the ... The command I am using looks like this, with the resulting error below: dtm <- textmatrix(LSAwork, stemming=TRUE, stopwords=StopListm, minGlobFreq=1, minWordLength=2, removeNumbers=TRUE) Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, : arguments imply differing number of rows: 1, 0 In addition: Warning message: In FUN(c("LSAWork/med.000001", "LSAWork/med.000002", "LSAWork/med.000003", : [textvec...
2012 Feb 26
2
tm_map help
...veWords, myStopwords) dictCorpus <- myCorpus myCorpus <- tm_map(myCorpus, stemDocument) ################ERROR HAPPENS ON NEXT LINE################################## myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus) myDtm <- TermDocumentMatrix(myCorpus, control = list(minWordLength = 1)) m <- as.matrix(myDtm) v <- sort(rowSums(m), decreasing=TRUE) myNames <- names(v) d <- data.frame(word=myNames, freq=v) wordcloud(d$word, d$freq, min.freq=minFreq) list(freq=v, TextMatrix=myDtm) } qantas=hashTag("#qantas", 7)
2011 Sep 12
1
findFreqTerms vs minDocFreq in Package 'tm'
...hting=weightBin)) > freq_terms <- findFreqTerms(tdm1, lowfreq =5, highfreq = Inf) > str(freq_terms) chr [1:3140] "abc" "abil" "abl" "abnorm" "abort" "absenc" ... > tdm2 <- TermDocumentMatrix(tr1,control=list(minDocFreq=5,minWordLength=1)) > str(tdm2) List of 6 $ i : int [1:4703] 173 616 624 241 350 534 563 609 129 333 ... $ j : int [1:4703] 1 2 3 7 7 7 7 8 10 10 ... $ v : num [1:4703] 7 5 6 9 5 7 5 5 5 7 ... $ nrow : int 659 $ ncol : int 5677 $ dimnames:List of 2 ..$ Terms: chr [1:659] "\0...
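The gap in this thread (3140 terms from findFreqTerms() versus 659 rows with minDocFreq=5) is consistent with the two options counting different things. tdm1 appears to have been built with weighting=weightBin, so summing a term's row counts the documents it occurs in. The v values printed for tdm2 are all 5 or more, which suggests that minDocFreq acted as a within-document count threshold in that tm version. That reading is an inference from the output shown, not from tm's documentation. The base-R sketch below contrasts the two criteria on an invented toy matrix. The cutoff of 3 plays the role of the thread's 5.

```r
# Toy term-document count matrix (terms x documents); the counts are invented
# purely for illustration.
m <- matrix(c(5, 0, 0, 0,
              1, 1, 1, 1,
              2, 0, 6, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(Terms = c("oil", "price", "opec"),
                            Docs  = paste0("d", 1:4)))
cutoff <- 3

# Binary weighting turns each nonzero count into 1, so a row sum is the
# number of documents containing the term (a document-frequency filter).
doc_freq    <- rowSums(m > 0)
by_doc_freq <- names(doc_freq)[doc_freq >= cutoff]

# A within-document threshold zeroes counts below the cutoff in each
# document, then keeps terms that survive in at least one document.
m_cut <- m
m_cut[m_cut < cutoff] <- 0
by_within_doc <- rownames(m_cut)[rowSums(m_cut) > 0]
```

Here "price" occurs in every document but never more than once per document. "oil" and "opec" are rare across documents but frequent within one. The two filters therefore keep disjoint sets. Later tm releases replaced minWordLength and minDocFreq with the wordLengths and bounds control options, so check the documentation for the installed version.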
2007 Aug 21
2
Partial comparison in string vector
...x, but when I run the same code on Windows XP, it doesn't work > any more. > > ### code: > library("lsa") > matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. > 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", > minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL) > print(matrix1,bag_lines = 3, bag_cols = 3) > matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) > space = lsa(matrix1, dims = dimcalc_share()) > as.textmatrix(space) > > ### the following line fails on windows XP > matrix2 = textm...