thr3ads.net - search: "minwordlength"

Displaying 14 results from an estimated 14 matches for "minwordlength".

Problem with lsa package (data.frame) on Windows XP

2007 Aug 18

...package) works fine on my Mac OS X, but when I run the same code on Windows XP, it doesn't work anymore.
### code:
library("lsa")
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
print(matrix1, bag_lines = 3, bag_cols = 3)
matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
space = lsa(matrix1, dims = dimcalc_share())
as.textmatrix(space)
### the following line fails on Windows XP
matrix2 = textmatrix("C:\\Documents and Setti...

new to R: don't understand errors

2006 Oct 03

...ng if it might be something in the files themselves... At any rate, I routinely get these two errors. The first is generated when I include a minDocFreq=x, and it looks a little like this when I run it:
> data(stopwords_en)
> CCauto = textmatrix("CultureMineTXT", minWordLength=3, minDocFreq=50, stopwords=stopwords_en)
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
  arguments imply differing number of rows: 1, 0
If I remove the minDocFreq, I get a different error:
> data(stopwords_en)
...
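
The error names a data.frame() call that receives one document name (docs = basename(file)) but no terms. The base-R sketch below mimics a simplified per-document step under stated assumptions: it counts the words of one document, drops words shorter than minWordLength and words occurring fewer than minDocFreq times in that document, then builds the data frame. The functions filtered_counts(), doc_terms() and safe_doc_terms() and the sample words are hypothetical, not part of lsa. With a high threshold such as minDocFreq=50, a short document keeps no terms, and data.frame() cannot combine 1 row with 0 rows.

```r
# Hypothetical, simplified per-document step (an illustration, not the lsa source).
filtered_counts <- function(words, minWordLength = 2, minDocFreq = 1) {
  tab <- table(words)
  tab <- tab[nchar(names(tab)) >= minWordLength]  # drop words that are too short
  tab[tab >= minDocFreq]                           # drop words too rare in this document
}

doc_terms <- function(file, words, minWordLength = 2, minDocFreq = 1) {
  tab <- filtered_counts(words, minWordLength, minDocFreq)
  # One document name but possibly zero terms: data.frame() then stops.
  data.frame(docs = basename(file), terms = names(tab), Freq = as.vector(tab))
}

# Guarded variant: a zero-row data frame when no term survives the filters.
safe_doc_terms <- function(file, words, minWordLength = 2, minDocFreq = 1) {
  tab <- filtered_counts(words, minWordLength, minDocFreq)
  if (length(tab) == 0) {
    return(data.frame(docs = character(0), terms = character(0), Freq = integer(0)))
  }
  data.frame(docs = basename(file), terms = names(tab), Freq = as.vector(tab))
}

words <- c("el", "lobo", "y", "el", "bosque")  # made-up document
msg <- tryCatch(doc_terms("cuento1.txt", words, minDocFreq = 50),
                error = conditionMessage)
print(msg)
print(nrow(safe_doc_terms("cuento1.txt", words, minDocFreq = 50)))
print(doc_terms("cuento1.txt", words))
```

The guarded variant returns an empty data frame instead of failing; lowering the thresholds avoids the empty case altogether.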

tm package - remove stopwords failing

2010 Mar 31

Hi, I just noticed, by inspecting the term matrix, that not all stopwords are removed. Does anyone know how to fix that?
library(tm)
data("crude")
d <- tm_map(crude, removeWords, stopwords(language='english'))
dt <- DocumentTermMatrix(d, control=list(minWordLength=3, minDocFreq=2))
inspect(dt)
I am using R version 2.10, tm package 0.5-3.
Cheers, Welma
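
One likely cause, not confirmed in the thread: whole-word removal is case-sensitive, while stopword lists such as the one returned by stopwords('english') are lower case. A capitalized "The" at the start of a sentence therefore survives removeWords, and DocumentTermMatrix, which lower-cases terms by default, then reports it as "the". The base-R sketch below imitates regex-based word removal with gsub(); remove_words() and the three-word stopword list are illustrative assumptions, not tm's code.

```r
# Illustrative stopword list (assumption: a tiny lower-case subset).
stop_en <- c("the", "and", "of")

# Whole-word removal with a case-sensitive regular expression.
remove_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

txt <- "The price of oil and the OPEC quota"
kept    <- remove_words(txt, stop_en)           # "The" survives: it does not match "the"
cleaned <- remove_words(tolower(txt), stop_en)  # lower-case first, then remove
print(kept)
print(cleaned)
```

In tm the corresponding step would be to lower-case the corpus with tm_map() before the removeWords call; depending on the tm version, tolower is passed directly or wrapped in content_transformer().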

SVD Memory Issue

2011 Sep 13

...D, it runs out of memory. I am using a 12 GB dual-core machine with Windows XP and don't think I can increase the memory any further. Are there any other memory-efficient methods to find the SVD? The term-document matrix is obtained using:
tdm2 <- TermDocumentMatrix(tr1, control=list(weighting=weightTf, minWordLength=3))
str(tdm2)
List of 6
 $ i : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ...
 $ j : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ...
 $ v : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ...
 $ nrow : int 771
 $ ncol : int 5677
 $ dimnames:List of 2
  ..$ Terms: chr [1:771] "access...
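
The str() output describes the matrix in triplet form: i and j hold the row (term) and column (document) of each non-zero cell, and v holds its value. The arithmetic below uses the nrow, ncol and non-zero count printed above to estimate the size of a dense copy of that one matrix at 8 bytes per double: about 33 MiB, with roughly 0.15% of the cells non-zero. It does not show which step of the SVD ran out of memory. The helper triplet_to_dense() is a hypothetical illustration, and the last lines ask base R's svd() for only k singular vectors through its nu and nv arguments.

```r
# Figures copied from the str(tdm2) output in the message.
n_terms   <- 771
n_docs    <- 5677
n_nonzero <- 6438

# A dense matrix of doubles needs 8 bytes per cell.
dense_mib <- n_terms * n_docs * 8 / 2^20
density   <- n_nonzero / (n_terms * n_docs)
cat(sprintf("dense copy: %.1f MiB, non-zero cells: %.3f%%\n",
            dense_mib, 100 * density))

# Expand (i, j, v) triplets into a dense matrix (hypothetical helper).
triplet_to_dense <- function(i, j, v, nrow, ncol) {
  m <- matrix(0, nrow = nrow, ncol = ncol)
  m[cbind(i, j)] <- v  # matrix indexing: one (row, column) pair per value
  m
}

small <- triplet_to_dense(i = c(1, 2, 2), j = c(1, 1, 3), v = c(8, 5, 6),
                          nrow = 2, ncol = 3)

# Request only the first k left and right singular vectors.
k <- 1
s <- svd(small, nu = k, nv = k)
print(s$d)
```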

Text mining (Minería de texto)

2012 Oct 25

...moveWords, my.stopwords)
tw.corpus = tm_map(tw.corpus, stripWhitespace)
sw <- readLines("stopwords.es.txt", encoding="UTF-8")
sw = iconv(sw, to="ASCII//TRANSLIT")
tw.corpus = tm_map(tw.corpus, removeWords, sw)
doc.m = TermDocumentMatrix(tw.corpus, control = list(minWordLength = 2))
dm = as.matrix(doc.m)
# calculate the frequency of words
v = sort(rowSums(dm), decreasing=TRUE)
d = data.frame(word=names(v), freq=v)
#Generate the wordcloud
pal2 <- brewer.pal(8,"Dark2")
wc=wordcloud(d$word, d$freq, min.freq=min.freq, scale=c(8,.2), max.word...
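
The frequency step in this snippet sums each term's row of the term-document matrix, sorts the totals and stores them in a data frame for wordcloud(). The sketch below repeats that logic in base R on a made-up three-term matrix and approximates wordcloud's min.freq cut with a plain subset rather than calling the wordcloud package; the terms, counts and min.freq value are assumptions for illustration. stringsAsFactors = FALSE keeps the word column as character on R versions older than 4.0.

```r
# Made-up term-document matrix: terms in rows, documents (tweets) in columns.
dm <- matrix(c(3, 0, 2,
               1, 1, 0,
               0, 4, 3),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("lluvia", "sol", "viento"), c("tw1", "tw2", "tw3")))

# Total frequency of each term over all documents, most frequent first.
v <- sort(rowSums(dm), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v, stringsAsFactors = FALSE)

# Keep only words reaching a minimum frequency (mimicking wordcloud's min.freq).
min.freq <- 3
d_plot <- d[d$freq >= min.freq, ]
print(d_plot)
```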

Troubles with stemming (tm + Snowball packages) under MacOS

2012 Jan 13

Dear all, I am having some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my configuration: MacOS 10.5, R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions). I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (as written in

Error "... x must be atomic" when using lsa (latent semantic analysis) package

2008 Mar 25

...t be atomic")
9: sort(unique.default(x), na.last = TRUE)
8: factor(a, exclude = exclude)
7: table(txt)
6: inherits(x, "factor")
5: is.factor(x)
4: sort(table(txt), decreasing = TRUE)
3: FUN(X[[238]], ...)
2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language, minWordLength, minDocFreq, stopwords, vocabulary)
1: textmatrix(SnippetsPath, stopwords = stopwords_en)
Alex

Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package

2008 Mar 25

FW: new to R: don't understand errors

2006 Oct 04

...of the updated lsa package in a separate message, which also includes a parameter called minGlobFreq that filters out terms appearing fewer than x times in the whole document collection. I guess that is what you were looking for. Regarding the sanitizing: if you set minDocFreq to 1 and minWordLength to 1, you should not get an error with your document collection, as you are then basically taking everything (even a single character appearing only once). This is probably not very problematic, as the LSA step will group these low-frequency terms in a lower-order factor anyway. Of course you will still g...
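
The message describes three filters: a minimum word length (minWordLength), a per-document threshold (minDocFreq) and, in the updated package, a collection-wide threshold (minGlobFreq). The base-R sketch below applies them to a made-up term-by-document count matrix following those descriptions. It is not the lsa implementation: apply_thresholds() is hypothetical, and reading minDocFreq as a minimum count within each document is an assumption. With every threshold at 1 nothing is dropped, which is the "taking everything" case.

```r
# Made-up counts: terms in rows, documents in columns.
counts <- matrix(c(4, 0, 1,
                   1, 0, 0,
                   0, 2, 2,
                   1, 1, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("lobo", "x", "bosque", "rio"), c("c1", "c2", "c3")))

# Hypothetical helper applying the three thresholds in turn.
apply_thresholds <- function(m, minWordLength = 1, minDocFreq = 1, minGlobFreq = 1) {
  m <- m[nchar(rownames(m)) >= minWordLength, , drop = FALSE]  # word length
  m[m < minDocFreq] <- 0           # per-document count (assumed meaning of minDocFreq)
  totals <- rowSums(m)             # occurrences across the whole collection
  m[totals >= minGlobFreq & totals > 0, , drop = FALSE]
}

print(apply_thresholds(counts))                                     # keeps all four terms
print(apply_thresholds(counts, minWordLength = 2, minGlobFreq = 4))
print(apply_thresholds(counts, minDocFreq = 2))
```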

error while using "tm" package

2010 Mar 18

I have recently started using the "tm" package by Feinerer, K. Hornik, and D. Meyer. While trying to create a term-document matrix from a corpus (approximately 440 docs), I get the following error:
tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf, minDocFreq=2, minWordLength=3))
*Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions*
This error appears with weighting=weightTfIdf and not with weighting=weightTf. As Idf would need division by df, does this have anything to do with the nature of my data? Maybe I am doing something silly. Can any...
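
The message itself comes from base R: rowSums() stops with "'x' must be an array of at least two dimensions" whenever it is given a plain vector. In base R, one way a matrix loses its second dimension is single-row indexing, which drops to a vector unless drop = FALSE is used; whether something similar happened inside tm here cannot be told from the excerpt. The sketch computes tf-idf on a small made-up matrix with idf = log2(N / df), where N is the number of documents and df the number of documents containing the term. That idf variant, tf_idf() and the data are assumptions for illustration, not tm's weightTfIdf code.

```r
# Made-up term-by-document counts (terms in rows).
m <- matrix(c(2, 1, 0,
              0, 3, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("oil", "price"), c("d1", "d2", "d3")))

# tf-idf with idf = log2(N / df); a sketch, not tm's implementation.
tf_idf <- function(m) {
  stopifnot(is.matrix(m))
  df  <- rowSums(m > 0)        # documents containing each term
  idf <- log2(ncol(m) / df)    # a term found in no document would give Inf here
  m * idf                      # idf recycles down each column: row i times idf[i]
}

w <- tf_idf(m)

# Selecting one row drops to a plain vector, and rowSums() then fails.
one_term <- m["oil", ]
msg <- tryCatch(rowSums(one_term > 0), error = conditionMessage)
print(msg)

# drop = FALSE keeps a 1 by 3 matrix, so the same computation works.
w_oil <- tf_idf(m["oil", , drop = FALSE])
print(w_oil)
```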

LSA package: problem with textmatrix()

2012 Feb 22

...n between maternal and fetal plasma levels of glucose and free fatty acids . correlation coefficients have been determined between the
The command I am using looks like this, with the resulting error below:
> dtm <- textmatrix(LSAwork, stemming=TRUE, stopwords=StopListm, minGlobFreq=1, minWordLength=2, removeNumbers=TRUE)
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
  arguments imply differing number of rows: 1, 0
In addition: Warning message:
In FUN(c("LSAWork/med.000001", "LSAWork/med.000002", "LSAWork/med.000003", :
  [textvec...

tm_map help

2012 Feb 26

...veWords, myStopwords)
dictCorpus <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
################ERROR HAPPENS ON NEXT LINE##################################
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus)
myDtm <- TermDocumentMatrix(myCorpus, control = list(minWordLength = 1))
m <- as.matrix(myDtm)
v <- sort(rowSums(m), decreasing=TRUE)
myNames <- names(v)
d <- data.frame(word=myNames, freq=v)
wordcloud(d$word, d$freq, min.freq=minFreq)
list(freq=v, TextMatrix=myDtm)
}
qantas=hashTag("#qantas", 7)

findFreqTerms vs minDocFreq in Package 'tm'

2011 Sep 12

...hting=weightBin))
> freq_terms <- findFreqTerms(tdm1, lowfreq = 5, highfreq = Inf)
> str(freq_terms)
 chr [1:3140] "abc" "abil" "abl" "abnorm" "abort" "absenc" ...
> tdm2 <- TermDocumentMatrix(tr1, control=list(minDocFreq=5, minWordLength=1))
> str(tdm2)
List of 6
 $ i : int [1:4703] 173 616 624 241 350 534 563 609 129 333 ...
 $ j : int [1:4703] 1 2 3 7 7 7 7 8 10 10 ...
 $ v : num [1:4703] 7 5 6 9 5 7 5 5 5 7 ...
 $ nrow : int 659
 $ ncol : int 5677
 $ dimnames:List of 2
  ..$ Terms: chr [1:659] "\0...
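
The two calls apply different rules. findFreqTerms() compares each term's total over the documents with lowfreq, and on the matrix built with weighting=weightBin (apparently tdm1) that total is the number of documents containing the term. The visible v values of tdm2 are all 5 or more, which suggests that in this tm version minDocFreq discarded counts below 5 within each document rather than requiring a term to occur in 5 documents; that reading is an inference from the excerpt, not something the thread states. The base-R sketch below, with made-up counts and hypothetical function names, shows how the two rules can pick different terms with the same threshold.

```r
# Made-up counts: terms in rows, documents in columns.
counts <- matrix(c(6, 1, 0,
                   1, 1, 1,
                   0, 0, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("oil", "price", "opec"), paste0("d", 1:3)))

# Rule A: binary weights, then keep terms whose row sum reaches k.
# With 0/1 weights the row sum is the number of documents containing the term.
terms_in_k_docs <- function(m, k) {
  bin <- (m > 0) * 1
  names(which(rowSums(bin) >= k))
}

# Rule B: zero out counts below k within each document,
# then keep terms that still have a non-zero cell.
terms_after_local_cut <- function(m, k) {
  m[m < k] <- 0
  rownames(m)[rowSums(m) > 0]
}

k <- 2
print(terms_in_k_docs(counts, k))
print(terms_after_local_cut(counts, k))
```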

Partial comparison in string vector

2007 Aug 21

...x, but when I run the same code on Windows XP, it doesn't work
> any more.
>
> ### code:
> library("lsa")
> matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.
> 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish",
> minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
> print(matrix1,bag_lines = 3, bag_cols = 3)
> matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
> space = lsa(matrix1, dims = dimcalc_share())
> as.textmatrix(space)
>
> ### the following line fails on windows XP
> matrix2 = textm...
