Displaying 14 results from an estimated 14 matches for "minwordlength".
2007 Aug 18
2
Problem with lsa package (data.frame) on Windows XP
...package) works fine on my
mac os x, but when I run the same code on Windows XP, it doesn't work
any more.
### code:
library("lsa")
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.
000\\LSA\\cuentos\\", stemming=TRUE, language="spanish",
minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
print(matrix1,bag_lines = 3, bag_cols = 3)
matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
space = lsa(matrix1, dims = dimcalc_share())
as.textmatrix(space)
### the following line fails on windows XP
matrix2 = textmatrix("C:\\Documents and Setti...
2006 Oct 03
1
new to R: don't understand errors
...ng
if it might be something in the files themselves...
At any rate I routinely get these two errors. The first is generated
when I include a minDocFreq=x, and it looks a little like this when I
run it:
> data(stopwords_en)
> CCauto = textmatrix( "CultureMineTXT" , minWordLength=3,
minDocFreq=50, stopwords=stopwords_en)
> Error in data.frame(docs = basename(file), terms = names(tab),
Freq = tab, :
> arguments imply differing number of rows: 1, 0
If I remove the minDocFreq, I get a different error:
> data(stopwords_en)
...
2010 Mar 31
1
tm package - remove stopwords failing
Hi,
I just noticed, by inspecting the document-term matrix, that not all stopwords are
removed. Does anyone know how to fix that?
library(tm)
data("crude")
d<-tm_map(crude, removeWords, stopwords(language='english'))
dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2))
inspect( dt)
I am using R version 2.10, tm package 0.5-3
cheers
Welma
2011 Sep 13
1
SVD Memory Issue
...D, it runs out of memory. I am using a 12GB Dual core Machine
with Windows XP and don't think I can increase the memory anymore. Are there
any other memory efficient methods to find the SVD?
The term-document matrix is obtained using:
tdm2 <-
TermDocumentMatrix(tr1,control=list(weighting=weightTf,minWordLength=3))
str(tdm2)
List of 6
$ i : int [1:6438] 202 729 737 278 402 621 654 718 157 380 ...
$ j : int [1:6438] 1 2 3 7 7 7 7 8 10 10 ...
$ v : num [1:6438] 8 5 6 9 5 7 5 6 5 7 ...
$ nrow : int 771
$ ncol : int 5677
$ dimnames:List of 2
..$ Terms: chr [1:771] "access...
2012 Oct 25
2
Text mining
...moveWords, my.stopwords)
tw.corpus = tm_map(tw.corpus, stripWhitespace)
sw <- readLines("stopwords.es.txt", encoding="UTF-8")
sw = iconv(sw, to="ASCII//TRANSLIT")
tw.corpus = tm_map(tw.corpus, removeWords, sw)
doc.m = TermDocumentMatrix(tw.corpus, control = list(minWordLength = 2))
dm = as.matrix(doc.m)
# calculate the frequency of words
v = sort(rowSums(dm), decreasing=TRUE)
d = data.frame(word=names(v), freq=v)
# Generate the wordcloud
pal2 <- brewer.pal(8,"Dark2")
wc=wordcloud(d$word, d$freq, min.freq=min.freq, scale=c(8,.2), max.word...
2012 Jan 13
4
Troubles with stemming (tm + Snowball packages) under MacOS
Dear all,
I am having trouble using the stemming algorithm provided by the tm
(text mining) + Snowball packages.
Here is my config:
MacOS 10.5
R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)
I have installed all the needed packages (tm, rJava, RWeka, Snowball)
+ dependencies. I have deactivated AWT (as written in
2008 Mar 25
0
Error "... x must be atomic" when using lsa (latent semantic analysis) package
...t be atomic")
9: sort(unique.default(x), na.last = TRUE)
8: factor(a, exclude = exclude)
7: table(txt)
6: inherits(x, "factor")
5: is.factor(x)
4: sort(table(txt), decreasing = TRUE)
3: FUN(X[[238]], ...)
2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language,
minWordLength, minDocFreq, stopwords, vocabulary)
1: textmatrix(SnippetsPath, stopwords = stopwords_en)
Alex
2008 Mar 25
0
Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package
...t be atomic")
9: sort(unique.default(x), na.last = TRUE)
8: factor(a, exclude = exclude)
7: table(txt)
6: inherits(x, "factor")
5: is.factor(x)
4: sort(table(txt), decreasing = TRUE)
3: FUN(X[[238]], ...)
2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language,
minWordLength, minDocFreq, stopwords, vocabulary)
1: textmatrix(SnippetsPath, stopwords = stopwords_en)
Alex
2006 Oct 04
0
FW: new to R: don't understand errors
...of the updated lsa package
in a separate message, which also includes a parameter called
minGlobFreq that filters out terms appearing fewer
than x times in the whole document collection. I guess that is
what you were looking for.
Regarding the sanitizing: if you set minDocFreq to 1
and set minWordLength to 1, you should not get an error
with your document collection, as you are then basically
taking everything (even a single character appearing
only once). It probably is not so problematic, as the
LSA step will group these low-frequency terms
in a lower order factor anyway. Of course you will still g...
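To make the effect of these thresholds concrete, here is a minimal base-R sketch of the filtering described in the reply above. It does not call the lsa package: the toy corpus and the helper `filter_terms` are hypothetical, reading minDocFreq as a within-document count is an assumption, and minGlobFreq is taken as the total count across the collection, as the reply words it.

```r
# Hypothetical toy corpus: each document is a character vector of tokens.
docs <- list(
  d1 = c("a", "cat", "sat", "on", "a", "mat"),
  d2 = c("a", "dog", "sat"),
  d3 = c("cat", "and", "dog")
)

# Build a term-by-document count matrix with base R only.
terms <- sort(unique(unlist(docs)))
counts <- vapply(
  docs,
  function(tokens) as.integer(table(factor(tokens, levels = terms))),
  integer(length(terms))
)
rownames(counts) <- terms

# Hypothetical helper mimicking the three thresholds (not the lsa implementation).
filter_terms <- function(m, minWordLength = 1, minDocFreq = 1, minGlobFreq = 1) {
  # Assumed meaning: ignore a term inside a document where it occurs
  # fewer than minDocFreq times.
  m[m < minDocFreq] <- 0
  # Keep terms that are long enough and frequent enough across the collection.
  keep <- nchar(rownames(m)) >= minWordLength & rowSums(m) >= minGlobFreq
  m[keep, , drop = FALSE]
}

# With every threshold at 1, nothing that occurs at all is dropped.
everything <- filter_terms(counts, minWordLength = 1, minDocFreq = 1, minGlobFreq = 1)

# Stricter settings shrink the vocabulary.
stricter <- filter_terms(counts, minWordLength = 3, minGlobFreq = 2)
print(rownames(stricter))
```

Setting all thresholds to 1 keeps every term that occurs anywhere, which matches the "taking everything" remark; raising any of them only removes rows from the matrix.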
2010 Mar 18
0
error while using "tm" package
I have recently started using the "tm" package by I. Feinerer, K. Hornik, and D.
Meyer.
While trying to create a term-document matrix from a corpus (approximately 440
docs)
I get the following error:
tdm <- TermDocumentMatrix(tmp, control=list(weighting=weightTfIdf,
minDocFreq=2, minWordLength=3))
*Error in rowSums(m > 0) : 'x' must be an array of at least two dimensions*
This error appears with the option weighting=weightTfIdf but not with
weighting=weightTf
As Idf would need division by df, does this have anything to do with the nature of my
data?
Maybe I am doing something silly. Can any...
2012 Feb 22
0
LSA package: problem with textmatrix()
...n
between
maternal
and
fetal
plasma
levels
of
glucose
and
free
fatty
acids
.
correlation
coefficients
have
been
determined
between
the
the command I am using looks like this, with the resulting error below:
>
> dtm <- textmatrix(LSAwork, stemming=TRUE, stopwords=StopListm, minGlobFreq=1, minWordLength=2, removeNumbers=TRUE)
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
arguments imply differing number of rows: 1, 0
In addition: Warning message:
In FUN(c("LSAWork/med.000001", "LSAWork/med.000002", "LSAWork/med.000003", :
[textvec...
2012 Feb 26
2
tm_map help
...veWords, myStopwords)
dictCorpus <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
################ERROR HAPPENS ON NEXT LINE##################################
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus)
myDtm <- TermDocumentMatrix(myCorpus, control = list(minWordLength = 1))
m <- as.matrix(myDtm)
v <- sort(rowSums(m), decreasing=TRUE)
myNames <- names(v)
d <- data.frame(word=myNames, freq=v)
wordcloud(d$word, d$freq, min.freq=minFreq)
list(freq=v, TextMatrix=myDtm)
}
qantas=hashTag("#qantas", 7)
2011 Sep 12
1
findFreqTerms vs minDocFreq in Package 'tm'
...hting=weightBin))
> freq_terms <- findFreqTerms(tdm1, lowfreq =5, highfreq = Inf)
> str(freq_terms)
chr [1:3140] "abc" "abil" "abl" "abnorm" "abort" "absenc" ...
> tdm2 <- TermDocumentMatrix(tr1,control=list(minDocFreq=5,minWordLength=1))
> str(tdm2)
List of 6
$ i : int [1:4703] 173 616 624 241 350 534 563 609 129 333 ...
$ j : int [1:4703] 1 2 3 7 7 7 7 8 10 10 ...
$ v : num [1:4703] 7 5 6 9 5 7 5 5 5 7 ...
$ nrow : int 659
$ ncol : int 5677
$ dimnames:List of 2
..$ Terms: chr [1:659] "\0...
2007 Aug 21
2
Partial comparison in string vector
...x, but when I run the same code on Windows XP, it doesn't work
> any more.
>
> ### code:
> library("lsa")
> matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.
> 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish",
> minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
> print(matrix1,bag_lines = 3, bag_cols = 3)
> matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
> space = lsa(matrix1, dims = dimcalc_share())
> as.textmatrix(space)
>
> ### the following line fails on windows XP
> matrix2 = textm...