On 10/3/2006 1:32 PM, Wingfield, Jerad G. wrote:
> Hello all,
>
> I'm brand new to R, and I'm trying to quickly learn the
> rudiments for a couple of projects here at work. I'm working with the
> lsa package and trying to generate various semantic spaces. I seem to do
> well with small collections of clean text files, but now that I am
> trying to work with larger collections of less-than-perfect files,
> I'm getting errors that I don't quite understand. So I'm hoping some of
> you out there might recognize my issues and be able to point me in the
> right direction to resolve them.
>
> Currently, I have a corpus of ~12,000 text files. I've separated them
> out into other folders of varying sizes to check whether there is some sort of
> limit on the number of files. Even when I only use the same number of files as
> in previous working collections, I still get the errors. So I am wondering
> if it might be something in the files themselves...
>
> At any rate, I routinely get these two errors. The first is generated
> when I include minDocFreq=x, and it looks like this when I
> run it:

Those errors are coming from R, but they indicate errors in the
functions you are using, so you'll need to talk to the maintainer of
that package (Fridolin Wild, whose email address you can get by
library(help=lsa)) to find out the real cause.

Duncan Murdoch

>
>> data(stopwords_en)
>> CCauto = textmatrix("CultureMineTXT", minWordLength=3, minDocFreq=50, stopwords=stopwords_en)
> Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
>   arguments imply differing number of rows: 1, 0
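One thing worth ruling out, given the suspicion that the files themselves are the problem: that data.frame() error reports one argument with 1 row (presumably docs, the single file name) and another with 0 rows. That suggests that for at least one file the term table tab came out empty. The following is a minimal sketch, not anything from lsa. The helper find_empty_docs is hypothetical and written only for illustration. It approximates preprocessing with a simple split on non-alphanumeric characters, so lsa's own tokenisation may differ. It lists files that are unreadable, or that are left with no words once short words and stopwords are dropped.

```r
# Hypothetical diagnostic helper, NOT part of the lsa package.
# Flags files that fail to read/process, or that contain no words once
# words shorter than min_word_length and any stopwords are removed.
# Tokenisation here is only a rough approximation of lsa's preprocessing.
find_empty_docs <- function(dir, min_word_length = 3, stopwords = character(0)) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!file.info(files)$isdir]  # skip subdirectories
  is_problem <- vapply(files, function(f) {
    tryCatch({
      # Read all lines; warn = FALSE silences "incomplete final line" warnings
      txt <- tolower(paste(readLines(f, warn = FALSE), collapse = " "))
      # Split on runs of characters that are not letters or digits
      words <- unlist(strsplit(txt, "[^[:alnum:]]+"))
      words <- words[nchar(words) >= min_word_length]
      words <- words[!(words %in% stopwords)]
      length(words) == 0
    }, error = function(e) TRUE)  # e.g. invalid multibyte strings
  }, logical(1))
  basename(files[is_problem])
}
```

With lsa loaded and data(stopwords_en) run, a call such as find_empty_docs("CultureMineTXT", 3, stopwords_en) would list candidate files to inspect or set aside before calling textmatrix() again. The helper only mirrors minWordLength and stopwords. It does not attempt to reproduce what minDocFreq does.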
>
> If I remove the minDocFreq, I get a different error:
>
>> data(stopwords_en)
>> CCauto = textmatrix("CultureMineTXT", minWordLength=3, stopwords=stopwords_en)
> Error in as.vector(x, mode) : invalid argument 'mode'
>
> Any help would be greatly appreciated.
>
> Gabe Wingfield
> IT and Program Specialist I
> Center for Applied Social Research
> University of Oklahoma
> 3200 Marshall Avenue, Suite 201
> Norman, OK 73072
>
> P: 405-325-4786
> F: 405-321-6936
> gwingfield at ou.edu
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.