thr3ads.net - R help - [R] LSA package: problem with textmatrix() [Feb 2012]

Ashton, Triss

2012-Feb-22 22:24 UTC

[R] LSA package: problem with textmatrix()

I have a problem with the textmatrix() function of the lsa package whenever I
specify 'removeNumbers=TRUE'. The data for the function are stored in a
directory, LSAwork, which consists of a series of files that contain the text in
column form (one term per line). As long as removeNumbers=FALSE, or the argument
is omitted, textmatrix() works fine. The error message seems to suggest that it
finds the files empty after filtering. However, all of the files consist mostly
of words with only a few numbers mixed in. Any help would be appreciated.

The data I am using is the MEDLINE data set; the first file in the set,
med.000001, begins like this:
correlation
between
maternal
and
fetal
plasma
levels
of
glucose
and
free
fatty
acids
.
correlation
coefficients
have
been
determined
between
the
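
To check how many purely numeric tokens a file like this actually contains, a minimal base-R sketch can list the tokens that a digits-only filter would drop. It assumes one token per line, as in the excerpt above. The pattern ^[0-9]+$ is an assumption about what counts as a "number" and is not taken from the lsa source; the helper name numeric_tokens is hypothetical.

```r
# Hypothetical helper: return the tokens made up only of digits,
# i.e. the ones a "remove numbers" filter would be expected to drop.
# The pattern is an assumption, not lsa's own definition of a number.
numeric_tokens <- function(tokens) {
  tokens[grepl("^[0-9]+$", tokens)]
}

# Tokens from the start of med.000001, as shown above.
sample_tokens <- c("correlation", "between", "maternal", "and", "fetal",
                   "plasma", "levels", "of", "glucose", "and", "free",
                   "fatty", "acids", ".", "correlation", "coefficients",
                   "have", "been", "determined", "between", "the")

numeric_tokens(sample_tokens)  # character(0): no purely numeric tokens
length(sample_tokens)          # 21 tokens in this excerpt
```

For the real file, numeric_tokens(scan("LSAWork/med.000001", what = character(), quiet = TRUE, quote = "")) would run the same check (the path is the one printed in the warning message). A file with few or no numeric tokens should still have plenty of terms left after number removal, which is why the "no terms after filtering" warning is surprising.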

The command I am using, with the resulting error, is below:

> dtm <- textmatrix(LSAwork, stemming=TRUE, stopwords=StopListm,
    minGlobFreq=1, minWordLength=2, removeNumbers=TRUE)
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab,  : 
  arguments imply differing number of rows: 1, 0
In addition: Warning message:
In FUN(c("LSAWork/med.000001", "LSAWork/med.000002", "LSAWork/med.000003",  :
  [textvector] - the file LSAWork/med.000001 contains no terms after filtering.

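
Both messages are consistent with every token of med.000001 being removed: once a file yields no terms, a one-element docs column cannot be combined with a zero-length terms column, hence "differing number of rows: 1, 0". One base-R pitfall that produces exactly this symptom, and is worth ruling out, is dropping matches with a negative index built from grep(): when nothing matches, grep() returns integer(0), and x[-integer(0)] selects nothing rather than everything. This is not confirmed here to be what lsa does internally. The sketch below shows the data.frame() error, the pitfall, a safe logical-index alternative, and a possible workaround that pre-filters the files so textmatrix() can be called without removeNumbers. The names drop_numbers_unsafe, drop_numbers_safe and prefilter_dir are hypothetical, and the digits-only pattern is an assumption.

```r
# 1. The data.frame() error from the post: a length-1 column next to a
#    zero-length column cannot be recycled, so data.frame() stops.
err <- tryCatch(
  data.frame(docs = "med.000001", terms = character(0)),
  error = function(e) conditionMessage(e)
)
print(err)  # message: arguments imply differing number of rows: 1, 0

# 2. The negative-index pitfall: when grep() finds no match it returns
#    integer(0), and x[-integer(0)] is an empty vector, not x.
drop_numbers_unsafe <- function(tokens) {
  tokens[-grep("^[0-9]+$", tokens)]
}

# Safe version: a logical index keeps every token when nothing matches.
drop_numbers_safe <- function(tokens) {
  tokens[!grepl("^[0-9]+$", tokens)]
}

words <- c("maternal", "fetal", "plasma", "glucose")
drop_numbers_unsafe(words)                   # character(0): every word lost
drop_numbers_safe(words)                     # all four words kept
drop_numbers_safe(c("glucose", "12", "mg"))  # "glucose" "mg"

# 3. Possible workaround (hypothetical helper): write number-free copies of
#    every file in in_dir to out_dir, one token per line.
prefilter_dir <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE)
  for (f in list.files(in_dir, full.names = TRUE)) {
    # quote = "" stops scan() from treating apostrophes as quote marks
    tokens <- scan(f, what = character(), quiet = TRUE, quote = "")
    writeLines(drop_numbers_safe(tokens), file.path(out_dir, basename(f)))
  }
  invisible(out_dir)
}
```

Under these assumptions, prefilter_dir("LSAWork", "LSAWork_nonum") followed by textmatrix("LSAWork_nonum", stemming=TRUE, stopwords=StopListm, minGlobFreq=1, minWordLength=2) (the output directory name is made up) would sidestep removeNumbers entirely. Note that the digits-only pattern keeps mixed tokens such as "b12", which lsa's own number removal may treat differently.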
Triss Ashton
University of North Texas