Alex McKenzie
2008-Mar-25 16:50 UTC
[R] Solution to: Error "... x must be atomic" when using lsa (latent semantic analysis) package
In case someone else runs into this: I found the problem. It was caused by some zero-length text files in the input directory. Make sure you have valid (non-empty) data files for loading into the document-term matrix.

Alex

---------- Forwarded message ----------
From: Alex McKenzie <ahmckenzie@gmail.com>
Date: Mar 25, 2008 2:07 AM
Subject: Error "... x must be atomic" when using lsa (latent semantic analysis) package
To: r-help@r-project.org

Hello,

I'm trying to use the "lsa" (latent semantic analysis) package, and I'm running into a problem that seems to be related to the number of documents being processed. Here's the code I'm running (after loading the lsa and rstem packages), and the error message:

> SnippetsPath <- "c:\\OED\\AuditExplain\\" # path where to find text snippets
> data(stopwords_en)
> tdm <- textmatrix(SnippetsPath, stopwords=stopwords_en)

I get this error message with ~280 documents:

"Error in sort(unique.default(x), na.last = TRUE) : 'x' must be atomic"

The error doesn't occur if I reduce the number of documents (say to 220, for instance). I'm not clear whether this is a memory/capacity issue or something else. A traceback returns the following, but interpreting this result is out of my league ;-)

Any idea what the problem could be? I greatly appreciate your advice.

> traceback()
10: stop("'x' must be atomic")
9: sort(unique.default(x), na.last = TRUE)
8: factor(a, exclude = exclude)
7: table(txt)
6: inherits(x, "factor")
5: is.factor(x)
4: sort(table(txt), decreasing = TRUE)
3: FUN(X[[238]], ...)
2: lapply(dir(mydir, full.names = TRUE), textvector, stemming, language, minWordLength, minDocFreq, stopwords, vocabulary)
1: textmatrix(SnippetsPath, stopwords = stopwords_en)

Alex
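
P.S. For anyone who wants to check their input directory before calling textmatrix(), here is a minimal sketch of how the empty files could be listed. The helper name find_empty_files is my own invention, not part of the lsa package, and the demo runs against a throwaway temporary directory rather than the real snippets path. Note also that frame 3 of the traceback, FUN(X[[238]], ...), suggests the failure was on the 238th entry returned by dir(), which would explain why a run with only 220 documents went through.

```r
# Hypothetical helper (not part of lsa): return the paths of all
# zero-length regular files directly inside `path`.
find_empty_files <- function(path) {
  files <- dir(path, full.names = TRUE)
  info  <- file.info(files)
  # Skip subdirectories; keep only entries whose size is exactly 0 bytes.
  keep <- !is.na(info$size) & !info$isdir & info$size == 0
  files[keep]
}

# Demo on a temporary directory (assumed layout: one text file per document).
demo_dir <- tempfile("snippets")
dir.create(demo_dir)
writeLines("some audit text", file.path(demo_dir, "doc1.txt"))
file.create(file.path(demo_dir, "doc2.txt"))  # deliberately left empty

empty <- find_empty_files(demo_dir)
print(basename(empty))
```

On the real data the call would be find_empty_files(SnippetsPath); any files it reports can then be fixed or moved out of the directory before building the matrix. To inspect the specific entry named in a traceback like the one above, dir(SnippetsPath, full.names = TRUE)[238] should show which file it was.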