Guelman, Leo
2010-Aug-17 19:45 UTC
[R] TM Package - Corpus function - Memory Allocation Problems
I'm using R 2.11.1 on Win XP (32-bit) with 3 GB of RAM. My data is (only) 16.0 MB.

I want to create a VCorpus object using the Corpus function in the tm package, but I'm running into memory allocation issues: "Error: cannot allocate vector of size 372 Kb".

My data is stored in a csv file, which I imported with read.csv. I then used the following to create the Corpus (it failed with the error message above):

    txt <- Corpus(DataframeSource(txt))

I've even tried subsetting ~10% of my data, but I run into the same error. What is the best way to solve this memory problem other than adding physical RAM?

Thanks in advance for any help,

Leo.
David Winsemius
2010-Aug-17 20:06 UTC
[R] TM Package - Corpus function - Memory Allocation Problems
On Aug 17, 2010, at 3:45 PM, Guelman, Leo wrote:

> I'm using R 2.11.1 on Win XP (32-bit) with 3 GB of RAM. My data has
> (only) 16.0 MB.

Probably more than that. Each numeric is 8 bytes even before overhead, so a csv file that was all single-digit integers and commas would more than double in size unless they were declared to be integer in the read step.

> I want to create a VCorpus object using the Corpus function in the tm
> package but I'm running into Memory allocation issues: "Error: cannot
> allocate vector of size 372 Kb".
>
> My data is stored in a csv file which I've imported with "read.csv"
> and
> then used the following to create the Corpus (but it failed with the
> error message above)
>
> txt <- Corpus(DataframeSource(txt))

You probably have other objects in your workspace. When I want to know what is taking up the most space I use this function:

    getsizes <- function() {
        # size in bytes of every object in the global workspace
        z <- sapply(ls(envir = globalenv()), function(x) object.size(get(x)))
        # largest first, at most ten rows
        as.matrix(head(rev(sort(z)), 10))
    }

Clearing out your workspace by removing everything might be the best approach, since the memory allocated to new objects needs to be contiguous. You ought to make sure that you are not running tons of other windoze applications that are restricting your default 2 GB:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

> I've even tried to subset ~ 10% of my data but I run into the same
> error. What is a the best way to solve this memory problem other than
> increasing a
> physical RAM?
>
> Thanks in advance for any help,
>
> Leo.

--
David Winsemius, MD
West Hartford, CT
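The suggestions in this reply can be tried together in a short session. The following is only a sketch under assumptions, not the poster's actual workflow: the temporary CSV, the column names `a` and `b`, and the helper `largest_objects()` are all made up for illustration. It shows the size difference between integer and numeric storage when column types are declared in the read step, a way to list the largest objects in a workspace, and how to free an object.

```r
# Write a small all-integer CSV to a temporary file (stand-in for real data).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:1000, b = rep(1:9, length.out = 1000)),
          csv_path, row.names = FALSE)

# Declare the column types explicitly in the read step.
as_int <- read.csv(csv_path, colClasses = c("integer", "integer"))
as_num <- read.csv(csv_path, colClasses = c("numeric", "numeric"))

# Integers take 4 bytes per value and doubles 8, so the integer copy is smaller.
int_bytes <- as.numeric(object.size(as_int))
num_bytes <- as.numeric(object.size(as_num))

# Hypothetical helper: sizes in bytes of the n largest objects in an
# environment, biggest first. Returns an empty vector for an empty environment.
largest_objects <- function(env = globalenv(), n = 10) {
  nms <- ls(envir = env)
  if (length(nms) == 0) return(numeric(0))
  sizes <- vapply(nms,
                  function(x) as.numeric(object.size(get(x, envir = env))),
                  numeric(1))
  head(sort(sizes, decreasing = TRUE), n)
}

print(largest_objects())

# Drop what is no longer needed, then ask the garbage collector to reclaim it.
rm(as_num)
invisible(gc())
```

Removing objects and calling `gc()` frees memory inside R, but it does not raise the per-process address-space limit on 32-bit Windows described in the FAQ linked above.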