Hi,

I'm trying to use lm to fit a linear model with 600k rows and 70 attributes, but I can't even load the data into the R environment. The error message says the vector memory is used up.

Does anyone have experience with large datasets in R? (I bet someone does.) Please advise.

Thanks,

Yun-Fang
Have you read the posting guide for R-help? You need to tell us more: what hardware, OS, and version of R are you using?

A rough calculation of the storage needed:

> 6e5 * 70 * 8 / 1024^2
[1] 320.4346

So you need 320+ MB of RAM just to store the data as a matrix of doubles in R, plus enough RAM to make a couple of copies of it. If any of the variables are factors, the requirement goes up even more, because the design matrix used to fit the model will expand the factors into columns of contrasts. How much physical RAM do you have on the computer?

There are more efficient ways to fit the model to data of this size, but you need to be able to at least fit the data into memory. There have been a few suggestions on R-help before on how to do this, so do search the archive. (I believe Prof. Koenker had a web page describing how to do this with MySQL, updating the X'X matrix by reading in the data in chunks.)

Andy
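[Editorial sketch of the chunked approach mentioned above -- this is not Prof. Koenker's code: accumulate X'X and X'y while reading the file a piece at a time, then solve the normal equations, so the full 600k x 71 matrix never has to sit in memory at once. The file name, chunk size, and column layout (response in column 1, no header, all numeric) are assumptions for illustration.]

## Incremental OLS via the normal equations, reading the data in chunks.
## Assumes whitespace-separated numeric data, response in column 1, no header;
## "weblog.dat" and the chunk size are placeholders.
p <- 70                                    # number of predictors
chunk_rows <- 50000
XtX <- matrix(0, p + 1, p + 1)             # +1 for the intercept column
Xty <- numeric(p + 1)
con <- file("weblog.dat", open = "r")
repeat {
  chunk <- tryCatch(
    read.table(con, nrows = chunk_rows,
               colClasses = rep("numeric", p + 1)),
    error = function(e) NULL)              # no lines left -> read.table errors -> stop
  if (is.null(chunk) || nrow(chunk) == 0) break
  y <- chunk[[1]]                          # response assumed to be column 1
  X <- cbind(1, as.matrix(chunk[-1]))      # prepend intercept column
  XtX <- XtX + crossprod(X)                # accumulate X'X
  Xty <- Xty + crossprod(X, y)             # accumulate X'y
}
close(con)
beta <- solve(XtX, Xty)                    # least-squares coefficients

[Solving the normal equations directly is less numerically stable than the QR decomposition lm() uses, but that is the usual trade-off when the data cannot be held in memory.]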
Here is the exact error I got:

----------------------
Read 73 items
Error: cannot allocate vector of size 1953 Kb
Execution halted
-----------------------

I am running R on FreeBSD 4.3 with two CPUs and 2 GB of memory. Is that sufficient?

hw.model: Pentium III/Pentium III Xeon/Celeron
hw.ncpu: 2
hw.byteorder: 1234
hw.physmem: 2144411648
hw.usermem: 2009980928

Thanks for your advice in advance,

Yun-Fang
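[Editorial aside, not part of the original exchange: failing to allocate roughly 2 MB on a machine with 2 GB of physical RAM usually points to a per-process limit, or to memory already consumed by earlier copies of the data, rather than to the hardware. Two quick checks from inside R; the ulimit flag below is an assumption and varies by shell.]

gc()                       # memory currently used by R and the gc trigger sizes
6e5 * 71 * 8 / 1024^2      # ~325 MB needed for the raw doubles (response + 70 predictors)
system("ulimit -d")        # data-segment limit seen by the shell that system() invokes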
You still have not read the posting guide, have you? See more below.

> From: Yun-Fang Juan
[...]
> I tried a 10% sample and it turned out the matrix became singular after
> I did that. The reason is that some of the attributes have only zero
> values most of the time. The data I am using is web log data and, after
> some transformation, the variables are all numeric. Can we specify some
> parameters in read.table so that the program will treat all the vars as
> numeric? (With this context, hopefully that will reduce the memory
> consumption.)
>
> thanks a lot,
>
> Yun-Fang

And you clearly have not read my (private) reply, either, in which I told you *exactly* how to do that, via the colClasses argument to read.table(). Please take the help given to you seriously. If you want attention, you have to pay attention.

Andy
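[Editorial aside: a minimal sketch of the colClasses usage Andy refers to, assuming the file has 71 numeric columns (the response plus 70 predictors) and no header row; the file name is a placeholder.]

dat <- read.table("weblog.dat",
                  header = FALSE,
                  colClasses = rep("numeric", 71),  # force doubles; skips type guessing
                  nrows = 600000,                   # known row count lets R pre-size storage
                  comment.char = "")                # no comment scanning; slightly faster

[Even with colClasses, lm() itself will make several copies of a 600000 x 71 design matrix, so the chunked X'X accumulation sketched earlier may still be necessary on a 2 GB machine.]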