David L. Van Brunt, Ph.D.
2004-May-15 18:52 UTC
[R] " cannot allocate vector of length 1072693248"
Andy,

Well, that about does it.... I'm copying this one back to the list for the
benefit of those who may hit this thread while searching the archives.

Your changes to the code run just fine on my Windows machine, but give the
vector length error on the G4 whether I'm using the OS X build of R (as in
Raqua) or the X11 build (for Darwin). It is worth noting that I have nearly
twice as much RAM and HD on the OS X G4 as I have on the Pentium.

So it's definitely a platform-specific problem. What on earth does one do
about that? It's not emergent, but whoever works on the R source would
probably like to know.

On 5/12/04 20:29, "Liaw, Andy" <andy_liaw at merck.com> wrote:

> That's because I _attach()_ the .rda file you sent me, so that copy of
> `USdata' is in search position 2. The subset statement makes a copy
> containing the subset in the workspace (aka global environment). At the end
> of the loop, that copy is rm()'ed, but the copy in search position 2 is
> still accessible.
>
> One thing I could have done is:
>
> data.list <- split(USdata, USdata$symbol)
>
> then inside the loop, just use data.list[[i]].
>
> Andy
>
>> -----Original Message-----
>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>> Sent: Wednesday, May 12, 2004 9:14 PM
>> To: Liaw, Andy
>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>
>> I took it for a spin.
>>
>> Odd, but looking at your code, it doesn't look like it should run, and
>> indeed it doesn't, past the second loop. Early in the loop you overwrite
>> 'USdata' as follows:
>>
>> USdata <- USdata[USdata$symbol == tickernames[i], -54]
>>
>> Then at the end of the loop you remove it with:
>>
>> rm(USdata) ## ,risk.rf,risk.pred,risk.rsq,
>>
>> So on my second pass, I get the following error:
>>
>>> finished 1 of 30
>>> Just loaded: 2 of 30 . AIG Assigning vectors and outcomes....
>>> Error: Object "USdata" not found
>>
>> I see you "attached" the dataset prior to the loop, but this seems to be
>> circumvented in that you call "USdata$<somevar>" in each case within the
>> loop.
>>
>> I've had it happen that after noodling around, I'm working only by virtue
>> of having leftovers from prior work, but with a fresh launch I discover
>> that it won't work after all.
>>
>> Did I miss something?
>>
>> Anyway, I'll try to rework my code to more closely approximate yours (or
>> rather, vice versa) and let you know how that goes.
>>
>> Sorry so many delays on my end. Hell at work, just wiped out by the time I
>> get home. Saw your post on the balancing of data, too, by the way... Very
>> interesting, and very helpful.
>>
>> On 5/10/04 21:05, "Liaw, Andy" <andy_liaw at merck.com> wrote:
>>
>>> David,
>>>
>>> I changed the code a bit so that it runs one ticker symbol worth of data
>>> in each iteration. Is that what you were doing? You can see from the
>>> attached file that I still don't get any error. Memory usage was less,
>>> and the code ran a lot faster (of course).
>>>
>>> Andy
>>>
>>>> -----Original Message-----
>>>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>>>> Sent: Monday, May 10, 2004 12:05 AM
>>>> To: Liaw, Andy
>>>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>>>
>>>> Thanks. I did just the opposite today, ran through 500 loops using only
>>>> the regression forests, to be sure there was no issue there.
>>>>
>>>> Worked fine. But the classifications crash even when run alone.
>>>>
>>>> I'm not sure what you mean in your first sentence. The data set I posted
>>>> is the full data, which I would normally query out of within the loop to
>>>> pull each ticker symbol out one at a time.
>>>>
>>>> Maybe that's the secret sauce.... I'm doing a MySQL query inside that
>>>> loop, whereas you loaded the data from a file.
>>>> Each time I do the run, I'm working with a different subset of the data,
>>>> i.e., a different result set of the MySQL query. I think, unless I
>>>> missed something, that you are repeating the same analysis on the same
>>>> data 20 times.
>>>>
>>>> Running the code as you sent it back to me, I had similar results to
>>>> what you did. But it wasn't the same analysis. Each result set -- or
>>>> group of results for each iteration of the loop -- should be on a new
>>>> subset of data. Said differently, the first time through the loop, all
>>>> the cases should have a value of "AA" for "symbol", next time through
>>>> they should all be "AIG", etc. The file had 30 loops worth (the Dow),
>>>> but it usually dies around 6 or 7. Don't know why just repeating them
>>>> seems to work, though...
>>>>
>>>> I thought at one time to get all the data from outside the loop, then
>>>> just subset differently (with the "testset" and "predset" definition)
>>>> each time through... That's the way it was originally, and when the
>>>> problem first showed up. I only moved the query inside the loop because
>>>> I thought it would spare me the overhead of partially duplicating the
>>>> data in memory.
>>>>
>>>> Man, this is a head-scratcher.
>>>>
>>>> On 5/9/04 21:05, "Liaw, Andy" <andy_liaw at merck.com> wrote:
>>>>
>>>>> David,
>>>>>
>>>>> I assume the data you posted is one iteration's worth in your for loop?
>>>>> I looped over it 20 times and didn't get any errors (did have to change
>>>>> the code a bit to make it run). Please look over the attached file to
>>>>> see if what I tested is close to what you would expect. I ran it on an
>>>>> Opteron 248 with 8GB of RAM. From `top', the maximum memory usage for
>>>>> the R process is 366MB. It took just over an hour to run the 20 reps,
>>>>> so it was not using anywhere close to 1GB of RAM as your error message
>>>>> would indicate.
>>>>>
>>>>> I would really appreciate it if you can strip the code down as much as
>>>>> possible to only the part that produces the error. E.g., if none of the
>>>>> regression runs were causing problems, comment them out and see if you
>>>>> still get the error. Saves running time and eye-balling time.
>>>>>
>>>>> Best,
>>>>> Andy
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>>>>>> Sent: Friday, May 07, 2004 11:21 PM
>>>>>> To: Liaw, Andy
>>>>>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>>>>>
>>>>>> Good news/Bad news....
>>>>>>
>>>>>> 4.2-1 installed without a hitch from source on OS X. But the same
>>>>>> behavior occurred, and in the same place. In the syntax code I sent, I
>>>>>> had commented out the prediction call after "checkpoint 1", but that
>>>>>> only stays the execution... It dies at 5 of 30 if I leave those lines
>>>>>> in, but dies anyway at 12 of 30 on "gain2" if I take 'em out.
>>>>>>
>>>>>> HTH.
>>>>>>
>>>>>> On 5/7/04 21:09, "Liaw, Andy" <andy_liaw at merck.com> wrote:
>>>>>>
>>>>>>> David, Please see reply inline below. Andy
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: David L. Van Brunt, Ph.D. [mailto:dvanbrunt at well-wired.com]
>>>>>>>> Sent: Friday, May 07, 2004 8:01 PM
>>>>>>>> To: Liaw, Andy
>>>>>>>> Subject: Re: [R] " cannot allocate vector of length 1072693248"
>>>>>>>>
>>>>>>>> Thanks very much for taking a crack at this. It's been a ton of fun,
>>>>>>>> but this bump in the road sure has me stumped.
>>>>>>>>
>>>>>>>> I realized after emailing you earlier that I left out important
>>>>>>>> information. Here goes...
>>>>>>>>
>>>>>>>> -Using R version 1.90 beta for OS X, but also did it on the latest
>>>>>>>> Windows version (same behavior), and on version 8.1 for Windows.
>>>>>>>> (same behavior)
>>>>>>>> -randomForest 4.0-7
>>>>>>>> -No problems at all on regression.... Only happens with
>>>>>>>> classification!
>>>>>>>> [AL] Could you try version 4.2-1 at
>>>>>>>> http://home.comcast.net/~andyliaw/randomForest_4.2-1.tar.gz (source)
>>>>>>>> or http://home.comcast.net/~andyliaw/randomForest_4.2-1.zip (Windows
>>>>>>>> binary) and see if that makes any difference?
>>>>>>>>
>>>>>>>> Here's the code .. If I remove one code block, it will give the same
>>>>>>>> error on another code block, always failing with the memory overflow
>>>>>>>> right after "checkpoint 1"
>>>>>>>>
>>>>>>>> Be gentle,
>>>>>>>> [AL] Don't worry, I won't bite...
>>>>>>>>
>>>>>>>> I'm new at this! This is all a learning experience for me, and I
>>>>>>>> thought some readily available data would make for a good exercise.
>>>>>>>> This specific challenge is to loop through the Dow data, make
>>>>>>>> predictions for each member, save out the results to a table. Yes, it
>>>>>>>> is silly, but it's a great way to learn your way around a new program!
>>>>>>>>
>>>>>>>> The data file is quite large, which is why I use MySQL and only pull
>>>>>>>> in a little bit at a time. That's what I initially thought was wrong
>>>>>>>> (too much data in memory, as I read in the whole thing) and why I put
>>>>>>>> the select query inside the loop to only pull out one member at a
>>>>>>>> time. I'm attaching the data, so you can get the structure. I'm sure
>>>>>>>> there are a lot of ways I could write this better, but it does work
>>>>>>>> for the first few times through. Here's the code...
>>>>>>>> [AL] If you sent the data as zip file, it would have been stripped off
>>>>>>>> silently by our email servers. Could you post it on a web site
>>>>>>>> somewhere that I can download, or use the bzip2 format instead? It's
>>>>>>>> hard for me to diagnose without data.
>>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Notice: This e-mail message, together with any attachments, contains
>>>>>>> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
>>>>>>> New Jersey, USA 08889), and/or its affiliates (which may be known outside
>>>>>>> the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in
>>>>>>> Japan, as Banyu) that may be confidential, proprietary copyrighted and/or
>>>>>>> legally privileged. It is intended solely for the use of the individual
>>>>>>> or entity named on this message. If you are not the intended recipient,
>>>>>>> and have received this message in error, please notify us immediately by
>>>>>>> reply e-mail and then delete it from your system.
>>>>>>> ------------------------------------------------------------------------------
>>>>>>>
>>
>> --
>> David L. Van Brunt, Ph.D.
>> Outlier Consulting & Development
>> mailto: <ocd at well-wired.com>
>>

David L. Van Brunt, Ph.D.
Outlier Consulting & Development mailto: <ocd at well-wired.com>