Dear R-help,

I've run into a problem loading .RData: I was running a large computation, which supposedly produced a large R object. At the end of the session, I did a save.image() and then quit. The .RData file has size 613,249,399 bytes. Now I can't get R to load this .RData file. Whenever I try, I get "Error: vector memory exhausted (limit reached)". I tried adding "--min-vsize=1000M", but that didn't help. I also tried R --vanilla and then attach(".RData"); same error.

From what I can see, the file is not corrupted. How can I get R to load it?

System info: R-1.4.1 on Mandrake Linux 7.1 (kernel 2.4.3), Dual P3-866 Xeon with 2GB RAM.

Regards,
Andy
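For concreteness, the workflow that leads to the error is roughly the following. This is only an illustrative sketch; the object and function names are made up, since the actual computation is not shown in the post.

    ## Original session: a long computation producing one large object,
    ## then saving the whole workspace (hypothetical names throughout).
    fit <- some.long.computation(x, y)   # several days; result is very large
    save.image()                         # writes everything to ./.RData
    q("no")

    ## Fresh session in the same directory: either of these fails with
    ## "Error: vector memory exhausted (limit reached)".
    load(".RData")                       # what R does at startup by default
    attach(".RData")                     # puts the image on the search path,
                                         # but still has to read all of it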
Patrick,

I appreciate your comments, and practice everything that you preach. However, that workspace image contains only two or three R objects: the input and output of a single R command. I knew there could be problems, so I've stripped it down to the bare minimum. Yes, I also kept the commands in a script. That single command (in case you want to know: a random forest run with 4000 rows and nearly 7000 variables) took over three days to run. There's not a whole lot I can do here when the data are this large.

Andy

> -----Original Message-----
> From: Patrick Connolly [mailto:P.Connolly at hortresearch.co.nz]
> Sent: Tuesday, April 23, 2002 7:18 PM
> To: andy_liaw at merck.com
> Subject: Re: [R] error loading huge .RData
>
> According to Liaw, Andy:
> |> From what I can see, the file is not corrupted. How can I
> |> get R to load it?
>
> What would you look for to see if it's corrupted? That's a big file
> to find it in.
>
> I've never used a .RData file greater than 100 Mb, so my experience is
> not directly comparable, but I feel uneasy having too much in one
> file. It's safer practice to keep a file of the commands you used to
> create your R objects, so that it's easy (even if not quick) to
> reproduce them. It's also good not to keep too many objects in one
> .RData file. I've had only one occasion when I had a corrupt .RData
> file, but it was easy to recover. The bigger the file, the more
> opportunity for corruption to happen. How many objects would there
> have been in your .RData file?
>
> Might not be much help for now, but it might help in the future.
>
> best
>
> --
> Patrick Connolly
> HortResearch, Mt Albert, Auckland, New Zealand
> Ph: +64-9 815 4200 x 7188
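The practice Patrick describes, keeping the commands in a script and splitting the saved objects across files, might look something like the sketch below. The file and object names are invented, and the randomForest() call is only assumed from Andy's description of the run.

    ## rf-run.R: keep the commands in a script and save the pieces
    ## separately, so no single file ever has to be reloaded whole.
    library(randomForest)                  # assuming this package was used
    x   <- read.table("predictors.txt")    # hypothetical input, 4000 x ~7000
    y   <- scan("response.txt")            # hypothetical response
    fit <- randomForest(x, y)              # the multi-day run
    save(x, y, file = "input.RData")       # inputs in one file
    save(fit,  file = "fit.RData")         # model fit in another

    ## Later, reload only the piece that is actually needed:
    load("fit.RData")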
> Hmm. You could be running into some sort of situation where data
> temporarily take up more space in memory than they need to. It does
> sound like a bit of a bug if R can write images that are bigger than
> it can read. Not sure how to proceed, though. Does anyone on R-core
> have a similarly big system and a spare gigabyte of disk? Is it
> possible to create a mock-up of similarly organized data that displays
> the same effect, but takes less than three days?
>
> BTW: Did we ever hear what system this is happening on?

Yes, in the original post: R-1.4.1 on Mandrake Linux 7.1 (kernel 2.4.3), Dual P3-866 Xeon with 2GB RAM and 2GB swap.

Prof. Tierney has been trying to help me off-list. I monitored the R.bin process through ktop, as Prof. Tierney suggested. The strange thing is that the memory usage of the R.bin process would reach nearly 1000MB and then R just quits with the vector heap exhausted error. I ran gdb on R, also as Prof. Tierney suggested, and he said that malloc was not able to get more memory. I checked ulimit and it says "unlimited". Prof. Tierney also suggested running strace, but I haven't gotten around to that. Will keep you folks posted...

Regards,
Andy
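A simple way to watch the failure from inside R, complementing ktop, is to check the garbage collector's counters around the load. This is only a rough sketch: the exact columns gc() reports differ between R versions, and Sys.getpid() may not exist in very old versions.

    ## From a fresh R session in the directory containing .RData:
    Sys.getpid()    # pid to watch in top/ktop from another terminal
    gc()            # baseline Ncells/Vcells usage before the load
    load(".RData")  # the step that dies with "vector memory exhausted"
    gc()            # only reached if the load succeeds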
On Tue, Apr 23, 2002 at 11:36:48AM -0400, Liaw, Andy wrote:
> The .RData has size 613,249,399 bytes. Now I can't get R to load this
> .RData file.

Andy Liaw indicated privately that he decided to upgrade his Linux and the problem went away (because glibc 2.2's malloc behaves by default differently than earlier ones). In tracking this down I learned a little more about address space use on Linux than I wanted to know. But as this may come up again I'll report it for future reference. The details are probably not quite right, but I think the big picture is. It may be worth knowing if you want to use large amounts of memory on 32-bit Linux, or any other 32-bit OS (the details given here are specific to Linux, but the general issue is not): trying to carve out room for one or two gigabytes of memory from a 32-bit address space, which is limited to 4G, is tricky.

The details: in 32-bit Linux you have a 4G address space. The lower 3G of this is available for user mode. The bottom contains program text, data, and bss segments, followed by the heap. The heap, which is what malloc traditionally uses, is a contiguous range of addresses that goes up to a point that can be adjusted with the brk system call. The brk can be adjusted to increase heap size as long as the resulting range does not intersect any range that has been used for memory mapping.

Shared libraries are loaded by memory mapping their data, text, and bss sections with mmap. You can find out what is mapped where by looking at /proc/<your process pid>/maps. Every Linux program needs ld.so, and that is always mapped to a range of addresses starting at 0x40000000. [This is configurable when you build a custom kernel, and it may now or soon be adjustable to some degree at boot or run time, but this is the default.] So the heap can grow at most to this point, which means the contiguous heap is limited to a size of a little under 1G, no matter how much swap space or memory you have.

glibc malloc can either allocate only from the traditional contiguous heap, which implies a 1G max on total allocation, or it can also allocate using mmap, which allows it to use closer to the full 3G of user mode address space. The drawbacks of using mmap include a speed penalty, at least by some measurements, and fragmentation of the address space available for memory mapping large files. Whether and how glibc uses mmap is tunable by calling mallopt or by setting the environment variables MALLOC_MMAP_THRESHOLD_ and MALLOC_MMAP_MAX_. The default behavior seems to have changed between glibc 2.1 and 2.2, and may have changed again with a patch to 2.2. By default, 2.1 seems to only use mmap for allocations of size above the mmap threshold (which I think defaults to 128K), but glibc 2.2 will also use mmap for smaller allocations if it can't get them from the standard heap. So, because of the ld.so mapping, 1G is something of a critical threshold.
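The mapping layout Luke describes can be inspected for the running R process itself. The following is an illustrative sketch only: it is Linux-specific, assumes /proc is mounted, and assumes a version of R that has Sys.getpid().

    ## Print this R process's memory map. On 32-bit Linux the dynamic
    ## loader (ld.so) segment mapped near 0x40000000 marks the ceiling
    ## of the traditional contiguous malloc heap.
    maps <- readLines(file.path("/proc", Sys.getpid(), "maps"))
    cat(maps, sep = "\n")

    ## Show just the lines for the dynamic loader:
    maps[grep("ld-", maps)]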
In this particular example with the large .RData file, it would appear that the malloc used was not using mmap when the contiguous heap runs out. The process that created the file was probably quite close to the limit of 1G but did not exceed it, so it did not fail. Loading the file would push the required memory over the limit, and so could not be done with the standard malloc settings on this system. Upgrading to a newer glibc changed the default behavior and now the file can be loaded.

With the old libc it might have been possible to get the file loaded by using environment variable settings something like

    MALLOC_MMAP_THRESHOLD_=2000 MALLOC_MMAP_MAX_=1000000

for the R process, though these may not be the safest choices and may degrade performance. But how much useful work one can do with this high a portion of the address space in use is not entirely clear. 64-bit systems are starting to look real attractive.

A couple of references:

http://www.linuxshowcase.org/full_papers/ezolt/ezolt.pdf
http://mail.nl.linux.org/linux-mm/2000-07/msg00001.html
http://www.linux-mag.com/2001-06/compile_01.html
http://www.linux-mag.com/2001-07/compile_01.html
http://www.gnu.org/manual/glibc-2.2.3
http://sources.redhat.com/ml/libc-hacker/2000-07/msg00273.html
http://www.ussg.iu.edu/hypermail/linux/kernel/0101.1/0007.html
<R source root>/src/gnuwin32/malloc.c

luke

--
Luke Tierney
University of Minnesota              Phone: 612-625-7843
School of Statistics                 Fax:   612-624-8868
313 Ford Hall, 224 Church St. S.E.   email: luke at stat.umn.edu
Minneapolis, MN 55455 USA            WWW:   http://www.stat.umn.edu
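For anyone wanting to try the malloc tuning Luke describes without upgrading glibc, note that the variables must be in the environment before R starts, since malloc reads them at process startup. A hedged sketch follows; the values are simply the ones quoted above from Luke's post, not tested recommendations.

    ## In the shell, start R with the tuning variables set, for example:
    ##   MALLOC_MMAP_THRESHOLD_=2000 MALLOC_MMAP_MAX_=1000000 R --vanilla
    ## Then, inside R, confirm they were picked up before attempting the load:
    Sys.getenv(c("MALLOC_MMAP_THRESHOLD_", "MALLOC_MMAP_MAX_"))
    load(".RData")   # the step that previously exhausted vector memory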