Edwin Husni Sutanudjaja
2010-Jul-28 10:13 UTC
[R] memory problem for scatterplot using ggplot
Dear all,

I have a memory problem in making a scatter plot of my 17.5 million-pair dataset. My intention is to use the "ggplot" package with "bin2d". Please find the attached script for more details.

Could somebody please give me any clues or tips to solve my problem? Please ...

Just for additional information: I'm running my R script on my 32-bit machine: Ubuntu 9.10, hardware: AMD Athlon Dual Core Processor 5200B, memory: 1.7 GB.

Many thanks in advance.
Kind Regards,

--
Ir. Edwin H. Sutanudjaja
Dept. of Physical Geography, Faculty of Geosciences, Utrecht University
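[The attached script is not reproduced in the archive. As a rough illustration only, assuming the 17.5 million pairs sit in a data frame named df with columns x and y (hypothetical names), the kind of binned plot described might look like:

    library(ggplot2)

    ggplot(df, aes(x = x, y = y)) +
      geom_bin2d(binwidth = c(0.1, 0.1))   # draws counts per 2-d bin instead of 17.5 million points

geom_bin2d() bins the observations before drawing, so the plot shows bin counts rather than every point, but the binning step still has to hold the full dataset in memory.]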
It was my understanding that R wasn't really the best thing for absolutely huge datasets, and 17.5 million points would probably fall under the category of "absolutely huge." I'm on a little netbook right now (Atom / 32-bit R) and it failed, but I'll try it on my MacBook Pro / 64-bit R later and see if it's able to handle the size better.

For more information, my error is the following:

    Error: cannot allocate vector of size 66.8 Mb
    R(6725,0xa016e500) malloc: *** mmap(size=70000640) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    R(6725,0xa016e500) malloc: *** mmap(size=70000640) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug

    > sessionInfo()
    R version 2.11.1 (2010-05-31)
    i386-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] sp_0.9-65       mapproj_1.1-8.2 maps_2.1-4      mgcv_1.6-2      ggplot2_0.8.8
    [6] reshape_0.8.3   plyr_1.0.2      proto_0.3-8

    loaded via a namespace (and not attached):
    [1] digest_0.4.2       lattice_0.18-8     Matrix_0.999375-39 nlme_3.1-96
    [5] tools_2.11.1
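[A rough back-of-envelope check, assuming the pairs are stored as two numeric double-precision columns, shows the raw data alone already approaches 300 MB before any intermediate copies made by ggplot2 or the graphics device, which helps explain the allocation failures above:

    n     <- 17.5e6        # number of x/y pairs
    bytes <- n * 2 * 8     # two double columns, 8 bytes each
    bytes / 2^20           # roughly 267 MiB for the raw pairs alone
]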
On 07/28/2010 06:13 AM, Edwin Husni Sutanudjaja wrote:

You should try to get access to a fairly robust 64-bit machine, say in the range of >= 8 GiB of real memory, and see what you can do. There is no chance on a 32-bit machine, and no chance on a 64-bit machine without sufficient real memory (you will be doomed to die by swap).

Does your institution have a virtualization lab with the ability to allocate machines with large memory footprints? There is always Amazon EC2; you could experiment with sizing before buying that new workstation you've had your eye on.

Alternatively, you might take much smaller samples of your data and massively decrease the size of the working set. I assume this is not what you want, though.

Mark
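[A minimal sketch of the subsampling approach suggested above, again assuming a data frame df with columns x and y (hypothetical names; adjust to the real data):

    library(ggplot2)

    set.seed(1)                            # reproducible subsample
    keep  <- sample(nrow(df), 1e5)         # keep 100,000 of the ~17.5 million rows
    small <- df[keep, ]

    ggplot(small, aes(x = x, y = y)) +
      geom_bin2d(binwidth = c(0.1, 0.1))   # tune binwidth to the data range

The binned counts of a random subsample approximate the shape of the full distribution while keeping the working set far below the 1.7 GB available on the original machine.]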