Dubravko Dolic
2005-Jun-29 11:26 UTC
[R] Memory Management under Linux: Problems to allocate large amounts of data
Dear Group,

I'm still trying to bring large amounts of data into R (see older postings). After solving some troubles with the database I do most of the work in MySQL, but it would still be nice to work on some of the data in R. For this I can use a dedicated server running Gentoo Linux as OS and hosting only R. This server is a nice machine with two CPUs and 4GB RAM, which should do the job:

Dual Intel XEON 3.06 GHz
4 x 1 GB RAM PC2100 CL2
HP Proliant DL380-G3

I read the R online help on memory issues and the article on garbage collection from R News 1/2001 (Luke Tierney). The FAQ and some newsgroup postings were also very helpful for understanding memory issues in R.

Now I am trying to read data from a database. The data I wanted to read consists of 158902553 rows in one field (column) of type bigint(20) in the database. I received the message that R could not allocate a vector of 2048000 Kb (almost 2GB). As I have 4GB of RAM I could not imagine why this happened. In my understanding R under Linux (32-bit) should be able to use the full RAM. As there is not much space used by the OS and by R itself ("free" shows approx. 670 MB in use after dbSendQuery and fetch), there should be 3GB left for R to occupy. Is that correct?

After that I started R with n/vsize set explicitly:

R --min-vsize=10M --max-vsize=3G --min-nsize=500k --max-nsize=100M

> mem.limits()
    nsize     vsize
104857600        NA

and received the same message.

A garbage collection delivered the following information:

> gc()
         used (Mb) gc trigger   (Mb) limit (Mb)  max used   (Mb)
Ncells 217234  5.9     500000   13.4       2800    500000   13.4
Vcells  87472  0.7  157650064 1202.8       3072 196695437 1500.7

Now I'm at a loss. Maybe anyone could give me a hint where I should read further or which information could take me any further.

Dubravko Dolic
Statistical Analyst

Tel:   +49 (0)89-55 27 44 - 4630
Fax:   +49 (0)89-55 27 44 - 2463
Email: dubravko.dolic at komdat.com

Komdat GmbH
Nymphenburger Straße 86
80636 München
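[For readers hitting the same wall: one way around a single huge allocation is to fetch the column in chunks and summarise as you go. A minimal sketch, assuming a DBI/RMySQL connection; the database name, table and column here are hypothetical stand-ins, not from the thread:]

library(DBI)
library(RMySQL)

con <- dbConnect(MySQL(), dbname = "mydb")           # hypothetical connection
res <- dbSendQuery(con, "SELECT id FROM mytable")    # hypothetical query

total <- 0
n     <- 0
while (!dbHasCompleted(res)) {
    chunk <- fetch(res, n = 1000000)[[1]]   # 1e6 rows at a time, first column
    total <- total + sum(chunk)             # keep a running summary instead of
    n     <- n + length(chunk)              # holding all 158902553 values at once
}
dbClearResult(res)
dbDisconnect(con)

total / n                                   # e.g. the overall mean

[One caveat: bigint(20) values arrive in R as doubles, so values above 2^53 lose precision.]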
Prof Brian Ripley
2005-Jun-29 13:18 UTC
[R] Memory Management under Linux: Problems to allocate large amounts of data
Let's assume this is a 32-bit Xeon and a 32-bit OS (there are 64-bit-capable Xeons). Then a user process like R gets a 4GB address space, 1GB of which is reserved for the kernel. So R has a 3GB address space, and it is trying to allocate a 2GB contiguous chunk. Because of memory fragmentation that is quite unlikely to succeed.

We run 64-bit OSes on all our machines with 2GB or more RAM, for this reason.

On Wed, 29 Jun 2005, Dubravko Dolic wrote:

> I'm still trying to bring large amounts of data into R (see older
> postings). [...] In my understanding R under Linux (32-bit) should be
> able to use the full RAM. As there is not much space used by the OS and
> by R itself ("free" shows approx. 670 MB in use after dbSendQuery and
> fetch), there should be 3GB left for R to occupy. Is that correct?

Not really. The R executable code and the Ncells are already in the address space, and this is a virtual memory OS, so the amount of RAM is not relevant (it would still be a 3GB limit with 12GB of RAM).

> After that I started R with n/vsize set explicitly:
>
> R --min-vsize=10M --max-vsize=3G --min-nsize=500k --max-nsize=100M
>
>> mem.limits()
>     nsize     vsize
> 104857600        NA
>
> and received the same message.
>
> A garbage collection delivered the following information:
>
>> gc()
>          used (Mb) gc trigger   (Mb) limit (Mb)  max used   (Mb)
> Ncells 217234  5.9     500000   13.4       2800    500000   13.4
> Vcells  87472  0.7  157650064 1202.8       3072 196695437 1500.7
>
> Now I'm at a loss. Maybe anyone could give me a hint where I should read
> further or which information could take me any further.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
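[The gc() figures above can be checked by hand: each Vcell holds 8 bytes, so the reported trigger and the requested vector translate directly into megabytes. A quick worked example; reading the failed 2048000 Kb request as an intermediate buffer is an assumption, not something stated in the thread:]

157650064 * 8 / 2^20   # = 1202.8 Mb, the Vcell gc trigger reported above
158902553 * 8 / 2^20   # ~ 1212 Mb, the result vector alone, stored as doubles

[The failed request of 2048000 Kb (~2000 Mb) is larger still, presumably an intermediate buffer on top of the result, and it must fit as one contiguous block within the 3GB address space.]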
Dubravko Dolic
2005-Jun-30 07:15 UTC
[R] Memory Management under Linux: Problems to allocate large amounts of data
Dear Prof. Ripley,

Thank you for your quick answer. You're right in assuming that we run R on a 32-bit system. My technician tried to install R on an emulated 64-bit Opteron machine, which led to some trouble. Maybe because the Opteron includes a 32-bit processor which emulates 64-bit (AMD64 x86_64). As you seem to have good experience with running R on a 64-bit OS, I feel encouraged to have another try at this.

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Wednesday, 29 June 2005 15:18
To: Dubravko Dolic
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Memory Management under Linux: Problems to allocate large amounts of data

> Let's assume this is a 32-bit Xeon and a 32-bit OS (there are
> 64-bit-capable Xeons). [...]
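[Once a build is installed on the Opteron, a minimal check, not from the thread, confirms whether R itself is a 64-bit build:]

.Machine$sizeof.pointer * 8   # 64 on a 64-bit build, 32 on a 32-bit one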
Dubravko Dolic
2005-Jun-30 10:29 UTC
[R] Memory Management under Linux: Problems to allocate large amounts of data
Dear Peter,

AMD64 and EM64T (Intel) were designed as 32-bit CPUs which are able to address 64-bit registers, so they are not "pure" 64-bit systems. This is why they are much cheaper than a real 64-bit machine.

-----Original Message-----
From: pd at pubhealth.ku.dk [mailto:pd at pubhealth.ku.dk] On behalf of Peter Dalgaard
Sent: Thursday, 30 June 2005 11:48
To: Prof Brian Ripley
Cc: Dubravko Dolic; r-help at stat.math.ethz.ch
Subject: Re: [R] Memory Management under Linux: Problems to allocate large amounts of data

Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:

> On Thu, 30 Jun 2005, Dubravko Dolic wrote:
>
>> Dear Prof. Ripley.
>>
>> Thank you for your quick answer. You're right in assuming that we run
>> R on a 32-bit system. My technician tried to install R on an emulated
>> 64-bit Opteron machine, which led to some trouble. Maybe because the
>> Opteron includes a 32-bit processor which emulates 64-bit (AMD64
>> x86_64). As you seem to have good experience with running R on a
>> 64-bit OS, I feel encouraged to have another try at this.

Er? What is an "emulated Opteron" machine? Opterons are 64 bit.

> It should work out of the box on an Opteron Linux system: it does for
> example on FC3 and SuSE 9.x. Some earlier Linux distros for x86_64 are
> not fully 64-bit, but we ran R on FC2 (although some packages could
> not be installed).
>
> Trying to build a 32-bit version of R on FC3 does not work for me: the
> wrong libgcc_s is found. (One might want a 32-bit version for speed
> on small tasks.)

On FC4 it is even easier: "yum install R R-devel" gets you a working R 2.1.1 straight away (from Fedora Extras). Only if you want to include hardcore optimized BLAS or do not like the performance hit of having R as a shared library do you need to compile at all.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
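[A rough way to see whether an optimized BLAS makes a practical difference on a given build is to time a large matrix product; an illustrative sketch, with no timings claimed:]

a <- matrix(rnorm(1e6), 1000, 1000)   # 1000 x 1000 random matrix
system.time(a %*% a)                  # markedly faster with an optimized BLAS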