Dear List:

I am very much a Unix neophyte, but recently had an Ubuntu box installed in my office. I commonly use Windows XP with 3 GB RAM on my machine, and the Ubuntu machine is exactly the same as my Windows box (e.g., processor and RAM) as far as I can tell.

Now, I recently had to run a very large lmer analysis on my Windows machine, but was unable to because of memory limitations, even after increasing all the memory limits in R (which I think is a 2 GB maximum according to the FAQ for Windows). So, to make this computationally feasible, I had to sample from my very big data set and then run the analysis. Even then, it would take something on the order of 45 minutes to 1 hour to get parameter estimates. (BTW, SAS PROC NLMIXED was even worse and kept giving execution errors until the data set was very small, and then it ran for a long time.)

However, I just ran the same analysis on the Ubuntu machine with the full, complete data set, which is very big, and lmer gave me back parameter estimates in less than 5 minutes.

Because I have so little experience with Ubuntu, I am quite pleased and would like to understand this a bit better. Does this occur because R is a bit friendlier with Unix somehow? Or is this occurring because Unix somehow has more efficient methods for memory allocation?

I wish I knew enough to even ask the right questions. So, I welcome any enlightenment members may add.
My naive understanding of this (I switched to Ubuntu a year ago from WinXP for similar reasons) is that Ubuntu as an OS uses less memory than WinXP, thus leaving more memory for computation, swap space, etc. In other words, Ubuntu is "lighter" than XP on system resources.

Abhijit

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Doran, Harold wrote:

> Does this occur because R is a bit friendlier with unix somehow? Or,
> is this occurring because unix somehow has more efficient methods for
> memory allocation?

Probably partly the latter and not the former (we try to make the most of what the OS offers in either case), but a more important difference is that we can run in 64 bit address space on non-Windows platforms (assuming that you run a 64 bit Ubuntu).

Even with 64 bit Windows we do not have the 64 bit toolchain in place to build R except as a 32 bit program. Creating such a toolchain is beyond our reach, and although progress is being made, it is painfully slow (http://sourceforge.net/projects/mingw-w64/). Every now and then, the prospect of using commercial tools comes up, but they are not "plug-compatible" and using them would leave end users without the possibility of building packages with C code, unless they go out and buy the same toolchain.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
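One way to see which case applies on a given machine is to ask the running R session for its pointer size. The following is only a sketch: `.Machine$sizeof.pointer` and `R.version$arch` are base R, but the helper `dense_gib` and the example dimensions are hypothetical and are not taken from the analysis discussed in this thread.

```r
# Pointer size in bytes: 4 for a 32-bit build of R, 8 for a 64-bit build.
ptr_bytes <- .Machine$sizeof.pointer
cat("R build:", R.version$arch, "-", 8 * ptr_bytes, "bit\n")

# Hypothetical helper: GiB needed to hold one dense numeric (double) matrix,
# at 8 bytes per element.
dense_gib <- function(nrow, ncol) {
  nrow * ncol * 8 / 2^30
}

# Example dimensions are made up for illustration.
need <- dense_gib(1e6, 300)
cat(sprintf("A 1e6 x 300 double matrix needs about %.2f GiB\n", need))

# A 32-bit process can address at most 2^32 bytes (4 GiB) in total, and the
# original post mentions a 2 GB limit for R on Windows.
cat("Fits under 2 GiB:", need < 2, "\n")
```

A single object of that size cannot be allocated by a 32-bit process, whatever the amount of installed RAM, which is consistent with the full data set only being usable on a 64-bit build.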
On Tue, 22 Apr 2008, Peter Dalgaard wrote:

> Probably partly the latter and not the former (we try to make the most
> of what the OS offers in either case), but a more important difference
> is that we can run in 64 bit address space on non-Windows platforms
> (assuming that you run a 64 bit Ubuntu).

There is another possibility. lmer is heavy on matrix algebra, and so usually benefits considerably from an optimized BLAS. Under Windows you need to download one of those on CRAN (or build your own). I believe that under Ubuntu R will make use of one if it is already installed.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
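A rough way to compare the BLAS on two machines is to time the same dense matrix operation on each. This is a sketch under assumptions: the matrix size is arbitrary, a cross-product is only a stand-in for the linear algebra inside lmer, and the `"BLAS"` entry of `extSoftVersion()` is reported only by reasonably recent versions of R, so the code checks for it before printing.

```r
# Time a dense cross-product, an operation that R hands to the BLAS.
set.seed(1)
n <- 1000                                  # arbitrary size for illustration
X <- matrix(rnorm(n * n), n, n)

timing <- system.time(XtX <- crossprod(X)) # computes t(X) %*% X
cat("Elapsed seconds:", timing[["elapsed"]], "\n")

# Sanity check on the result: a cross-product is symmetric.
stopifnot(isSymmetric(XtX))

# Report which BLAS library R is linked against, where R exposes it.
sv <- extSoftVersion()
if ("BLAS" %in% names(sv)) cat("BLAS in use:", sv[["BLAS"]], "\n")
```

Running the same script on both machines, with the same R version, gives a number that does not depend on the data set or on memory limits. A large gap would point towards the BLAS rather than the operating system.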
On Tue, 22 Apr 2008, Doran, Harold wrote:

> Does this occur because R is a bit friendlier with unix somehow? Or,
> is this occurring because unix somehow has more efficient methods for
> memory allocation?

On the same hardware the differences between Windows and Linux performance are generally minor, but there are many things that can cause very poor performance on either platform.

> I wish I knew enough to even ask the right questions. So, I welcome any
> enlightenment members may add.

I have seen very big differences in performance on computational benchmarks for hardware with similar basic specifications (CPU type and clock, RAM, etc.). Often the difference is a symptom of broken hardware or some misconfiguration. Can you see the difference in performance in other applications?
Here are some things to consider:

1. Anti-virus scanning and other background tasks -- I've seen systems configured to scan gigabyte network drives. Windows Task Manager, Linux "top", etc. can give an idea of what is using a lot of CPU, but they are not so helpful if the issue involves I/O bottlenecks.

2. Incorrect hardware configuration in the system BIOS. This happens far too often, even with big-name vendors. I like to run some benchmarks on every new system to make sure there aren't some basic configuration errors, and to have as a reference if I suspect problems after the systems have been in use.

3. Network problems. Where I work, some PCs (both Linux and Windows) get the ethernet duplex setting wrong when booted. This can result in poor performance when using networked disks, without other symptoms. On Windows, the "repair network connection" button often clears the problem. On Linux, ethtool can display and change ethernet settings.

4. All sorts of hardware issues -- sometimes useful data appear in the system logs. Use "Event Viewer" on Windows; look at /var/log/messages and /var/log/dmesg on Linux.

5. Does the slow system exhibit a lot more disk activity? Sometimes this is hard to detect, but most systems do provide some statistics. Try running some I/O-intensive benchmark at the same time your R job is running.

-- 
George N. White III <aa056 at chebucto.ns.ca>
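For points 1 and 5 in the list above, a crude first check can be made from inside R by comparing CPU time with wall-clock time for the same job. This is a sketch, not a diagnostic tool: the helper `cpu_share` is hypothetical, and the workload is a stand-in to be replaced by the real model fit.

```r
# Hypothetical helper: fraction of wall-clock time the R process spent on
# the CPU, computed from the result of system.time().
cpu_share <- function(timing) {
  cpu <- timing[["user.self"]] + timing[["sys.self"]]
  elapsed <- timing[["elapsed"]]
  if (elapsed <= 0) return(NA_real_)   # too fast to measure
  cpu / elapsed
}

# Stand-in workload; replace with the real analysis.
timing <- system.time({
  x <- rnorm(2e6)
  s <- sort(x)
})

share <- cpu_share(timing)
cat(sprintf("CPU share of elapsed time: %.2f\n", share))
```

A share close to 1 suggests the job is limited by computation. A share well below 1 suggests the process spent much of its time waiting, for example on disk, on the network, or on other processes competing for the machine. With a multi-threaded BLAS the share can exceed 1, so the number is only a rough indicator.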