I'm looking to buy a new desktop which will primarily be used for analyses of large datasets (100s of MB). I've seen postings from several years back re the 'optimal' platform for running R, but nothing more recently.

Specifically, I want to know: 1) if I run R under Windows, does having a dual-processor machine help speed things up? And 2) is it still true that R performs about as well under Windows as Linux?

Thanks,

Greg

Gregory Wellenius, ScD
Cardiology Research Fellow
Beth Israel Deaconess Medical Center
330 Brookline Avenue, Deaconess 306
Boston, MA 02215
617-632-7680 (phone)
617-632-7698 (fax)
On 3/9/2006 4:47 PM, gwelleni at bidmc.harvard.edu wrote:
> I'm looking to buy a new desktop which will primarily be used for
> analyses of large datasets (100s of MB). I've seen postings from several
> years back re the 'optimal' platform for running R, but nothing more
> recently.
>
> Specifically, I want to know: 1) if I run R under Windows, does having a
> dual-processor machine help speed things up? And 2) is it still true
> that R performs about as well under Windows as Linux?

For a big dataset, you're better off with a 64 bit version of R. There isn't one for Windows yet, and won't be for quite a while, since the build tools (gcc, etc.) don't exist. So you're probably better off in Linux.

Duncan Murdoch

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
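The 64-bit point comes down to address space, and rough arithmetic shows why datasets in the "100s of MB" range sit near the edge. The sketch below is only illustrative and is written in Python rather than R. It relies on the fact that R stores numeric values as 8-byte doubles; the 2 GiB user address space for a 32-bit Windows process, the factor of three working copies, and the row and column counts are all assumptions chosen for the example, not measurements.

```python
BYTES_PER_DOUBLE = 8          # R stores numeric values as 8-byte doubles
GIB = 1024 ** 3
# Assumption: default user address space of a 32-bit Windows process.
USER_SPACE_32BIT = 2 * GIB


def numeric_table_bytes(rows, cols):
    """Bytes needed to hold a rows x cols table of doubles in memory."""
    return rows * cols * BYTES_PER_DOUBLE


def fits(rows, cols, limit_bytes, working_copies=3):
    """True if the table plus temporary copies stays within limit_bytes.

    working_copies is a hypothetical allowance for the copies that
    modelling functions tend to make; it is not a measured value.
    """
    return numeric_table_bytes(rows, cols) * working_copies <= limit_bytes


if __name__ == "__main__":
    for rows in (1_000_000, 5_000_000):   # hypothetical dataset sizes
        size_mb = numeric_table_bytes(rows, 20) / 1e6
        print(rows, "rows:", size_mb, "MB, fits:", fits(rows, 20, USER_SPACE_32BIT))
```

Under these assumptions a table of 5 million rows by 20 columns is 800 MB on its own and no longer fits once a few copies are needed, while 1 million rows fits comfortably.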
gwelleni at bidmc.harvard.edu wrote on 3/9/2006 4:47 PM:
> I'm looking to buy a new desktop which will primarily be used for
> analyses of large datasets (100s of MB). I've seen postings from several
> years back re the 'optimal' platform for running R, but nothing more
> recently.
>
> Specifically, I want to know: 1) if I run R under Windows, does having a
> dual-processor machine help speed things up? And 2) is it still true
> that R performs about as well under Windows as Linux?

Duncan Murdoch has already answered your questions about operating systems. I would like to add that if there is significant I/O in what you are doing, fast SCSI disks are a worthwhile investment. The speed increase over ATA or SATA disks has been, in my experience, quite noticeable.

--
Michael H. Prager, Ph.D.
Population Dynamics Team
NOAA Center for Coastal Habitat and Fisheries Research
NMFS Southeast Fisheries Science Center
Beaufort, North Carolina 28516 USA
http://shrimp.ccfhrb.noaa.gov/~mprager/
On Thu, 9 Mar 2006, gwelleni at bidmc.harvard.edu wrote:
> I'm looking to buy a new desktop which will primarily be used for
> analyses of large datasets (100s of MB). I've seen postings from several
> years back re the 'optimal' platform for running R, but nothing more
> recently.

It is a subject which comes up every few months. Many of the developers are running dual (or dual-core) Opterons/Athlon 64s under Linux these days.

> Specifically, I want to know: 1) if I run R under Windows, does having a
> dual-processor machine help speed things up? And 2) is it still true
> that R performs about as well under Windows as Linux?

Duncan Murdoch has already mentioned the 64-bit advantage if you need large datasets, but there is also a speed penalty if you do not. Your description seems on the margins (depends how many 100s and what the format is and what you want to do). One advantage of AMD64 Linux is that I can run either 32- or 64-bit versions of R and choose to have speed or space for any given task.

A dual processor will be of little help in running R faster. R's interpreter is single-threaded, and although you can get some advantage in using multi-threaded BLAS libraries in large matrix computations, these are not readily available for R under Windows, and the advantage is often small under Linux. Running two or more instances of R will take advantage of dual processors, and I have been running dual CPU machines for a decade.

As for Windows vs Linux, R runs on the same hardware at about the same speed when comparing the standard Windows build with a shared library version on Linux (standard for e.g. the RH RPMs), but the standard Linux build is 10-20% faster. For one set of comparisons see

http://sekhon.berkeley.edu/macosx/

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
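The suggestion of running two or more instances side by side can be sketched as follows. This is a minimal illustration in Python, not something from the thread: it starts one separate interpreter process per job and only then waits for them, so the operating system is free to schedule each process on its own CPU. The helper name `run_instances` and the two toy jobs are hypothetical; for R, each command would be replaced by whatever launches one independent R job on your system.

```python
import subprocess
import sys


def run_instances(snippets):
    """Start one interpreter process per code snippet, then collect output.

    All processes are launched before any of them is waited on, so
    independent jobs can run at the same time on separate processors.
    """
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", code],   # one separate process per job
            stdout=subprocess.PIPE,
            text=True,
        )
        for code in snippets
    ]
    # communicate() waits for each process and returns (stdout, stderr).
    return [proc.communicate()[0].strip() for proc in procs]


if __name__ == "__main__":
    # Two hypothetical, independent jobs standing in for two analyses.
    jobs = ["print(sum(range(10)))", "print(2 ** 10)"]
    print(run_instances(jobs))
```

This only helps when the work splits into jobs that do not need to share data in memory, for example fitting separate models to separate subsets.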
I've found memory management sometimes problematic under Windows. I've a calculation that runs without difficulty in 512MB under Mac OS X (10.3 or 10.4); I'd expect the same under Linux. Under Windows (XP professional) with 512MB, it requires a freshly booted system. But maybe the new machines will have so much memory that memory management will not be an issue.

John Maindonald
email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473
fax : +61 2(6125)5549
Mathematical Sciences Institute, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 10 Mar 2006, at 10:00 PM, r-help-request at stat.math.ethz.ch wrote:
> From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
> Date: 10 March 2006 6:50:03 PM
> To: gwelleni at bidmc.harvard.edu
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Optimal platform for R
>
> [...]
> John Maindonald john.maindonald at anu.edu.au wrote
> Sat Mar 11 01:53:31 CET 2006
>
> I've found memory management sometimes problematic under Windows.
> I've a calculation that runs without difficulty in 512MB under Mac OS X
> (10.3 or 10.4); I'd expect the same under Linux. Under Windows
> (XP professional) with 512MB, it requires a freshly booted system.
> But maybe the new machines will have so much memory that memory
> management will not be an issue.

I've been running a lot of generalized additive models on large datasets (mgcv package; R 2.2.0; Win XP) and I find that the first few models run much, much faster than later models. A reboot seems to make things go fast again. Simply deleting everything from the root workspace doesn't help.

Greg

Greg Wellenius
Cardiovascular Epidemiology Research Unit
Beth Israel Deaconess Medical Center
330 Brookline Avenue, Deac 306
Boston, MA 02215