Scott Gilpin
2005-Jun-10 20:57 UTC
[R] Performance difference between 32-bit build and 64-bit build on Solaris 8
Hi everyone -

I'm seeing a 32-bit build perform significantly faster (up to 3x) than a 64-bit build on Solaris 8. I'm running R version 2.1.0. Here are some of my system details, and some resulting timings:

> uname -a
SunOS lonetree 5.8 Generic_117350-16 sun4u sparc SUNW,Sun-Fire-V440

lonetree /home/sgilpin >gcc -v
Reading specs from /usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.2/specs
Configured with: ../configure --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld --disable-nls
Thread model: posix
gcc version 3.4.2

I built the 32-bit version of R with no changes to config.site. I built the 64-bit version with the following in config.site:

CC="gcc -m64"
FFLAGS="-m64 -g -O2"
LDFLAGS="-L/usr/local/lib/sparcv9 -L/usr/local/lib"
CXXFLAGS="-m64 -g -O2"

Neither build uses a BLAS. Both builds are installed on the same machine and the same disk. The machine has virtually no load; R is one of the only processes running during these timings.

First comparison: solve on a large matrix

> echo 'set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M))' | /disk/loneres01/R-2.1.0-32bit/bin/R -q --vanilla
> set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M))
[1] 713.45 0.38 713.93 0.00 0.00
>

> echo 'set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M))' | /disk/loneres01/R-2.1.0-64bit/bin/R -q --vanilla
> set.seed(1);M<-matrix(rnorm(9e6),3e3);system.time(solve(M))
[1] 2277.05 0.31 2278.38 0.00 0.00
>

Second comparison: linear regression

lonetree /home/sgilpin/R >echo 'set.seed(1); y<-matrix(rnorm(10000*500),500); x<-matrix(runif(500*100),500); system.time(fit<-lm(y~x))' | /disk/loneres01/R-2.1.0-32bit/bin/R -q --vanilla
> set.seed(1);y<-matrix(rnorm(10000*500),500);x<-matrix(runif(500*100),500);system.time(fit<-lm(y~x))
[1] 23.34 0.80 24.17 0.00 0.00
>

lonetree /home/sgilpin/R >echo 'set.seed(1); y<-matrix(rnorm(10000*500),500); x<-matrix(runif(500*100),500); system.time(fit<-lm(y~x))' | /disk/loneres01/R-2.1.0-64bit/bin/R -q --vanilla
> set.seed(1);y<-matrix(rnorm(10000*500),500);x<-matrix(runif(500*100),500);system.time(fit<-lm(y~x))
[1] 55.34 0.70 56.21 0.00 0.00
>

Final comparison: stats-Ex.R (from R-devel)

lonetree /home/sgilpin/R >time /disk/loneres01/R-2.1.0-32bit/bin/R -q --vanilla CMD BATCH stats-Ex.R

real 1m4.042s
user 0m47.400s
sys 0m10.390s

lonetree /home/sgilpin/R >time /disk/loneres01/R-2.1.0-64bit/bin/R -q --vanilla CMD BATCH stats-Ex.R

real 1m20.017s
user 1m3.590s
sys 0m10.130s

I've seen Prof. Ripley and others comment that a 64-bit build will be a little slower because the pointers are larger and gc() will take longer, but these timings seem well outside that range.

Any thoughts?
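To put the three comparisons above on one scale, the slowdown can be expressed as the ratio of 64-bit to 32-bit user CPU time. The sketch below only restates the reported figures; the data frame and its column names are illustrative and not part of the original post, and the stats-Ex.R user times are converted from minutes to seconds.

```r
# User CPU times in seconds, copied from the timings reported above.
# The data frame and column names are illustrative only.
timings <- data.frame(
  test    = c("solve, 3000x3000", "lm, 10000 responses", "stats-Ex.R"),
  user_32 = c(713.45, 23.34, 47.40),   # 0m47.400s
  user_64 = c(2277.05, 55.34, 63.59)   # 1m3.590s
)

# Slowdown of the 64-bit build relative to the 32-bit build.
timings$ratio <- round(timings$user_64 / timings$user_32, 2)
print(timings)
```

The ratios come out at roughly 3.2 for solve, 2.4 for lm, and 1.3 for stats-Ex.R, so the gap is widest on the test dominated by dense linear algebra.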
Prof Brian Ripley
2005-Jun-11 07:14 UTC
[R] Performance difference between 32-bit build and 64-bit build on Solaris 8
Your tests are of problems where you really should be using an optimized BLAS. But because those pointers are twice the size, the L1 cache will hold half as many, and so I am not surprised at a factor of three on a naive implementation. For linear algebra on large matrices the key to good performance is to keep L1 cache misses to a minimum.

Using SunPerf and a 1000x1000 problem I got

32-bit
[1] 4.99 0.03 5.02 0.00 0.00
64-bit
[1] 5.25 0.03 5.29 0.00 0.00

and for your regression problem

32-bit
[1] 24.97 0.96 26.15 0.00 0.00
64-bit
[1] 26.25 1.06 27.52 0.00 0.00

So the moral appears to be to take the advice in the R-admin manual and tune your linear algebra system.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
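For anyone wanting to repeat the comparison on a smaller problem, a minimal sketch follows. It assumes the 1000x1000 problem in the reply is the same solve() test at a reduced size, which the reply does not state explicitly. The helper name time_solve and its arguments are hypothetical, not something from the thread or from R itself.

```r
# Hypothetical helper (not from the thread): time solve() on an n x n
# matrix of standard normal draws. The fixed seed means every build
# inverts the same matrix.
time_solve <- function(n, seed = 1) {
  set.seed(seed)
  M <- matrix(rnorm(n * n), n)
  system.time(solve(M))
}

# A small size keeps the run short; use 1000 to match the size quoted
# in the reply, or 3000 to match the original post.
res <- time_solve(200)
print(res)
```

Running the same call under each build, before and after linking an optimized BLAS, would show whether the gap narrows in the way the SunPerf timings suggest.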