[This is a follow-up on the gcc3 vs. gcc4 discussion. Background: the R
benchmark tests ( http://www.sciviews.org/benchmark/index.html ) show
a dramatic difference in the "Escoufier's method on a 37x37 matrix
(mixed)" test when comparing binaries for PowerPC compiled with gcc3
vs. gcc4.]
On Oct 16, 2006, at 11:29 AM, René J.V. Bertin wrote:
> Anyway, it has nothing to do with the G4 optimisations, as the
> generic 2.4.0 on CRAN also shows the same performance drop.
>
Thanks for the example. I think I have a clarification on this. On a
higher level it's happening in "do_cov", but the underlying issue is
the use of "long double" computations. First the results:
The timings I get (on 2xG5 2.7GHz) are:
gcc3: 0.8s
gcc4: 4.5s (dynamic libgcc)
gcc4: 4.2s (static libgcc)
Basically any calls that use long double will be affected:
qadd: 4.5s (gcc3 opt), 6.7s (Agcc4 opt), 7.4s (gcc3), 7.9s (gcc4 opt
+dyngcc), 10.5s (Agcc4), 10.6s (gcc4 dyngcc)
(this test basically runs 500x 1M long double additions on an array -
it's even more extreme if you run it on short arrays : 250kx1k will
give 2s on gcc3 and 7.7s on gcc4)
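For readers who want to try something similar, here is a minimal sketch of that kind of test. It is not the original "qadd" code (which was not posted); the names sum_long_double, sum_double and seconds_since are hypothetical, and the array length and repeat count are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the "qadd" test: add up n values,
   reps times over, accumulating in long double. */
long double sum_long_double(const long double *x, size_t n, int reps)
{
    assert(x != NULL);
    long double total = 0.0L;
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            total += x[i];
    return total;
}

/* Same loop restricted to plain double, for comparison. */
double sum_double(const double *x, size_t n, int reps)
{
    assert(x != NULL);
    double total = 0.0;
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            total += x[i];
    return total;
}

/* CPU seconds elapsed since a clock() reading taken earlier. */
double seconds_since(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To mimic the runs above, one would fill an array of 1M elements, take start = clock(), call one of the sum functions with reps = 500 (or 1k elements with reps = 250000), and print seconds_since(start) together with the returned sum so the compiler cannot drop the loop. Timings will of course depend on compiler, flags and hardware.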
Now, the actual reason is that gcc3 simply ignores "long double" and
performs all computation using regular double precision (sizeof(long
double)=8 in gcc3 and 16 in gcc4). What this means is that you lose
precision in gcc3. To illustrate the impact, changing "long double"
to "double" in gcc4 will bring the 250kx1k test down from 7.7s to
2.1s which is almost the same as gcc3.
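A quick way to check what a given compiler does with "long double" is to look at its size and mantissa width. The sketch below uses only standard C headers; the function names are hypothetical and chosen just for this illustration. On a compiler that treats long double as double (as described above for gcc3 on PowerPC) the extra-bits value would presumably be 0.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Number of mantissa bits long double has beyond double
   (0 when the compiler treats long double as plain double). */
int long_double_extra_bits(void)
{
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Write a one-line summary into buf; returns what snprintf returns. */
int format_long_double_report(char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len,
                    "sizeof(double)=%zu sizeof(long double)=%zu extra bits=%d",
                    sizeof(double), sizeof(long double),
                    long_double_extra_bits());
}
```

Calling format_long_double_report on a small char buffer and printing it with puts under each compiler shows directly whether the two builds are doing the same arithmetic.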
Thus, restricting R to double computations I get for the 37x37 test
with gcc 4.0.3:
gcc4nld: 0.7s
which is actually even faster than the gcc3 result.
Attached you will find the R benchmarks 2.3 results (run with R
2.4.0) - there is pretty much no difference between the binaries
except for the 37x37 test, and the explanation for that is above.
Cheers,
Simon
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rbench24.txt
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20061017/e2067bcc/attachment-0004.txt