Hi Hal,

I was looking into why this fails with dragonegg, and noticed the following: if I compile with GCC (-O0) then I get as output:

  Running each loop 3125 times...
  Loop    Time(Sec)   Checksum
  S421      0.00      32010.620068485
  S1421     0.00      16000
  S422      0.00      3.7377231414078
  S423      0.00      32000.736895702
  S424      0.00      32822.36069424

This is the same as the reference output. If I run exactly the same program under valgrind then I get:

  Running each loop 3125 times...
  Loop    Time(Sec)   Checksum
  S421      0.00      32010.620068485
  S1421     0.00      17208.404325315
  S422      0.00      3.7377231414078
  S423      0.00      32000.736895702
  S424      0.00      32822.36069424

This is the same except for the S1421 line. When built in the testsuite with dragonegg (which means optimized) I get:

  Running each loop 3125 times...
  Loop    Time(Sec)   Checksum
  S421      0.00      32010.620068485
  S1421     0.00      17208.404325315
  S422      0.00      3.7377231414078
  S423      0.00      32000.736895702
  S424      0.00      32822.36069424

Which is *exactly* the same as when using valgrind!

Interestingly, the main difference between valgrind emulated floating point and the real behaviour of the processor is that valgrind doesn't support 80 bit extended precision floating point: it does everything in 64 bits instead. So I wonder if these differences are basically due to whether operations are going in and out of memory (-> 64 bits) or using 80 bit precision, or something else that may change rounding...

Any thoughts?

Ciao, Duncan.
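To make the 64-bit versus 80-bit question concrete, here is a minimal sketch in C of how the same additions can give different results depending on where rounding happens. It is only an illustration of the effect, not a diagnosis of S1421; the function names sum_rounded_each_step and sum_extended are hypothetical and do not come from the testsuite.

```c
#include <float.h>
#include <stddef.h>

/* Sum n doubles, rounding the running total to a 64-bit double after
 * every addition. The volatile accumulator forces each partial sum to be
 * stored, which mimics values "going in and out of memory". */
double sum_rounded_each_step(const double *v, size_t n) {
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc = acc + v[i];
    }
    return acc;
}

/* Sum n doubles in a long double accumulator and round to double only
 * once, at the end. Where long double is the x87 80-bit format, this
 * mimics intermediates staying in extended-precision registers. */
double sum_extended(const double *v, size_t n) {
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++) {
        acc += v[i];
    }
    return (double)acc;
}
```

For the input {1.0, DBL_EPSILON / 2, DBL_EPSILON / 2}, the first function returns 1.0, because each half-ulp addend is rounded away in turn. Where long double is wider than double, the second returns 1.0 + DBL_EPSILON, because the two addends are combined before the single final rounding. Where long double has the same width as double, both return 1.0.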
Oops, I ran the testsuite wrong: read clang output for dragonegg output.
----- Original Message -----
> From: "Duncan Sands" <duncan.sands at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, October 5, 2012 12:10:03 PM
> Subject: Re: TSVC/Equivalencing-dbl
>
> Oops, I ran the testsuite wrong: read clang output for dragonegg
> output.

Okay, can you resummarize? Do you mean that?

  gcc -O0:                  S1421   0.00   16000
  gcc -O0 under valgrind:   S1421   0.00   17208.404325315
  clang:                    S1421   0.00   17208.404325315

This is all on Darwin, right?

I would certainly tend to suspect an 80-bit-intermediate issue, but, both gcc and clang give 16000 on PowerPC (which has no 80-bit). It could be a rounding issue, but would Darwin really have a different default rounding mode?

The computation being performed here is [in s1421() in tsc.inc]:

  for (int i = 0; i < LEN/2; i++) {
    b[i] = xx[i] + a[i];
  }

So *if* we're adding up the same numbers in the same order, the answer should be the same everywhere ;) Can you put in some print statements and confirm?

Thanks again,
Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
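One possible shape for the print statements asked for above, as a sketch only: printing with the C99 "%a" conversion shows every bit of each operand, so output from two builds can be compared exactly rather than through rounded decimals. The helper names format_exact, first_mismatch and dump_iteration are hypothetical and are not part of tsc.inc.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Format x as a C99 hexadecimal float, which is an exact representation
 * of the double. Returns the snprintf result. */
int format_exact(double x, char *buf, size_t size) {
    return snprintf(buf, size, "%a", x);
}

/* Return the index of the first element whose bit pattern differs between
 * x and y, or n if all n elements match. The comparison is bitwise, so it
 * also distinguishes +0.0 from -0.0. */
size_t first_mismatch(const double *x, const double *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (memcmp(&x[i], &y[i], sizeof(double)) != 0) {
            return i;
        }
    }
    return n;
}

/* Print the operands and result of one iteration of a loop of the form
 * b[i] = xx[i] + a[i], one line per iteration. */
void dump_iteration(FILE *out, size_t i, double xx_i, double a_i, double b_i) {
    fprintf(out, "i=%zu xx=%a a=%a b=%a\n", i, xx_i, a_i, b_i);
}
```

Calling dump_iteration inside the loop in each build and diffing the two outputs would show whether the inputs already differ or only the sums do. If the arrays can be captured from both runs, first_mismatch gives the first index that disagrees.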