----- Original Message -----
> From: "Duncan Sands" <duncan.sands at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, October 5, 2012 12:10:03 PM
> Subject: Re: TSVC/Equivalencing-dbl
>
> Oops, I ran the testsuite wrong: read clang output for dragonegg
> output.

Okay, can you resummarize? Do you mean that?

gcc -O0:
S1421 0.00 16000

gcc -O0 under valgrind:
S1421 0.00 17208.404325315

clang:
S1421 0.00 17208.404325315

This is all on Darwin, right?

I would certainly tend to suspect an 80-bit-intermediate issue, but, both gcc and clang give 16000 on PowerPC (which has no 80-bit).

It could be a rounding issue, but would Darwin really have a different default rounding mode?

The computation being performed here is [in s1421() in tsc.inc]:
    for (int i = 0; i < LEN/2; i++) {
        b[i] = xx[i] + a[i];
    }

So *if* we're adding up the same numbers in the same order, the answer should be the same everywhere ;)

Can you put in some print statements and confirm?

Thanks again,
Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Hi Hal,

On 05/10/12 20:32, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Duncan Sands" <duncan.sands at gmail.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: llvmdev at cs.uiuc.edu
>> Sent: Friday, October 5, 2012 12:10:03 PM
>> Subject: Re: TSVC/Equivalencing-dbl
>>
>> Oops, I ran the testsuite wrong: read clang output for dragonegg
>> output.
>
> Okay, can you resummarize? Do you mean that?
>
> gcc -O0:
> S1421 0.00 16000
>
> gcc -O0 under valgrind:
> S1421 0.00 17208.404325315
>
> clang:
> S1421 0.00 17208.404325315

Exactly. For "clang" this is only when building like the testsuite does
(i.e. with link-time optimization + llc): if you directly do:
  clang tsc.c dummy.c -std=gnu99 -O3
then you get 16000.

> This is all on Darwin, right?

No, this is on x86-64 (ubuntu) linux.

> I would certainly tend to suspect an 80-bit-intermediate issue, but, both gcc and clang give 16000 on PowerPC (which has no 80-bit).

Not sure what you are saying here. The issue is that x86 internally uses 80 bits
for the 64 bit (double) type, so as long as everything is in registers you get
lots more precision, but the moment you store to memory only 64 bits are stored.
The fact that gcc and clang give the same on powerpc confirms that it is coming
from x86 using an extra 16 bits of precision beyond what you would expect.

> It could be a rounding issue, but would Darwin really have a different default rounding mode?

As I'm seeing this on linux, I guess not :)

> The computation being performed here is [in s1421() in tsc.inc]:
>     for (int i = 0; i < LEN/2; i++) {
>         b[i] = xx[i] + a[i];
>     }
>
> So *if* we're adding up the same numbers in the same order, the answer should be the same everywhere ;)

No, why would it be the same everywhere? If the whole thing is done in
double registers, an x86 processor will maintain 80 bits of precision
even though these are 64 bit (double) types, while if things are loaded
and stored to memory at every step instead then only 64 bits will be used.
This can lead to very different results.

> Can you put in some print statements and confirm?

Not sure what you want me to confirm, but anyway I now have 1/2 an hour to
look into this some more :)

Ciao, Duncan.

> Thanks again,
> Hal
PS: Here's how I can reproduce with clang on linux:

  clang -S -o tsc.ll -O0 -flto -std=gnu99 tsc.c
  clang -S -o dummy.ll -O0 -flto -std=gnu99 dummy.c
  opt -std-compile-opts tsc.ll -S -o tsc.1.ll
  opt -std-compile-opts dummy.ll -S -o dummy.1.ll
  llvm-link tsc.1.ll dummy.1.ll -S -o total.ll
  opt -std-link-opts total.ll -S -o total.1.ll
  llc total.1.ll
  gcc -o z total.1.s

The program z shows the problem. Note that it is essential to have clang
use -O0 (not -O3).

Ciao, Duncan.
----- Original Message -----
> From: "Duncan Sands" <duncan.sands at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Friday, October 5, 2012 2:50:06 PM
> Subject: Re: TSVC/Equivalencing-dbl
>
> Hi Hal,
>
> On 05/10/12 20:32, Hal Finkel wrote:
> > ----- Original Message -----
> >> From: "Duncan Sands" <duncan.sands at gmail.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: llvmdev at cs.uiuc.edu
> >> Sent: Friday, October 5, 2012 12:10:03 PM
> >> Subject: Re: TSVC/Equivalencing-dbl
> >>
> >> Oops, I ran the testsuite wrong: read clang output for dragonegg
> >> output.
> >
> > Okay, can you resummarize? Do you mean that?
> >
> > gcc -O0:
> > S1421 0.00 16000
> >
> > gcc -O0 under valgrind:
> > S1421 0.00 17208.404325315
> >
> > clang:
> > S1421 0.00 17208.404325315
>
> exactly. For "clang" this is only when building like the testsuite does
> (i.e. with link-time optimization + llc): if you directly do:
>   clang tsc.c dummy.c -std=gnu99 -O3
> then you get 16000.
>
> > This is all on Darwin, right?
>
> No, this is on x86-64 (ubuntu) linux.

OIC, interesting!

> > I would certainly tend to suspect an 80-bit-intermediate issue,
> > but, both gcc and clang give 16000 on PowerPC (which has no
> > 80-bit).
>
> Not sure what you are saying here. The issue is the x86 internally uses 80 bits
> for the 64 bit (double) type, so as long as everything is in registers you get
> lots more precision, but the moment you store to memory only 64 bits are stored.
> The fact that gcc and clang give the same on powerpc confirms that it is coming
> from x86 using an extra 16 bits of precision beyond what you would expect.
>
> > It could be a rounding issue, but would Darwin really have a
> > different default rounding mode?
>
> As I'm seeing this on linux, I guess not :)
>
> > The computation being performed here is [in s1421() in tsc.inc]:
> >     for (int i = 0; i < LEN/2; i++) {
> >         b[i] = xx[i] + a[i];
> >     }
> >
> > So *if* we're adding up the same numbers in the same order, the
> > answer should be the same everywhere ;)
>
> No, why would it be the same everywhere? If the whole thing is done in
> double registers, an x86 processor will maintain 80 bits of precision
> even though these are 64 bit (double) types, while if things are loaded
> and stored to memory at every step instead then only 64 bits will be used.
> This can lead to very different results.

Right.

> > Can you put in some print statements and confirm?
>
> Not sure what you want me to confirm, but anyway I now have 1/2 an hour to
> look into this some more :)

For test s1421, we have:
    for (int i = 0; i < LEN/2; i++) {
        b[i] = xx[i] + a[i];
    }

In this case xx is set to the second half of the b array, and a is initialized to 1/(i+1)^2. The b array, however, does not seem to be explicitly initialized for this test. When all of the tests are run in order, it is initialized for the last test in the previous group, s353... so maybe I screwed this up in breaking apart the tests.

Thanks again,
Hal

> Ciao, Duncan.
>
> > Thanks again,
> > Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Hi,

There was an out-of-bounds array access in the test S1421. This has been fixed and uploaded to the TSVC site by the TSVC maintainers. With this fix and Hal's fix for proper initialization of arrays in the broken tests, the test should work fine now.

Regards,
Shivaram

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
Sent: Saturday, October 06, 2012 1:39 AM
To: Hal Finkel
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] TSVC/Equivalencing-dbl

PS: Here's how I can reproduce with clang on linux:

  clang -S -o tsc.ll -O0 -flto -std=gnu99 tsc.c
  clang -S -o dummy.ll -O0 -flto -std=gnu99 dummy.c
  opt -std-compile-opts tsc.ll -S -o tsc.1.ll
  opt -std-compile-opts dummy.ll -S -o dummy.1.ll
  llvm-link tsc.1.ll dummy.1.ll -S -o total.ll
  opt -std-link-opts total.ll -S -o total.1.ll
  llc total.1.ll
  gcc -o z total.1.s

The program z shows the problem. Note that it is essential to have clang
use -O0 (not -O3).

Ciao, Duncan.
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev