I just remembered an anomalous result that I stumbled upon whilst tweaking the
command-line options to llvm-gcc. Specifically, the -msse3 flag does a great
job improving the performance of floating point intensive code on the SciMark2
benchmark but it also degrades the performance of the int-intensive Monte Carlo
part of the test:

$ llvm-gcc -Wall -lm -O3 *.c -o scimark2
$ ./scimark2
Using       2.00 seconds min time per kenel.
Composite Score:          432.84
FFT             Mflops:   358.90    (N=1024)
SOR             Mflops:   473.45    (100 x 100)
MonteCarlo:     Mflops:   210.54
Sparse matmult  Mflops:   354.25    (N=1000, nz=5000)
LU              Mflops:   767.04    (M=100, N=100)

$ llvm-gcc -Wall -lm -O3 -msse3 *.c -o scimark2
$ ./scimark2
Composite Score:          548.53
FFT             Mflops:   609.87    (N=1024)
SOR             Mflops:   497.92    (100 x 100)
MonteCarlo:     Mflops:   126.62
Sparse matmult  Mflops:   604.02    (N=1000, nz=5000)
LU              Mflops:   904.19    (M=100, N=100)

The relevant code is:

double Random_nextDouble(Random R)
{
    int k;

    int I = R->i;
    int J = R->j;
    int *m = R->m;

    k = m[I] - m[J];
    if (k < 0) k += m1;
    R->m[J] = k;

    if (I == 0)
        I = 16;
    else I--;
    R->i = I;

    if (J == 0)
        J = 16;
    else J--;
    R->j = J;

    if (R->haveRange)
        return R->left + dm1 * (double) k * R->width;
    else
        return dm1 * (double) k;
}

double MonteCarlo_integrate(int Num_samples)
{
    Random R = new_Random_seed(SEED);

    int under_curve = 0;
    int count;

    for (count=0; count<Num_samples; count++)
    {
        double x = Random_nextDouble(R);
        double y = Random_nextDouble(R);

        if (x*x + y*y <= 1.0)
            under_curve++;
    }

    Random_delete(R);

    return ((double) under_curve / Num_samples) * 4.0;
}

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
On Fri, Jan 30, 2009 at 5:43 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
>
> I just remembered an anomalous result that I stumbled upon whilst tweaking the
> command-line options to llvm-gcc. Specifically, the -msse3 flag

The -msse3 flag? Does the -msse2 flag have a similar effect?

-Eli
On Saturday 31 January 2009 03:42:04 Eli Friedman wrote:
> On Fri, Jan 30, 2009 at 5:43 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> > I just remembered an anomalous result that I stumbled upon whilst
> > tweaking the command-line options to llvm-gcc. Specifically, the -msse3
> > flag
>
> The -msse3 flag? Does the -msse2 flag have a similar effect?

Yes:

$ llvm-gcc -Wall -lm -O3 -msse2 *.c -o scimark2
$ ./scimark2
Composite Score:          525.99
FFT             Mflops:   538.35    (N=1024)
SOR             Mflops:   472.29    (100 x 100)
MonteCarlo:     Mflops:   120.92
Sparse matmult  Mflops:   585.14    (N=1000, nz=5000)
LU              Mflops:   913.27    (M=100, N=100)

But -msse does not:

$ llvm-gcc -Wall -lm -O3 -msse *.c -o scimark2
$ ./scimark2
Composite Score:          540.08
FFT             Mflops:   535.04    (N=1024)
SOR             Mflops:   469.99    (100 x 100)
MonteCarlo:     Mflops:   197.38
Sparse matmult  Mflops:   587.77    (N=1000, nz=5000)
LU              Mflops:   910.22    (M=100, N=100)

That was x64 and I get similar results for x86.

Is there some kind of contention between the integer and SSE registers?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
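
One way to narrow this down might be to pull the generator out of SciMark2 and time it on its own, so that the only floating point work left is the int-to-double conversion and the x*x + y*y test. The following is a minimal sketch rather than SciMark2's own code: MiniRandom, mini_seed, mini_next_double, mini_integrate and mini_time_seconds are hypothetical names, the seeding is a simple LCG and not SciMark2's routine, the haveRange branch is omitted, and the values of M1 and DM1 are assumptions because the thread does not show how m1 and dm1 are defined.

```c
#include <assert.h>
#include <time.h>

/* Assumed values: the thread uses m1 and dm1 without defining them. */
#define M1  2147483647            /* assumed modulus, 2^31 - 1 */
#define DM1 (1.0 / 2147483647.0)  /* assumed scale factor, 1 / M1 */

/* Hypothetical cut-down generator state (no haveRange/left/width). */
typedef struct {
    int m[17];
    int i;
    int j;
} MiniRandom;

/* Hypothetical seeding: fill the table from a simple LCG.
   This is NOT SciMark2's seeding routine; the start indices are assumed. */
void mini_seed(MiniRandom *r, unsigned int seed)
{
    int n;
    for (n = 0; n < 17; n++) {
        seed = seed * 1103515245u + 12345u;
        r->m[n] = (int) ((seed >> 1) % (unsigned int) M1);
        assert(r->m[n] >= 0 && r->m[n] < M1);
    }
    r->i = 4;
    r->j = 16;
}

/* Same integer update as Random_nextDouble in the thread, followed by
   the int -> double conversion of k. */
double mini_next_double(MiniRandom *r)
{
    int I = r->i;
    int J = r->j;
    int k = r->m[I] - r->m[J];

    if (k < 0) k += M1;
    r->m[J] = k;

    r->i = (I == 0) ? 16 : I - 1;
    r->j = (J == 0) ? 16 : J - 1;

    return DM1 * (double) k;
}

/* Same loop shape as MonteCarlo_integrate in the thread. */
double mini_integrate(int num_samples)
{
    MiniRandom r;
    int under_curve = 0;
    int count;

    mini_seed(&r, 113u);
    for (count = 0; count < num_samples; count++) {
        double x = mini_next_double(&r);
        double y = mini_next_double(&r);
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return ((double) under_curve / num_samples) * 4.0;
}

/* Returns the CPU seconds spent in mini_integrate; the estimate itself
   is stored through result so the loop cannot be optimised away. */
double mini_time_seconds(int num_samples, double *result)
{
    clock_t start = clock();
    *result = mini_integrate(num_samples);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling mini_time_seconds from a small main and building the file once with -O3 and once with -O3 -msse2 should show whether the slowdown follows this kernel alone. If it does, comparing the assembly from -S for the two builds, in particular the instructions used for the (double) k conversion, would be the next thing to look at.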