I just remembered an anomalous result that I stumbled upon whilst tweaking the
command-line options to llvm-gcc. Specifically, the -msse3 flag does a great
job improving the performance of floating point intensive code on the
SciMark2 benchmark but it also degrades the performance of the int-intensive
Monte Carlo part of the test:
$ llvm-gcc -Wall -lm -O3 *.c -o scimark2
$ ./scimark2
Using 2.00 seconds min time per kenel.
Composite Score: 432.84
FFT Mflops: 358.90 (N=1024)
SOR Mflops: 473.45 (100 x 100)
MonteCarlo: Mflops: 210.54
Sparse matmult Mflops: 354.25 (N=1000, nz=5000)
LU Mflops: 767.04 (M=100, N=100)
$ llvm-gcc -Wall -lm -O3 -msse3 *.c -o scimark2
$ ./scimark2
Composite Score: 548.53
FFT Mflops: 609.87 (N=1024)
SOR Mflops: 497.92 (100 x 100)
MonteCarlo: Mflops: 126.62
Sparse matmult Mflops: 604.02 (N=1000, nz=5000)
LU Mflops: 904.19 (M=100, N=100)
The relevant code is:
double Random_nextDouble(Random R)
{
    int k;
    int I = R->i;
    int J = R->j;
    int *m = R->m;

    k = m[I] - m[J];
    if (k < 0) k += m1;
    R->m[J] = k;

    if (I == 0)
        I = 16;
    else I--;
    R->i = I;

    if (J == 0)
        J = 16;
    else J--;
    R->j = J;

    if (R->haveRange)
        return R->left + dm1 * (double) k * R->width;
    else
        return dm1 * (double) k;
}

double MonteCarlo_integrate(int Num_samples)
{
    Random R = new_Random_seed(SEED);
    int under_curve = 0;
    int count;

    for (count = 0; count < Num_samples; count++)
    {
        double x = Random_nextDouble(R);
        double y = Random_nextDouble(R);

        if (x*x + y*y <= 1.0)
            under_curve++;
    }

    Random_delete(R);
    return ((double) under_curve / Num_samples) * 4.0;
}
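For anyone who wants to poke at this outside the full benchmark, here is a minimal standalone sketch of the same kernel. It is not the SciMark2 source: the sketch_* names, the seeding routine, the starting indices and the values of m1 and dm1 (taken here as 2^31 - 1 and its reciprocal) are all assumptions made so that the file compiles on its own. The integer update and the int-to-double conversion are split into separate functions so that each can be inspected in the generated assembly.

```c
#include <assert.h>
#include <time.h>

/* Assumed constants: m1 = 2^31 - 1 and dm1 = 1/m1, so results lie in [0, 1). */
#define SKETCH_M1 2147483647
static const double sketch_dm1 = 1.0 / (double) SKETCH_M1;

/* Hypothetical stand-in for the benchmark's Random struct (no range support). */
typedef struct {
    int m[17];
    int i, j;
} SketchRandom;

/* Hypothetical seeding, NOT SciMark2's: fill the table from a simple LCG. */
static void sketch_seed(SketchRandom *r, unsigned seed)
{
    unsigned s = seed;
    for (int n = 0; n < 17; n++) {
        s = s * 1103515245u + 12345u;
        r->m[n] = (int) ((s >> 1) % SKETCH_M1);   /* value in [0, m1) */
    }
    r->i = 4;    /* assumed starting indices */
    r->j = 16;
}

/* Integer-only part: same steps as Random_nextDouble, minus the conversion. */
static int sketch_next_int(SketchRandom *r)
{
    int k = r->m[r->i] - r->m[r->j];
    if (k < 0) k += SKETCH_M1;
    r->m[r->j] = k;
    r->i = (r->i == 0) ? 16 : r->i - 1;
    r->j = (r->j == 0) ? 16 : r->j - 1;
    return k;
}

/* The int-to-double conversion and scaling, isolated in one place. */
static double sketch_next_double(SketchRandom *r)
{
    return sketch_dm1 * (double) sketch_next_int(r);
}

/* Same loop shape as MonteCarlo_integrate: estimates pi from a quarter circle. */
double sketch_integrate(int num_samples, unsigned seed)
{
    SketchRandom r;
    int under_curve = 0;

    assert(num_samples > 0);
    sketch_seed(&r, seed);
    for (int count = 0; count < num_samples; count++) {
        double x = sketch_next_double(&r);
        double y = sketch_next_double(&r);
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return ((double) under_curve / num_samples) * 4.0;
}

/* CPU seconds for one call; the estimate is returned through *estimate. */
double sketch_time_integrate(int num_samples, double *estimate)
{
    clock_t start = clock();
    *estimate = sketch_integrate(num_samples, 113u);   /* arbitrary seed */
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Calling sketch_time_integrate from a small main and building the file once with -O3 and once with -O3 -msse2 should show whether the slowdown survives outside the benchmark harness. Comparing the -S output for sketch_next_double between the two builds would then show which instructions actually differ. The sketch does not by itself say anything about the cause.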
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
On Fri, Jan 30, 2009 at 5:43 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> I just remembered an anomalous result that I stumbled upon whilst tweaking the
> command-line options to llvm-gcc. Specifically, the -msse3 flag

The -msse3 flag? Does the -msse2 flag have a similar effect?

-Eli
On Saturday 31 January 2009 03:42:04 Eli Friedman wrote:
> On Fri, Jan 30, 2009 at 5:43 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> > I just remembered an anomalous result that I stumbled upon whilst
> > tweaking the command-line options to llvm-gcc. Specifically, the -msse3
> > flag
>
> The -msse3 flag? Does the -msse2 flag have a similar effect?

Yes:
$ llvm-gcc -Wall -lm -O3 -msse2 *.c -o scimark2
$ ./scimark2
Composite Score: 525.99
FFT Mflops: 538.35 (N=1024)
SOR Mflops: 472.29 (100 x 100)
MonteCarlo: Mflops: 120.92
Sparse matmult Mflops: 585.14 (N=1000, nz=5000)
LU Mflops: 913.27 (M=100, N=100)
But -msse does not:
$ llvm-gcc -Wall -lm -O3 -msse *.c -o scimark2
$ ./scimark2
Composite Score: 540.08
FFT Mflops: 535.04 (N=1024)
SOR Mflops: 469.99 (100 x 100)
MonteCarlo: Mflops: 197.38
Sparse matmult Mflops: 587.77 (N=1000, nz=5000)
LU Mflops: 910.22 (M=100, N=100)
That was x64 and I get similar results for x86. Is there some kind of
contention between the integer and SSE registers?
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e