thr3ads.net - llvm dev - [LLVMdev] -msse3 can degrade performance [Jan 2009]

Jon Harrop

2009-Jan-31 01:43 UTC

[LLVMdev] -msse3 can degrade performance

I just remembered an anomalous result that I stumbled upon whilst tweaking the 
command-line options to llvm-gcc. Specifically, the -msse3 flag does a great 
job improving the performance of floating point intensive code on the 
SciMark2 benchmark but it also degrades the performance of the int-intensive 
Monte Carlo part of the test:

$ llvm-gcc -Wall -lm -O3 *.c -o scimark2
$ ./scimark2
Using       2.00 seconds min time per kenel.
Composite Score:          432.84
FFT             Mflops:   358.90    (N=1024)
SOR             Mflops:   473.45    (100 x 100)
MonteCarlo:     Mflops:   210.54
Sparse matmult  Mflops:   354.25    (N=1000, nz=5000)
LU              Mflops:   767.04    (M=100, N=100)

$ llvm-gcc -Wall -lm -O3 -msse3 *.c -o scimark2
$ ./scimark2
Composite Score:          548.53
FFT             Mflops:   609.87    (N=1024)
SOR             Mflops:   497.92    (100 x 100)
MonteCarlo:     Mflops:   126.62
Sparse matmult  Mflops:   604.02    (N=1000, nz=5000)
LU              Mflops:   904.19    (M=100, N=100)
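
For scale, here is a small sketch (not part of SciMark2; the function and array names are illustrative) that turns the two runs above into per-kernel percentage changes. With these figures MonteCarlo drops by roughly 40% while FFT and Sparse matmult gain roughly 70%:

```c
#include <stdio.h>

/* Relative change in percent going from `before` to `after`;
   a negative value means the score dropped. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Scores copied from the two runs above: -O3, then -O3 -msse3. */
static const char  *kernel[] = { "FFT", "SOR", "MonteCarlo", "Sparse matmult", "LU" };
static const double plain[]  = { 358.90, 473.45, 210.54, 354.25, 767.04 };
static const double sse3[]   = { 609.87, 497.92, 126.62, 604.02, 904.19 };

/* Print one line per kernel with the signed percentage change. */
static void print_changes(void)
{
    for (int i = 0; i < 5; i++)
        printf("%-15s %+7.1f%%\n", kernel[i], percent_change(plain[i], sse3[i]));
}
```

Calling print_changes() from main prints the table; the same helper can be pointed at any other pair of runs.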

The relevant code is:

  double Random_nextDouble(Random R)
  {
      int k;
  
      int I = R->i;
      int J = R->j;
      int *m = R->m;
  
      k = m[I] - m[J];
      if (k < 0) k += m1;
      R->m[J] = k;
  
      if (I == 0)
          I = 16;
      else I--;
      R->i = I;
  
      if (J == 0)
          J = 16 ;
      else J--;
      R->j = J;
  
      if (R->haveRange)
          return  R->left +  dm1 * (double) k * R->width;
      else
          return dm1 * (double) k;
  
  }

  double MonteCarlo_integrate(int Num_samples)
  {
      Random R = new_Random_seed(SEED);

      int under_curve = 0;
      int count;

      for (count=0; count<Num_samples; count++)
      {
          double x= Random_nextDouble(R);
          double y= Random_nextDouble(R);

          if ( x*x + y*y <= 1.0)
                under_curve ++;
      }

      Random_delete(R);

      return ((double) under_curve / Num_samples) * 4.0;
  }

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Eli Friedman

2009-Jan-31 03:42 UTC

On Fri, Jan 30, 2009 at 5:43 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> I just remembered an anomalous result that I stumbled upon whilst tweaking
> the command-line options to llvm-gcc. Specifically, the -msse3 flag

The -msse3 flag?  Does the -msse2 flag have a similar effect?

-Eli

Jon Harrop

2009-Jan-31 05:43 UTC

On Saturday 31 January 2009 03:42:04 Eli Friedman wrote:
> On Fri, Jan 30, 2009 at 5:43 PM, Jon Harrop <jon at ffconsultancy.com> wrote:
> > I just remembered an anomalous result that I stumbled upon whilst
> > tweaking the command-line options to llvm-gcc. Specifically, the -msse3
> > flag
>
> The -msse3 flag?  Does the -msse2 flag have a similar effect?

Yes:

$ llvm-gcc -Wall -lm -O3 -msse2 *.c -o scimark2
$ ./scimark2
Composite Score:          525.99
FFT             Mflops:   538.35    (N=1024)
SOR             Mflops:   472.29    (100 x 100)
MonteCarlo:     Mflops:   120.92
Sparse matmult  Mflops:   585.14    (N=1000, nz=5000)
LU              Mflops:   913.27    (M=100, N=100)

But -msse does not:

$ llvm-gcc -Wall -lm -O3 -msse *.c -o scimark2
$ ./scimark2
Composite Score:          540.08
FFT             Mflops:   535.04    (N=1024)
SOR             Mflops:   469.99    (100 x 100)
MonteCarlo:     Mflops:   197.38
Sparse matmult  Mflops:   587.77    (N=1000, nz=5000)
LU              Mflops:   910.22    (M=100, N=100)

That was x64 and I get similar results for x86.

Is there some kind of contention between the integer and SSE registers?
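
One way to narrow this down would be to time the generator's integer update with and without the int-to-double conversion, since `(double) k` is the one place in Random_nextDouble where an integer value has to cross to the floating-point side. The following is only a sketch of such a reduced test case, not the SciMark2 source: the names Rng, rng_seed, time_ints and time_doubles are made up for illustration, the seeding is a placeholder, and the values of m1 and dm1 are assumptions because the excerpt does not show their definitions.

```c
#include <limits.h>
#include <time.h>

/* Assumed constants: the excerpt does not show how m1 and dm1 are defined.
   Here m1 is taken to be INT_MAX and dm1 its reciprocal. */
#define M1  INT_MAX
#define DM1 (1.0 / (double)M1)

typedef struct {
    int m[17];   /* lagged state; i and j walk down from 16 to 0 and wrap */
    int i, j;
} Rng;

/* Placeholder seeding, NOT the SciMark2 initialisation: fill the table
   from a simple linear congruential sequence (assumes 32-bit unsigned). */
static void rng_seed(Rng *r, unsigned seed)
{
    for (int n = 0; n < 17; n++) {
        seed = seed * 1103515245u + 12345u;
        r->m[n] = (int)(seed >> 1);      /* non-negative, at most INT_MAX */
    }
    r->i = 4;
    r->j = 16;
}

/* Integer-only step: the same update as Random_nextDouble, minus the
   conversion to double. */
static int rng_next_int(Rng *r)
{
    int k = r->m[r->i] - r->m[r->j];
    if (k < 0) k += M1;
    r->m[r->j] = k;
    r->i = (r->i == 0) ? 16 : r->i - 1;
    r->j = (r->j == 0) ? 16 : r->j - 1;
    return k;
}

/* The same step followed by the int-to-double conversion and scaling. */
static double rng_next_double(Rng *r)
{
    return DM1 * (double)rng_next_int(r);
}

/* CPU seconds spent drawing n integers; the sum goes out through *sink
   so the compiler cannot discard the loop. */
static double time_ints(long n, unsigned long long *sink)
{
    Rng r;
    unsigned long long sum = 0;
    rng_seed(&r, 113u);
    clock_t t0 = clock();
    for (long c = 0; c < n; c++)
        sum += (unsigned long long)rng_next_int(&r);
    *sink = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* As above, but including the conversion to double. */
static double time_doubles(long n, double *sink)
{
    Rng r;
    double sum = 0.0;
    rng_seed(&r, 113u);
    clock_t t0 = clock();
    for (long c = 0; c < n; c++)
        sum += rng_next_double(&r);
    *sink = sum;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_ints and time_doubles from main with a large n and building the file once per flag (-msse, -msse2, -msse3) would show whether the gap appears only when the conversion is present. The timing here uses clock(), so the numbers are CPU time and not directly comparable to the Mflops figures above.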

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
