thr3ads.net - Speex dev - [speex-dev] Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others [Aug 2004]

Aron Rosenberg

2004-Aug-06 15:01 UTC

[speex-dev] Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others

Here are our notes on 1.1.4 testing on Windows.

1. Compile error in regular mode (FIXED_POINT undefined) at lsp.c line 104:
        static inline spx_word16_t spx_cos(spx_word16_t x)
   VS6 does not like the inline keyword here. Removing it allows compiling.

   Same with cb_search_sse.h line 34.

2. Compile error at quant_lsp.c line 55: M_PI is undefined. Either it
needs to be defined in that file or placed in a header.

3. denoise.c doesn't seem to be in the tar.gz, but it is in the Visual Studio
project file.

Now onto the actual SSE tests.

We ran the SSE intrinsics code through some tests on Windows over here and
all I can say is - it sucks. A room filled with monkeys could generate
better SSE code. Having stated that, let me describe why.

We use Visual Studio 6, SP5 with the processor pack as the main development
platform. For some unknown reason, it decides that it only ever wants to
use XMM0 for its SSE operations. If it is dealing with a two-parameter SSE
call, then it will use XMM1, but that's it. Between successive calls, it
won't keep things in an xmm register, even if the next call is using it.

To check this, I converted some of the MMX code in our regular application
to intrinsics and it does the same thing: it only uses mm0 and mm1. It actually
runs slower than a C version of the same function.

Now, this could be different on Visual Studio .NET and .NET 2003, but that 
is what happens with Visual Studio 6. Just so you understand, I am pasting 
below some of the generated SSE code for the fir_mem2_10 function. I got 
this by compiling the speexenc and loading it up in the debugger.

I skipped a bit of the initial function setup; the block starts inside the for
loop. For those who don't know, Win32 asm operand order is backwards from GCC:
it is    OPERATION DEST, SOURCE

254:     for (i=0;i<N;i++)
255:     {
256:        __m128 xx;
257:        __m128 yy;
258:        /* Compute next filter result */
259:        xx = _mm_load_ps1(x+i);
00413483   mov         eax,dword ptr [ebp-64h]
00413486   mov         ecx,dword ptr [ebx+8]
00413489   lea         edx,[ecx+eax*4]
0041348C   movss       xmm0,dword ptr [edx]
00413490   shufps      xmm0,xmm0,0
00413494   movaps      xmmword ptr [xx],xmm0
260:        yy = _mm_add_ss(xx, mem[0]);
00413498   movaps      xmm0,xmmword ptr [ebp-60h]
0041349C   movaps      xmm1,xmmword ptr [xx]
004134A0   addss       xmm1,xmm0
004134A4   movaps      xmmword ptr [yy],xmm1
261:        _mm_store_ss(y+i, yy);
004134AB   movaps      xmm0,xmmword ptr [yy]
004134B2   mov         eax,dword ptr [ebp-64h]
004134B5   mov         ecx,dword ptr [ebx+10h]
004134B8   lea         edx,[ecx+eax*4]
004134BB   movss       dword ptr [edx],xmm0
262:        yy = _mm_shuffle_ps(yy, yy, 0);
004134BF   movaps      xmm0,xmmword ptr [yy]
004134C6   movaps      xmm1,xmmword ptr [yy]
004134CD   shufps      xmm1,xmm0,0
004134D1   movaps      xmmword ptr [yy],xmm1
263:
264:        /* Update memory */
265:        mem[0] = _mm_move_ss(mem[0], mem[1]);
004134D8   movaps      xmm0,xmmword ptr [ebp-50h]
004134DC   movaps      xmm1,xmmword ptr [ebp-60h]
004134E0   movss       xmm1,xmm0
004134E4   movaps      xmmword ptr [ebp-60h],xmm1
266:        mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);
004134E8   movaps      xmm0,xmmword ptr [ebp-60h]
004134EC   movaps      xmm1,xmmword ptr [ebp-60h]
004134F0   shufps      xmm1,xmm0,39h
004134F4   movaps      xmmword ptr [ebp-60h],xmm1
267:
268:        mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
004134F8   movaps      xmm0,xmmword ptr [ebp-30h]
004134FC   movaps      xmm1,xmmword ptr [xx]
00413500   mulps       xmm1,xmm0
00413503   movaps      xmm0,xmmword ptr [ebp-60h]
00413507   addps       xmm0,xmm1
0041350A   movaps      xmmword ptr [ebp-60h],xmm0
269:
270:        mem[1] = _mm_move_ss(mem[1], mem[2]);
0041350E   movaps      xmm0,xmmword ptr [ebp-40h]
00413512   movaps      xmm1,xmmword ptr [ebp-50h]
00413516   movss       xmm1,xmm0
0041351A   movaps      xmmword ptr [ebp-50h],xmm1
271:        mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);
0041351E   movaps      xmm0,xmmword ptr [ebp-50h]
00413522   movaps      xmm1,xmmword ptr [ebp-50h]
00413526   shufps      xmm1,xmm0,39h
0041352A   movaps      xmmword ptr [ebp-50h],xmm1
272:
273:        mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
0041352E   movaps      xmm0,xmmword ptr [ebp-20h]
00413532   movaps      xmm1,xmmword ptr [xx]
00413536   mulps       xmm1,xmm0
00413539   movaps      xmm0,xmmword ptr [ebp-50h]
0041353D   addps       xmm0,xmm1
00413540   movaps      xmmword ptr [ebp-50h],xmm0
274:
275:        mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0xfd);
00413544   movaps      xmm0,xmmword ptr [ebp-40h]
00413548   movaps      xmm1,xmmword ptr [ebp-40h]
0041354C   shufps      xmm1,xmm0,0FDh
00413550   movaps      xmmword ptr [ebp-40h],xmm1
276:
277:        mem[2] = _mm_add_ps(mem[2], _mm_mul_ps(xx, num[2]));
00413554   movaps      xmm0,xmmword ptr [ebp-10h]
00413558   movaps      xmm1,xmmword ptr [xx]
0041355C   mulps       xmm1,xmm0
0041355F   movaps      xmm0,xmmword ptr [ebp-40h]
00413563   addps       xmm0,xmm1
00413566   movaps      xmmword ptr [ebp-40h],xmm0
278:     }
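
Read as plain C, the intrinsics in the loop above appear to implement a 10-tap filter update with the state kept in mem[]. The following is only a scalar sketch of that reading. It assumes the unused fourth lanes of mem[2] and num[2] stay zero, and the function name and the flat coef[]/mem[] arrays (standing in for the lanes of num[0..2] and mem[0..2]) are hypothetical, not taken from Speex:

```c
#define FIR_ORDER 10   /* matches the "_10" in fir_mem2_10 */

/* Hypothetical scalar equivalent of the loop in the listing.
 * coef[] and mem[] each hold FIR_ORDER floats. */
static void fir_mem2_10_scalar(const float *x, const float *coef,
                               float *y, int n, float *mem)
{
   int i, j;
   for (i = 0; i < n; i++) {
      float xi = x[i];

      /* _mm_add_ss + _mm_store_ss: output is input plus oldest state */
      y[i] = xi + mem[0];

      /* _mm_move_ss + _mm_shuffle_ps shift the state down by one lane,
       * then _mm_mul_ps + _mm_add_ps accumulate coef * input */
      for (j = 0; j < FIR_ORDER - 1; j++)
         mem[j] = mem[j + 1] + coef[j] * xi;
      mem[FIR_ORDER - 1] = coef[FIR_ORDER - 1] * xi;
   }
}
```

Feeding this an impulse returns the coefficients delayed by one sample, which is a quick way to sanity-check an SSE build against the scalar one.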

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'speex-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Jean-Marc Valin

2004-Aug-06 15:01 UTC

[speex-dev] Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others

> 1. Compile Error with regular mode (FIXED_POINT undefined) at lsp.c line 104
> 	static inline spx_word16_t spx_cos(spx_word16_t x)    . VS6 does not like 
> the inline keyword here. Removing it allows compiling.
> 
>     same with cb_search_sse.h  line 34.
It seems like your compiler simply doesn't like "inline". I suggest
doing a -Dinline= which is what autoconf does when it detects that the
compiler doesn't understand the inline keyword.
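
An alternative to -Dinline= that keeps the hint is to map the keyword in a header. This is only a sketch: it assumes the compiler accepts the Microsoft-specific __inline spelling when compiling C, and the helper function is hypothetical, not from Speex:

```c
/* When compiling C with MSVC, map "inline" onto the spelling it accepts.
 * Defining it to nothing instead would match -Dinline= exactly. */
#if defined(_MSC_VER) && !defined(__cplusplus)
#define inline __inline
#endif

/* Hypothetical helper, only here to show the keyword in use. */
static inline float scale_half(float x)
{
   return 0.5f * x;
}
```
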
> 2. Compile Error with quant_lsp.c  line 55.  M_PI is undefined. Either it 
> needs to be included in that file or placed in a header.
I'll fix that.
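
One common way to handle a missing M_PI is a guarded fallback definition, since M_PI is not part of standard C and some math.h headers leave it out. This is a sketch, not necessarily the fix that went into Speex; the helper function is hypothetical:

```c
#include <math.h>

/* Fallback for compilers whose math.h does not define M_PI. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical helper, only here to show M_PI in use. */
static double deg_to_rad(double deg)
{
   return deg * (M_PI / 180.0);
}
```
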
> 3. denoise.c doesn't seem to be in tar.gz, it is in the visual studio 
> project file though.
The project file isn't up-to-date (I've never even compiled Speex in
Win32). The file's been renamed to preprocess.h
> We ran the SSE intrinsics code through some tests on Windows over here and 
> all I can say is - it sucks. A room filled with monkeys could generate 
> better SSE code. Having stated that, let me describe why.
You mean a room filled with monkeys could generate a better compiler? :)
> We use Visual Studio 6, SP5 with the processor pack as the main development
> platform. For some unknown reason, it decides that it only ever wants to 
> use XMM0 for its SSE operations. If it is dealing with a two-parameter SSE 
> call, then it will use XMM1, but that's it. Between successive calls, it 
> won't keep things in an xmm register, even if the next call is using it.
I just checked with gcc. gcc uses all of the xmm registers available
(should check on an Opteron, which has 16 of them). Overall, enabling
SSE can give up to 30% improvement (20% is typical).
> To check this, I converted some of the MMX code in our regular application 
> to intrinsics and it does the same thing: it only uses mm0 and mm1. It actually
> runs slower than a C version of the same function.
Well, there's always the option to use gcc to generate the assembly for
the few SSE functions.
> Now, this could be different on Visual Studio .NET and .NET 2003, but that 
> is what happens with Visual Studio 6. Just so you understand, I am pasting 
> below some of the generated SSE code for the fir_mem2_10 function. I got 
> this by compiling the speexenc and loading it up in the debugger.
Yes, that code sucks. Bad. Actually, I can get the same kind of code by
turning the optimizer off in gcc (-O0). Maybe you've got it turned off
too (I think VS is unable to optimize in debug mode, is that right?).
Otherwise, VS really sucks.

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada



Aron Rosenberg

2004-Aug-06 15:01 UTC

[speex-dev] Notes on 1.1.4 Windows. Testing of SSE Intrinics Code and others

Jean-Marc,

         Good catch on the debug mode. After compiling the same code in
release mode, it does appear to be using all the registers correctly. Give
us a few days to integrate our run-time flags into 1.1.4 and I will let you
know how our testing turns out.

Aron Rosenberg
SightSpeed

