thr3ads.net - Speex dev - [speex-dev] [PATCH] Make SSE Run Time option. [Aug 2004]

Jean-Marc Valin

2004-Aug-06 15:01 UTC

[speex-dev] [PATCH] Make SSE Run Time option.

On Thu, 15/01/2004 at 15:30, Daniel Vogel wrote:
> Unrelated, but please use SSE/MMX/... intrinsics on Windows instead of using
> inline assembly so you also get the speed benefit on Win64.

OK, so here's a first start. I've translated to intrinsics the asm I
sent 1-2 days ago. The result is about 5% slower than the pure asm
approach, so it's not too bad (SSE asm is 2x faster than x87). Note that
unlike the previous version which had a kludge to work with order 8
(required for wideband), this version only works with order 10, so it
will only work for narrowband.

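#include <xmmintrin.h>

/* Narrowband only: assumes ord == 10 (the ord argument is not checked).
   _num[0] and _den[0] are never read, so both are treated as 1. */
void filter_mem2(float *x, float *_num, float *_den, float *y, int N,
                 int ord, float *_mem)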
{
   __m128 num[3], den[3], mem[3];
   int i;

   /* Copy numerator, denominator and memory to aligned xmm */
   for (i=0;i<2;i++)
   {
      mem[i] = _mm_loadu_ps(_mem+4*i);
      num[i] = _mm_loadu_ps(_num+4*i+1);
      den[i] = _mm_loadu_ps(_den+4*i+1);
   }
   mem[2] = _mm_setr_ps(_mem[8], _mem[9], 0, 0);
   num[2] = _mm_setr_ps(_num[9], _num[10], 0, 0);
   den[2] = _mm_setr_ps(_den[9], _den[10], 0, 0);
   
   for (i=0;i<N;i++)
   {
      __m128 xx;
      __m128 yy;
      /* Compute next filter result */
      xx = _mm_load_ps1(x+i);
      yy = _mm_add_ss(xx, mem[0]);
      _mm_store_ss(y+i, yy);
      yy = _mm_shuffle_ps(yy, yy, 0);
      
      /* Update memory */
      mem[0] = _mm_move_ss(mem[0], mem[1]);
      mem[0] = _mm_shuffle_ps(mem[0], mem[0], 0x39);

      mem[0] = _mm_add_ps(mem[0], _mm_mul_ps(xx, num[0]));
      mem[0] = _mm_sub_ps(mem[0], _mm_mul_ps(yy, den[0]));

      mem[1] = _mm_move_ss(mem[1], mem[2]);
      mem[1] = _mm_shuffle_ps(mem[1], mem[1], 0x39);

      mem[1] = _mm_add_ps(mem[1], _mm_mul_ps(xx, num[1]));
      mem[1] = _mm_sub_ps(mem[1], _mm_mul_ps(yy, den[1]));

      mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0xfd);

      mem[2] = _mm_add_ps(mem[2], _mm_mul_ps(xx, num[2]));
      mem[2] = _mm_sub_ps(mem[2], _mm_mul_ps(yy, den[2]));
   }
   /* Put memory back in its place */
   _mm_storeu_ps(_mem, mem[0]);
   _mm_storeu_ps(_mem+4, mem[1]);
   _mm_store_ss(_mem+8, mem[2]);
   mem[2] = _mm_shuffle_ps(mem[2], mem[2], 0x55);
   _mm_store_ss(_mem+9, mem[2]);
}
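
For checking the intrinsics version, it can help to have the same recurrence written as plain scalar C. The following is a minimal sketch, not code from the Speex tree: the name `filter_mem2_ref10` is hypothetical, and it assumes order 10 with `num[0]` and `den[0]` equal to 1, which is what the listing above implies by never reading them.

```c
#include <stddef.h>

/* Hypothetical scalar reference for the order-10 filter above.
   Assumption: num[0] and den[0] are 1 and are therefore not read. */
static void filter_mem2_ref10(const float *x, const float *num,
                              const float *den, float *y, int N, float *mem)
{
   int i, j;
   for (i = 0; i < N; i++)
   {
      float xi = x[i];
      float yi = xi + mem[0];          /* next output sample */
      y[i] = yi;

      /* Shift the memory down by one tap and add the new contributions.
         mem[j+1] is still the old value when mem[j] is written. */
      for (j = 0; j < 9; j++)
         mem[j] = mem[j+1] + num[j+1]*xi - den[j+1]*yi;
      mem[9] = num[10]*xi - den[10]*yi; /* last tap has nothing to shift in */
   }
}
```

Feeding both versions the same input and the same initial memory should give matching `y` and `mem` contents, up to floating-point rounding.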

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada


Jean-Marc Valin

2004-Aug-06 15:01 UTC

[speex-dev] [PATCH] Make SSE Run Time option.

If anyone's interested in doing some testing, I just checked in an
improved SSE implementation for filter_mem2, fir_mem2 and iir_mem2. The
implementation should also work for wideband now. Give it a try.

I'm attaching this new implementation. This is my first time coding
with intrinsics, so please point out any errors or inefficiencies.
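
One possible way to do the testing asked for here is to run the SSE build and a non-SSE build on the same input and compare the outputs with a small tolerance, since the two paths are unlikely to agree to the last bit. The helpers below are only a sketch of that idea; `fill_test_signal` and `max_abs_diff` are hypothetical names, not part of Speex.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill x with a repeatable pseudo-random signal in [-1, 1).
   Uses a plain 32-bit linear congruential generator so that two runs
   with the same seed see exactly the same input. */
static void fill_test_signal(float *x, size_t n, uint32_t seed)
{
   size_t i;
   for (i = 0; i < n; i++)
   {
      seed = seed * 1664525u + 1013904223u;
      x[i] = (float)((seed >> 16) & 0xFFFFu) / 32768.0f - 1.0f;
   }
}

/* Largest absolute difference between two output buffers. */
static float max_abs_diff(const float *a, const float *b, size_t n)
{
   size_t i;
   float worst = 0.0f;
   for (i = 0; i < n; i++)
   {
      float d = a[i] - b[i];
      if (d < 0.0f)
         d = -d;
      if (d > worst)
         worst = d;
   }
   return worst;
}
```

A test would fill one input buffer, run both filter implementations from identical starting memory, and check that `max_abs_diff` on the two outputs stays below whatever tolerance is considered acceptable.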

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada


-------------- next part --------------
A non-text attachment was scrubbed...
Name: filters_sse.h__charset_iso-8859-1
Type: text/x-c-header
Size: 10000 bytes
Desc: filters_sse.h__charset_iso-8859-1
Url :
http://lists.xiph.org/pipermail/speex-dev/attachments/20040116/1d09d876/filters_sse-0001.bin
Petr Tomasek

2004-Aug-06 15:01 UTC

[speex-dev] [PATCH] Make SSE Run Time option.

On Fri, Jan 16, 2004 at 01:35:34AM -0500, Jean-Marc Valin wrote:
> On Thu, 15/01/2004 at 15:30, Daniel Vogel wrote:
> > Unrelated, but please use SSE/MMX/... intrinsics on Windows instead of using
> > inline assembly so you also get the speed benefit on Win64.
>
> OK, so here's a first start. I've translated to intrinsics the asm I
> sent 1-2 days ago. The result is about 5% slower than the pure asm
> approach, so it's not too bad (SSE asm is 2x faster than x87). Note that
> unlike the previous version which had a kludge to work with order 8
> (required for wideband), this version only works with order 10, so it
> will only work for narrowband.

Will this work on Linux as well?

> 	Jean-Marc
>

-- 
Petr Tomasek, http://www.etf.cuni.cz/~tomasek/

Jean-Marc Valin

2004-Aug-06 15:01 UTC

[speex-dev] [PATCH] Make SSE Run Time option.

> > OK, so here's a first start. I've translated to intrinsics the asm I
> > sent 1-2 days ago. The result is about 5% slower than the pure asm
> > approach, so it's not too bad (SSE asm is 2x faster than x87). Note that
> > unlike the previous version which had a kludge to work with order 8
> > (required for wideband), this version only works with order 10, so it
> > will only work for narrowband.
>
> Will this work on Linux as well?

I'm developing this on Linux, so there's no problem there (you need
to compile with -march=pentium3). It should also work on Windows
(now that the inline asm has been removed), but I haven't tested that. Of
course, I'd like more testing on both Linux and Windows to make sure I
haven't broken anything.
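
The -march=pentium3 flag matters because GCC only accepts the SSE intrinsics when the target is known to support SSE, and in that case GCC and Clang define the `__SSE__` macro. The sketch below shows one way a source file could use that macro to keep a plain C fallback when SSE is not enabled. The function names are hypothetical, and other compilers may need a different check.

```c
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Reports whether this translation unit was compiled with SSE enabled. */
static int built_with_sse(void)
{
#ifdef __SSE__
   return 1;
#else
   return 0;
#endif
}

/* Sum of four floats, with an intrinsics path and a scalar fallback. */
static float sum4(const float *v)
{
#ifdef __SSE__
   float r;
   __m128 a = _mm_loadu_ps(v);
   /* Lanes 0 and 1 become v0+v2 and v1+v3. */
   __m128 b = _mm_add_ps(a, _mm_movehl_ps(a, a));
   /* Add lane 1 into lane 0, then store lane 0. */
   b = _mm_add_ss(b, _mm_shuffle_ps(b, b, 0x55));
   _mm_store_ss(&r, b);
   return r;
#else
   return v[0] + v[1] + v[2] + v[3];
#endif
}
```

With this layout the same file builds with or without the flag, and only the flagged build uses the intrinsics path.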

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

