Displaying 4 results from an estimated 4 matches for "addsubp".
2004 Aug 06
[PATCH] Make SSE Run Time option.
...; especially on a processor that is already weak on permute. In my opinion,
> >
> > Actually, the new instructions make it possible to do complex multiplies
> > without the need to permute and separate the add and subtract. The
> > really useful instruction here is the "addsubps".
>
> Would you like to prove it with a code sample?
I suppose if I make such a demand, it would only be sporting to provide
what I believe to be the more efficient competing method, which uses
only SSE/SSE2. Double precision is shown. For single precision, simply
replace all "...
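Ian's actual SSE/SSE2 code is cut off in this excerpt. As a rough, hypothetical sketch of the split-format idea (real and imaginary parts kept in separate arrays, as the follow-up reply describes), a double-precision complex multiply with SSE2 intrinsics could look like this. The function name `cmul_split_pd`, the array layout, and the requirement that `n` be a multiple of 2 are all assumptions, not Ian's code.

```c
#include <emmintrin.h> /* SSE2: __m128d and the _pd intrinsics */

/* Hypothetical helper: c[k] = a[k] * b[k] for n complex doubles stored in
 * split form (separate real and imaginary arrays).
 *   cr = ar*br - ai*bi
 *   ci = ar*bi + ai*br
 * Assumes n is a multiple of 2 (two doubles per SSE2 register). */
void cmul_split_pd(const double *ar, const double *ai,
                   const double *br, const double *bi,
                   double *cr, double *ci, int n)
{
    for (int k = 0; k < n; k += 2) {
        __m128d xr = _mm_loadu_pd(ar + k);
        __m128d xi = _mm_loadu_pd(ai + k);
        __m128d yr = _mm_loadu_pd(br + k);
        __m128d yi = _mm_loadu_pd(bi + k);
        /* Each lane is an independent complex number, so no shuffles are needed. */
        __m128d re = _mm_sub_pd(_mm_mul_pd(xr, yr), _mm_mul_pd(xi, yi));
        __m128d im = _mm_add_pd(_mm_mul_pd(xr, yi), _mm_mul_pd(xi, yr));
        _mm_storeu_pd(cr + k, re);
        _mm_storeu_pd(ci + k, im);
    }
}
```

The trade-off in this layout is that the data must already be stored (or converted to) separate real and imaginary arrays; in exchange, every operation is a plain vertical multiply, add, or subtract.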
2004 Aug 06
[PATCH] Make SSE Run Time option.
Actually, I'm not denying you can do pretty fast complex multiplies by
separating real from imaginary. What I'm saying is that with addsubps,
you can do a better job when you have the complex numbers packed than
you can do with SSE1 only. I still think AMD got it better with its
pfpnacc instruction, and Intel should have gone much further.
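To make the SSE1-only baseline concrete: with two complex floats packed as {re0, im0, re1, im1}, SSE1 has no addsubps, so one common workaround is to flip the sign of the even lanes with an XOR against -0.0f and then do an ordinary add. This is a minimal sketch assuming GCC or Clang on x86/x86-64; `cmul2_sse1` is a hypothetical name, not code from Speex.

```c
#include <xmmintrin.h> /* SSE1 */

/* Hypothetical helper: multiply two pairs of complex floats stored
 * interleaved as {re0, im0, re1, im1}, using SSE1 instructions only.
 * The "subtract in even lanes, add in odd lanes" step that addsubps
 * would do is emulated by negating the even lanes with xorps. */
__m128 cmul2_sse1(__m128 a, __m128 b)
{
    /* Lane order 0..3 is {-0.0f, 0.0f, -0.0f, 0.0f}; _mm_set_ps lists lane 3 first. */
    const __m128 neg_even = _mm_set_ps(0.0f, -0.0f, 0.0f, -0.0f);
    __m128 b_re = _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 2, 0, 0)); /* {br0, br0, br1, br1} */
    __m128 b_im = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 3, 1, 1)); /* {bi0, bi0, bi1, bi1} */
    __m128 a_sw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)); /* {ai0, ar0, ai1, ar1} */
    __m128 t1 = _mm_mul_ps(a, b_re);    /* {ar*br, ai*br, ...} */
    __m128 t2 = _mm_mul_ps(a_sw, b_im); /* {ai*bi, ar*bi, ...} */
    /* re = ar*br - ai*bi, im = ai*br + ar*bi */
    return _mm_add_ps(t1, _mm_xor_ps(t2, neg_even));
}
```

Compared with an addsubps version, this costs an extra xorps per pair of products plus a sign-mask constant.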
On Thu 15/01/2004 at 19:28, Ian Ollmann wrote:
> On Thu, 15 Jan 2004, Ian Ollmann...
2004 Aug 06
[PATCH] Make SSE Run Time option.
...ute overhead that is inefficient --
> especially on a processor that is already weak on permute. In my opinion,
Actually, the new instructions make it possible to do complex multiplies
without the need to permute and separate the add and subtract. The
really useful instruction here is the "addsubps".
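A minimal sketch of the packed approach with SSE3, assuming GCC or Clang on x86/x86-64: movsldup and movshdup (also new in SSE3) broadcast the real and imaginary parts of `b`, and addsubps then subtracts in the even lanes and adds in the odd lanes in a single instruction. In this formulation one shuffle of `a` still remains, to swap its real and imaginary parts. The name `cmul2_sse3` and the use of the `target` attribute (so the file builds without `-msse3`) are assumptions; a caller would still need a run-time SSE3 check, for example GCC/Clang's `__builtin_cpu_supports("sse3")`.

```c
#include <pmmintrin.h> /* SSE3: _mm_addsub_ps, _mm_moveldup_ps, _mm_movehdup_ps */

/* Hypothetical helper: multiply two pairs of complex floats stored
 * interleaved as {re0, im0, re1, im1}.
 * The target attribute lets this one function use SSE3 even if the rest
 * of the file is compiled for baseline x86-64. */
__attribute__((target("sse3")))
__m128 cmul2_sse3(__m128 a, __m128 b)
{
    __m128 b_re = _mm_moveldup_ps(b); /* {br0, br0, br1, br1} */
    __m128 b_im = _mm_movehdup_ps(b); /* {bi0, bi0, bi1, bi1} */
    __m128 a_sw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)); /* {ai0, ar0, ai1, ar1} */
    /* addsubps: lanes 0 and 2 subtract (real parts), lanes 1 and 3 add (imaginary parts). */
    return _mm_addsub_ps(_mm_mul_ps(a, b_re), _mm_mul_ps(a_sw, b_im));
}
```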
> I find it hard to believe you will never need SSE2. There are some
> instructions that are legitimately useful to single precision floating
> point work, such as cvtps2dq and cvttps2dq.
There are so few conversions in Speex in the first place that it's not
even bothering w...
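For reference, the two conversions named in the quote differ only in rounding: cvtps2dq (`_mm_cvtps_epi32`) rounds according to the MXCSR rounding mode, which is round-to-nearest-even by default, while cvttps2dq (`_mm_cvttps_epi32`) always truncates toward zero, like a C cast. A small sketch, assuming an x86-64 target where SSE2 is always available; `convert_both` is a hypothetical helper name.

```c
#include <emmintrin.h> /* SSE2: cvtps2dq and cvttps2dq intrinsics */

/* Hypothetical helper: convert four floats to int32 both ways.
 * rounded[]   uses cvtps2dq  (current MXCSR mode, nearest-even by default)
 * truncated[] uses cvttps2dq (always toward zero) */
void convert_both(const float in[4], int rounded[4], int truncated[4])
{
    __m128 x = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)rounded, _mm_cvtps_epi32(x));
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(x));
}
```

With the default rounding mode, the input {1.5f, 2.5f, -1.5f, 2.7f} gives {2, 2, -2, 3} from cvtps2dq and {1, 2, -1, 2} from cvttps2dq.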
2004 Aug 06
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually
> faster in the scalar units. The add across and transfer to scalar is
> just too expensive.
Or do four at once, with some shuffling (which is basically free);
almost the same code as a 4x4 matrix/vector multiply.
Segher
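One way to read Segher's suggestion, sketched under assumptions (SSE1 intrinsics, GCC or Clang, and a hypothetical `dot4x4` helper taking two row-major 4x4 float arrays): compute four dot products at once by multiplying element-wise, transposing the four product vectors with the `_MM_TRANSPOSE4_PS` macro from `xmmintrin.h`, and finishing with vertical adds, so no per-result horizontal add or transfer to a scalar register is needed.

```c
#include <xmmintrin.h> /* SSE1, including the _MM_TRANSPOSE4_PS macro */

/* Hypothetical helper: four independent 4-element dot products.
 * a and b each point to 16 floats, row-major; lane k of the result is
 * dot(a[4k..4k+3], b[4k..4k+3]). */
__m128 dot4x4(const float *a, const float *b)
{
    __m128 p0 = _mm_mul_ps(_mm_loadu_ps(a + 0),  _mm_loadu_ps(b + 0));
    __m128 p1 = _mm_mul_ps(_mm_loadu_ps(a + 4),  _mm_loadu_ps(b + 4));
    __m128 p2 = _mm_mul_ps(_mm_loadu_ps(a + 8),  _mm_loadu_ps(b + 8));
    __m128 p3 = _mm_mul_ps(_mm_loadu_ps(a + 12), _mm_loadu_ps(b + 12));
    /* After the transpose, p0 holds element 0 of every product row,
     * p1 holds element 1, and so on. */
    _MM_TRANSPOSE4_PS(p0, p1, p2, p3);
    /* Vertical adds only: lane k now holds the full sum for row k. */
    return _mm_add_ps(_mm_add_ps(p0, p1), _mm_add_ps(p2, p3));
}
```

The result can be written out with `_mm_storeu_ps`. If every row of `b` is the same vector, this becomes the 4x4 matrix/vector multiply mentioned above.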