thr3ads.net - search: "addsubp"

Displaying 4 results from an estimated 4 matches for "addsubp".


2004 Aug 06

[PATCH] Make SSE Run Time option.

...; especially on a processor that is already weak on permute. In my opinion, > > > > Actually, the new instructions make it possible to do complex multiplies > > without the need to permute and separate the add and subtract. The > > really useful instruction here is the "addsubps". > > Would you like to prove it with a code sample? I suppose if I make such a demand that it would only be sporting if I provide what I believe to be the more efficient competing method that uses only SSE/SSE2. Double precision is shown. For Single precision simply replace all "...


2004 Aug 06

[PATCH] Make SSE Run Time option.

Actually, I'm not denying you can do pretty fast complex multiplies by separating real from imaginary. What I'm saying is that with addsubps, you can do a better job when you have the complex numbers packed than you can do with SSE1 only. I still think AMD got it better with its pfpnacc instruction and Intel should have gone much further.

On Thu 15/01/2004 at 19:28, Ian Ollmann wrote: > On Thu, 15 Jan 2004, Ian Ollmann...
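To make the packed-complex claim above concrete, here is a minimal sketch in portable C that models what addsubps computes and how it fits into a complex multiply. It is not code from the thread: the function names `addsubps_model` and `cmul2_packed` are hypothetical, the interleaved {re, im, re, im} layout is an assumption, and the lanes are written out as scalar arithmetic so the sketch compiles without SSE3. Note that this common formulation still uses one swap of the real and imaginary parts; what addsubps removes is the need for a separate add and subtract.

```c
/* Scalar model of SSE3 ADDSUBPS on one register of four floats:
 * even lanes subtract, odd lanes add.
 * (The matching intrinsic in <pmmintrin.h> is _mm_addsub_ps.) */
static void addsubps_model(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0] - b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] - b[2];
    out[3] = a[3] + b[3];
}

/* Multiply two pairs of packed complex numbers.
 * Assumed layout: x = {re0, im0, re1, im1}, and the same for y. */
void cmul2_packed(const float x[4], const float y[4], float out[4])
{
    /* Duplicate the real and the imaginary parts of y.
     * (SSE3 provides MOVSLDUP and MOVSHDUP for these two steps.) */
    float y_re[4] = { y[0], y[0], y[2], y[2] };
    float y_im[4] = { y[1], y[1], y[3], y[3] };

    /* Swap re and im inside each complex number of x (one shuffle). */
    float x_sw[4] = { x[1], x[0], x[3], x[2] };

    float t0[4], t1[4];
    int i;
    for (i = 0; i < 4; ++i) {
        t0[i] = x[i]    * y_re[i];  /* {xr*yr, xi*yr, ...} */
        t1[i] = x_sw[i] * y_im[i];  /* {xi*yi, xr*yi, ...} */
    }

    /* Even lanes: xr*yr - xi*yi (real part).
     * Odd lanes:  xi*yr + xr*yi (imaginary part). */
    addsubps_model(t0, t1, out);
}
```

Calling `cmul2_packed` with x = {1, 2, 3, 4} and y = {5, 6, 7, 8} multiplies (1+2i)(5+6i) and (3+4i)(7+8i) in one pass, leaving the results interleaved in `out`.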


2004 Aug 06

[PATCH] Make SSE Run Time option.

...ute overhead that is inefficient -- > especially on a processor that is already weak on permute. In my opinion,

Actually, the new instructions make it possible to do complex multiplies without the need to permute and separate the add and subtract. The really useful instruction here is the "addsubps".

> I find it hard to believe you will never need SSE2. There are some > instructions that are legitimately useful to single precision floating > point work, such as cvtps2dq and cvttps2dq.

There are so few conversions in Speex in the first place that it's not even bothering w...


2004 Aug 06

[PATCH] Make SSE Run Time option.

> Please note that dot products of simple vector floats are usually > faster > in the scalar units. The add across and transfer to scalar is just too > expensive.

Or do four at once, with some shuffling (which is basically free); almost the same code as a 4x4 matrix/vector multiply.

Segher
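The "four at once" suggestion can be sketched as follows. This is an illustrative scalar model in portable C, not code from the thread; the name `dot4x4` and the flat 16-float layout are assumptions. The idea is that after the shuffling (modeled here as a transpose), each SIMD lane accumulates one whole dot product, so no add across lanes is needed at the end.

```c
/* Compute four 4-element dot products at once:
 *   out[k] = dot(a_k, b_k) for k = 0..3,
 * where vector k occupies a[4*k .. 4*k+3] (likewise for b). */
void dot4x4(const float *a, const float *b, float out[4])
{
    float at[4][4], bt[4][4];
    float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    int j, k;

    /* "Shuffling": transpose so that row j holds element j of all four
     * vectors. With SSE this would be a handful of shuffle/unpack ops. */
    for (k = 0; k < 4; ++k) {
        for (j = 0; j < 4; ++j) {
            at[j][k] = a[4 * k + j];
            bt[j][k] = b[4 * k + j];
        }
    }

    /* Four vertical multiply-adds; lane k only ever touches vector k,
     * which is the same shape as a 4x4 matrix/vector multiply. */
    for (j = 0; j < 4; ++j)
        for (k = 0; k < 4; ++k)
            acc[k] += at[j][k] * bt[j][k];

    for (k = 0; k < 4; ++k)
        out[k] = acc[k];
}
```

In a real SIMD version the inner loop over `k` would be a single packed multiply and a single packed add per row, and all four results would come out together in one register.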
