Displaying 2 results from an estimated 2 matches for "dotpr".
2004 Aug 06
[PATCH] Make SSE Run Time option.
...y to do it. It is essentially what you'd do to
make longer vector dot products such as your 40-160 sample dots work
quickly. Do them as 4 parallel partial vector dots and then sum across the
vector containing the four results. On Mac OS X, there is also a hand-tuned
dot product in vecLib/vDSP.h, dotpr(), if you'd rather just call that.
Personally, I don't think much of PNI. The complex arithmetic stuff they
added sets you up for a lot of permute overhead that is inefficient --
especially on a processor that is already weak on permute. In my opinion,
it's a big ISA Trojan horse. The bett...
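
A minimal sketch of the "4 parallel partial vector dots, then sum across" idea described above. It is written in plain C so it runs anywhere; the four-element accumulator array stands in for the four lanes of a SIMD register. The function name dot_partial4 is hypothetical and does not come from the patch under discussion.

```c
#include <stddef.h>

/* Dot product of a[0..n-1] and b[0..n-1] using four independent partial
   sums. Each acc[k] plays the role of one SIMD lane (an assumption made
   for illustration; a real SSE or AltiVec version would keep the four
   sums in a single vector register). */
float dot_partial4(const float *a, const float *b, size_t n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: lane k accumulates elements k, k+4, k+8, ... */
    for (; i + 4 <= n; i += 4) {
        acc[0] += a[i + 0] * b[i + 0];
        acc[1] += a[i + 1] * b[i + 1];
        acc[2] += a[i + 2] * b[i + 2];
        acc[3] += a[i + 3] * b[i + 3];
    }

    /* Sum across the four partial results once, after the loop. */
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Tail: up to three leftover elements when n is not a multiple of 4. */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

The point of the layout is that the expensive horizontal add happens once per call rather than once per group of four elements, so its cost is spread over the whole 40-160 sample vector. Note that the summation order differs from a simple left-to-right loop, so results can differ in the last bits for general floating-point input.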
2004 Aug 06
[PATCH] Make SSE Run Time option.
> Please note that dot products of simple vector floats are usually faster
> in the scalar units. The add across and transfer to scalar is just too
> expensive.
Or do four at once, with some shuffling (which is basically free);
almost the same code as a 4x4 matrix/vector multiply.
Segher
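
One way to read the "do four at once" suggestion, sketched with SSE intrinsics. The choice of SSE is an assumption (the reply may well have had another vector unit in mind), the name dot4_at_once is hypothetical, and n is assumed to be a multiple of 4. Four rows are each dotted against the same vector x, as in a 4x4 matrix/vector multiply; a single transpose (a handful of shuffles) then replaces four separate add-across steps. A scalar fallback is included so the sketch still builds where SSE is unavailable.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Computes out[k] = dot(rows[k], x) for k = 0..3.
   Assumes n is a multiple of 4 (illustrative restriction). */
void dot4_at_once(const float *rows[4], const float *x, size_t n, float out[4])
{
#if defined(__SSE__)
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    __m128 acc2 = _mm_setzero_ps();
    __m128 acc3 = _mm_setzero_ps();

    /* One vector accumulator per row, each holding four partial sums. */
    for (size_t i = 0; i < n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        acc0 = _mm_add_ps(acc0, _mm_mul_ps(_mm_loadu_ps(rows[0] + i), xv));
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(_mm_loadu_ps(rows[1] + i), xv));
        acc2 = _mm_add_ps(acc2, _mm_mul_ps(_mm_loadu_ps(rows[2] + i), xv));
        acc3 = _mm_add_ps(acc3, _mm_mul_ps(_mm_loadu_ps(rows[3] + i), xv));
    }

    /* Transpose in place: afterwards acc0 holds lane 0 of every row's
       accumulator, acc1 holds lane 1, and so on. Three ordinary vertical
       adds then leave the four finished dot products in one vector. */
    _MM_TRANSPOSE4_PS(acc0, acc1, acc2, acc3);
    _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(acc0, acc1),
                                  _mm_add_ps(acc2, acc3)));
#else
    /* Scalar fallback with the same result layout. */
    for (int k = 0; k < 4; ++k) {
        float s = 0.0f;
        for (size_t i = 0; i < n; ++i)
            s += rows[k][i] * x[i];
        out[k] = s;
    }
#endif
}
```

With this arrangement no value ever has to be moved from the vector unit to the scalar unit one element at a time; the four results come out together in a single store.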