thr3ads.net - search: "dotpr"

Displaying 2 results from an estimated 2 matches for "dotpr".

2004 Aug 06

[PATCH] Make SSE Run Time option.

...y to do it. It is essentially what you'd do to make longer vector dot products, such as your 40-160 sample dots, work quickly: do them as 4 parallel partial vector dots and then sum across the vector containing the four results. On MacOS X, there is also a hand-tuned dot product in vecLib/vDSP.h, dotpr(), if you'd rather just call that. Personally, I don't think much of PNI. The complex arithmetic stuff they added sets you up for a lot of permute overhead that is inefficient, especially on a processor that is already weak on permute. In my opinion, it's a big ISA trojan horse. The bett...
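A minimal sketch of the "4 parallel partial dots, then sum across" approach, assuming an x86 target with SSE intrinsics (`xmmintrin.h`). The function name `dot_sse` is hypothetical and this is not the code from the patch under discussion. Each SSE lane accumulates every fourth product, so the loop body has no horizontal work at all; the single add-across happens once, after the loop.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics (assumes an x86 target with SSE) */

/* Hypothetical helper: dot product of two float arrays using four
   parallel partial sums. n need not be a multiple of 4. */
float dot_sse(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();  /* four independent partial dot products */
    size_t i = 0;

    /* Main loop: lane k accumulates products at indices k, k+4, k+8, ... */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    /* Sum across the vector holding the four partial results, once. */
    __m128 shuf = _mm_movehl_ps(acc, acc);        /* lanes: acc2, acc3, acc2, acc3 */
    __m128 sums = _mm_add_ps(acc, shuf);          /* lane0 = acc0+acc2, lane1 = acc1+acc3 */
    shuf = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane1 */
    sums = _mm_add_ss(sums, shuf);                /* lane0 = acc0+acc1+acc2+acc3 */
    float result = _mm_cvtss_f32(sums);

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++)
        result += a[i] * b[i];
    return result;
}
```

For the 40-160 sample lengths mentioned above, the add-across cost is paid once per dot product rather than once per group of four samples, which is the point of keeping the partial sums in separate lanes.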

2004 Aug 06

[PATCH] Make SSE Run Time option.

> Please note that dot products of simple vector floats are usually
> faster in the scalar units. The add across and transfer to scalar is
> just too expensive.

Or do four at once, with some shuffling (which is basically free); almost the same code as a 4x4 matrix/vector multiply.

Segher

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage:
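The "do four at once" suggestion can be sketched as follows, again assuming an x86 target with SSE and using a hypothetical function name, `dot4_sse`. The four product vectors are transposed with the `_MM_TRANSPOSE4_PS` macro from `xmmintrin.h` (shuffles only), after which plain vertical adds leave all four dot products in one register, so no add-across or transfer to scalar is needed for any of them.

```c
#include <xmmintrin.h>  /* SSE intrinsics and _MM_TRANSPOSE4_PS (assumes x86 with SSE) */

/* Hypothetical helper: four independent 4-element dot products at once.
   a and b each hold four rows of four floats (16 floats, row-major);
   out[k] receives the dot product of row k of a with row k of b. */
void dot4_sse(const float *a, const float *b, float *out)
{
    /* Elementwise products, one vector per dot product. */
    __m128 p0 = _mm_mul_ps(_mm_loadu_ps(a + 0),  _mm_loadu_ps(b + 0));
    __m128 p1 = _mm_mul_ps(_mm_loadu_ps(a + 4),  _mm_loadu_ps(b + 4));
    __m128 p2 = _mm_mul_ps(_mm_loadu_ps(a + 8),  _mm_loadu_ps(b + 8));
    __m128 p3 = _mm_mul_ps(_mm_loadu_ps(a + 12), _mm_loadu_ps(b + 12));

    /* Transpose: afterwards lane k of every vector holds one term
       of dot product k. */
    _MM_TRANSPOSE4_PS(p0, p1, p2, p3);

    /* Vertical adds finish all four dot products in one register. */
    __m128 sums = _mm_add_ps(_mm_add_ps(p0, p1), _mm_add_ps(p2, p3));
    _mm_storeu_ps(out, sums);
}
```

If every row of `b` is the same vector v, this computes M·v for the row-major 4x4 matrix stored in `a`, which is why the code is almost identical to a 4x4 matrix/vector multiply.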
