search for: pirch

Displaying 3 results from an estimated 3 matches for "pirch".

2004 Aug 06
0
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...SE also requires 16-byte alignment for most instructions (except movups, which is slow anyway). That's why I have those kludges with the pointer masks in the current code. I think we should find a general solution for the problem. Also, there's one place (inner_prod, called by the open-loop pirch estimator) where non 16-byte-aligned loads are really required. It's probably possible to work around that, but it might require 4 copies of the data (with 4-byte offsets).
> ALIGN(16) unsigned int myVar;
> or
> static ALIGN(16) float myArray[16];
I think the ALIGN...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc, There is a big difference between SSE and SSEFP. SSEFP means that the CPU supports the xmm registers. All Intel chips with SSE support do; however, no current 32-bit AMD chips support the XMM registers. They will support the SSE instructions but not those registers. You are right about the SSE2 not being used. The AMD Opterons are the first AMD CPUs which support
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
...16-byte alignment for most instructions
>(except movups, which is slow anyway). That's why I have those kludges
>with the pointer masks in the current code. I think we should find a
>general solution for the problem. Also, there's one place (inner_prod,
>called by the open-loop pirch estimator) where non 16-byte-aligned loads
>are really required. It's probably possible to work around that, but it
>might require 4 copies of the data (with 4-byte offsets).
Agreed, although the inner_prod isn't that big a deal since you can do clever vector swaps in Altivec to redu...
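
The "kludges with the pointer masks" mentioned in these messages are not shown in the snippets. A minimal sketch of the general idea, in portable C, might look like the following. The helper names align16 and alloc_aligned_floats are hypothetical and do not come from the patch; the sketch only assumes that the goal is a float buffer whose start address is a multiple of 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): round a pointer up to the
   next 16-byte boundary by adding 15 and masking off the low four bits. */
static float *align16(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return (float *)((addr + 15u) & ~(uintptr_t)15u);
}

/* Hypothetical helper: allocate n floats with 15 bytes of slack so that an
   aligned window of n floats always fits inside the allocation.
   *raw_out receives the original pointer, which is the one that must later
   be passed to free(). Returns NULL if the allocation fails. */
static float *alloc_aligned_floats(size_t n, void **raw_out)
{
    void *raw;
    assert(raw_out != NULL);
    raw = malloc(n * sizeof(float) + 15);
    *raw_out = raw;
    return raw ? align16(raw) : NULL;
}
```

A caller would keep both pointers: the aligned one for the vector code and the raw one for free(). A compiler-level ALIGN(16) attribute like the one quoted in the first result avoids the extra bookkeeping for static and stack data, but how that macro was defined is not visible here.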