thr3ads.net - search: "vec

[PATCH 0/7] PowerPC64 performance improvements

2018 Jul 10

9

The following series adds initial vector support for PowerPC64. On POWER9, flac --best is about 3.3x faster.

Amitay Isaacs (2):
  Add m4 macro to check for C __attribute__ features
  Check if compiler supports target attribute on ppc64

Anton Blanchard (5):
  configure.ac: Remove SPE detection code
  configure.ac: Add VSX enable/disable
  configure.ac: Fix FLAC__CPU_PPC on little endian, and add

run time assembler patch for altivec, sse + bug fixes

2005 Dec 02

0

...{
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i+=8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b);
        vec_b = vec_perm(MSQb, LSQb, maskb);
        vec_result = vec_madd(vec_a, vec_b, vec_result);
        a += 4;
        MSQa = vec_ld(0, a);
        vec_a = vec_perm(LSQa, MSQa, maska);...

[PATCH] Make SSE Run Time option.

2004 Aug 06

6

...) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i+=8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b);
        vec_b = vec_perm(MSQb, LSQb, maskb);
        vec_result = vec_madd(vec_a, vec_b, vec_result);
        a += 4;
        MSQa = vec_ld(0, a);
        vec_a = vec_p...

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

2

Jean-Marc,

>I'm still not sure I get it. On an Athlon XP, I can do something like
>"mulps xmm0, xmm1", which means that the xmm registers are indeed
>supported. Besides, without the xmm registers, you can't use much of
>SSE.

On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run that code it generates an Illegal Instruction Error. In addition,

A couple of points about flac 1.1.1 on ppc/linux/altivec

2005 Jan 29

4

On Thu, 27 Jan 2005, John Steele Scott wrote:

> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions.

I am willing to help out, but I don't find the current lpc_asm.s very easy to follow, and

[PATCH] Make SSE Run Time option.

2004 Aug 06

0

...pdf/PNI_LEGAL3.pdf

Likewise, all that branching is probably going to cause more trouble than it saves. Try this:

    vector float a0 = vec_ld( 0, a );
    vector float a1 = vec_ld( 15, a );
    vector float b0 = vec_ld( 0, b );
    vector float b1 = vec_ld( 15, b );
    a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
    b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
    a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ;
    a0 = vec_add( a0, vec_sld( a0, a0, 8 ) );
    a0 = vec_add( a0, vec_sld( a0, a0, 4 ) );
    vec_ste( a0, 0, &sum );
    return...
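
The last steps of the snippet above (two vec_sld/vec_add pairs followed by vec_ste) appear to be a rotate-and-add reduction: rotating the four-lane product by 8 bytes and adding, then by 4 bytes and adding, leaves the full sum in every lane. Below is a minimal portable C sketch of that idea only. It uses plain float arrays in place of AltiVec registers so it builds on any machine, and it does not model the unaligned loads. The helper names rotate_lanes and dot4 are hypothetical and do not come from the thread.

```c
/* Hypothetical stand-in for vec_sld(v, v, 4 * n): rotate a 4-lane
   vector left by n lanes. */
static void rotate_lanes(const float in[4], int n, float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = in[(i + n) % 4];
}

/* Dot product of two 4-float vectors using a rotate-and-add reduction. */
static float dot4(const float a[4], const float b[4])
{
    float prod[4], rot[4];

    /* Lane-wise multiply, as vec_madd does with a zero addend. */
    for (int i = 0; i < 4; i++)
        prod[i] = a[i] * b[i];

    /* Rotate by two lanes (8 bytes) and add: each lane holds a pair sum. */
    rotate_lanes(prod, 2, rot);
    for (int i = 0; i < 4; i++)
        prod[i] += rot[i];

    /* Rotate by one lane (4 bytes) and add: every lane holds the total. */
    rotate_lanes(prod, 1, rot);
    for (int i = 0; i < 4; i++)
        prod[i] += rot[i];

    /* Any lane would do; the snippet stores one element with vec_ste. */
    return prod[0];
}
```

Calling dot4 on {1, 2, 3, 4} and {5, 6, 7, 8} gives 70. Because every lane ends up with the same total, it does not matter which element the final store picks.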
