thr3ads.net - search: "vec

run time assembler patch for altivec, sse + bug fixes

2005 Dec 02

0

...SQa, LSQa, MSQb, LSQb;
__vector unsigned char maska, maskb;
__vector float vec_a, vec_b;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned)) {
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i+=8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b...

[PATCH] Make SSE Run Time option.

2004 Aug 06

6

...__vector unsigned char maska, maskb;
__vector float vec_a, vec_b;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned)) {
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i+=8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b +=...

[PATCH] Make SSE Run Time option.

2004 Aug 06

0

...Likewise, all that branching is probably going to cause more trouble than it saves. Try this:

    vector float a0 = vec_ld( 0, a );
    vector float a1 = vec_ld( 15, a );
    vector float b0 = vec_ld( 0, b );
    vector float b1 = vec_ld( 15, b );
    a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
    b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
    a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ;
    a0 = vec_add( a0, vec_sld( a0, a0, 8 ) );
    a0 = vec_add( a0, vec_sld( a0, a0, 4 ) );
    vec_ste( a0, 0, &sum );
    return sum;

Please note...
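The vec_ld / vec_lvsl / vec_perm combination quoted in these snippets is the usual AltiVec way to read 16 bytes from an address that is not 16-byte aligned. Since AltiVec code only builds on PowerPC, the following is a portable scalar model of that idiom rather than the code from the thread. The type vec16 and the model_* functions are hypothetical names made up for this sketch, and byte offsets into a buffer stand in for addresses (the buffer start is treated as if it were 16-byte aligned).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte value standing in for one AltiVec register. */
typedef struct { unsigned char b[16]; } vec16;

/* Models vec_ld: the low four bits of the offset are ignored, so the
   load returns the aligned 16-byte block that contains offset `off`. */
static vec16 model_ld(const unsigned char *base, size_t size, size_t off)
{
    vec16 v;
    size_t start = off & ~(size_t)15;
    assert(start + 16 <= size);          /* block must lie inside the buffer */
    memcpy(v.b, base + start, 16);
    return v;
}

/* Models vec_lvsl: builds the permute mask {sh, sh+1, ..., sh+15},
   where sh is the misalignment of the offset (off & 15). */
static vec16 model_lvsl(size_t off)
{
    vec16 m;
    unsigned sh = (unsigned)(off & 15);
    for (int i = 0; i < 16; i++)
        m.b[i] = (unsigned char)(sh + i);
    return m;
}

/* Models vec_perm: the low five bits of each mask byte index into the
   32-byte concatenation of `hi` (bytes 0..15) and `lo` (bytes 16..31). */
static vec16 model_perm(vec16 hi, vec16 lo, vec16 mask)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        unsigned idx = mask.b[i] & 31u;
        r.b[i] = (idx < 16) ? hi.b[idx] : lo.b[idx - 16];
    }
    return r;
}

/* Unaligned load: fetch the two aligned blocks that straddle `off`,
   then shift the wanted 16 bytes into place with the lvsl mask.
   Using off + 15 for the second block means an already aligned offset
   re-reads the same block instead of touching the next one. */
static vec16 model_load_unaligned(const unsigned char *base, size_t size,
                                  size_t off)
{
    vec16 msq = model_ld(base, size, off);       /* most significant quadword  */
    vec16 lsq = model_ld(base, size, off + 15);  /* least significant quadword */
    return model_perm(msq, lsq, model_lvsl(off));
}
```

In this model, loading at offset 5 from a buffer whose bytes hold their own index yields bytes 5 through 20, which is what the two-load-plus-permute sequence is meant to produce on real hardware.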

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

2

Jean-Marc,

> I'm still not sure I get it. On an Athlon XP, I can do something like
> "mulps xmm0, xmm1", which means that the xmm registers are indeed
> supported. Besides, without the xmm registers, you can't use much of
> SSE.

On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run that code it generates an Illegal Instruction error. In addition,

A couple of points about flac 1.1.1 on ppc/linux/altivec

2005 Jan 29

4

On Thu, 27 Jan 2005, John Steele Scott wrote:

> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and