search for: vec_ld

Displaying 5 results from an estimated 5 matches for "vec_ld".

2005 Dec 02
0
run time assembler patch for altivec, sse + bug fixes
...__vector float vec_a, vec_b;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned)) {
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i += 8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b);
        vec_b = vec_perm(MSQb, LSQb, maskb);...
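The pattern in this snippet is the standard AltiVec recipe for reading from unaligned pointers: vec_lvsl builds a permute mask from the low four bits of the address, vec_ld fetches the aligned quadwords on either side of the data, and vec_perm splices the wanted 16 bytes into one register. A minimal sketch of just that load step, assuming GCC with -maltivec and altivec.h (the helper name is made up; this is not the patch's code):

#include <altivec.h>

/* Load 4 floats from a possibly unaligned pointer. When p is already
 * 16-byte aligned, both vec_ld calls hit the same quadword, so nothing
 * past the end is touched. */
static inline vector float load_unaligned(const float *p)
{
    vector unsigned char mask = vec_lvsl(0, p); /* shift amount from p & 15 */
    vector float msq = vec_ld(0, p);            /* quadword containing p */
    vector float lsq = vec_ld(15, p);           /* next quadword if p is unaligned */
    return vec_perm(msq, lsq, mask);            /* splice the two halves */
}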
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc,
> I'm still not sure I get it. On an Athlon XP, I can do something like
> "mulps xmm0, xmm1", which means that the xmm registers are indeed
> supported. Besides, without the xmm registers, you can't use much of
> SSE.
On the Athlon XP 2400+ that we have in our QA lab (Win2000), running that code generates an Illegal Instruction error. In addition,
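The point at issue is run-time CPU detection: CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26, and an Athlon XP sets the former but not the latter, so a run-time option has to test the right bit before taking an SSE code path. A modern GCC/clang sketch of such a check using <cpuid.h> (the function names are illustrative; this is not the code from the patch):

#include <cpuid.h>

/* Nonzero if the CPU reports SSE (EDX bit 25 of CPUID leaf 1). */
static int cpu_has_sse(void)
{
    unsigned int eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 25));
}

/* Nonzero if the CPU reports SSE2 (EDX bit 26). An Athlon XP reports
 * SSE but not SSE2. */
static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 26));
}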
2004 Aug 06
6
[PATCH] Make SSE Run Time option.
...; __vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned)) {
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i += 8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b);
        vec_b = vec_p...
2005 Jan 29
4
A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote:
> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and
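The suggestion quoted above is to replace the hand-written lpc_asm.s with the intrinsics declared in altivec.h; each of those "C functions" compiles down to a single AltiVec instruction, so C source can stay close to the assembly while remaining readable. A tiny illustration of the mapping, assuming GCC with -maltivec (not FLAC's code):

#include <altivec.h>

/* vec_madd compiles to one vmaddfp instruction: a * b + c in each lane. */
vector float fused_multiply_add(vector float a, vector float b, vector float c)
{
    return vec_madd(a, b, c);
}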
2004 Aug 06
0
[PATCH] Make SSE Run Time option.
...el SSE2 instructions > #define CPU_MODE_ALTIVEC 64 // PowerPC Altivec support.
You may wish to save space for PNI. http://cedar.intel.com/media/pdf/PNI_LEGAL3.pdf
Likewise, all that branching is probably going to cause more trouble than it saves. Try this:
vector float a0 = vec_ld( 0, a );
vector float a1 = vec_ld( 15, a );
vector float b0 = vec_ld( 0, b );
vector float b1 = vec_ld( 15, b );
a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
a0 = vec_madd( a0, b0, (vector float) vec_splat_u3...
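This last snippet proposes dropping the alignment branches and always realigning through vec_lvsl/vec_perm; the excerpt is cut off after "vec_splat_u3", which presumably continues as a zero accumulator. A hedged completion of that idea as a full inner product, assuming GCC with -maltivec, a len that is a multiple of 4, and that reading up to 15 bytes past the end of an unaligned buffer is safe; none of this is from the actual message:

#include <altivec.h>

static float inner_prod_unaligned(const float *a, const float *b, int len)
{
    /* All-zero bits reinterpret as 0.0f in every lane. */
    vector float sum = (vector float) vec_splat_u32(0);
    for (int i = 0; i < len; i += 4) {
        vector float a0 = vec_ld(0,  a + i);
        vector float a1 = vec_ld(15, a + i);         /* next quadword if unaligned */
        vector float b0 = vec_ld(0,  b + i);
        vector float b1 = vec_ld(15, b + i);
        a0 = vec_perm(a0, a1, vec_lvsl(0, a + i));   /* realign a */
        b0 = vec_perm(b0, b1, vec_lvsl(0, b + i));   /* realign b */
        sum = vec_madd(a0, b0, sum);                 /* multiply-accumulate */
    }
    float tmp[4] __attribute__((aligned(16)));
    vec_st(sum, 0, tmp);                             /* spill the partial sums */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}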