Displaying 4 results from an estimated 4 matches for "vec_madd".
2005 Dec 02
0
run time assembler patch for altivec, sse + bug fixes
...SQb = vec_ld(0, b);
for (i = 0; i < len; i += 8)
{
    a += 4;
    LSQa = vec_ld(0, a);
    vec_a = vec_perm(MSQa, LSQa, maska);
    b += 4;
    LSQb = vec_ld(0, b);
    vec_b = vec_perm(MSQb, LSQb, maskb);
    vec_result = vec_madd(vec_a, vec_b, vec_result);
    a += 4;
    MSQa = vec_ld(0, a);
    vec_a = vec_perm(LSQa, MSQa, maska);
    b += 4;
    MSQb = vec_ld(0, b);
    vec_b = vec_perm(LSQb, MSQb, maskb);
    vec_result = vec_madd(vec_a, vec_b, vec_result);...
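The snippet accumulates a dot product: each vec_madd multiplies four float lanes and adds them into the running vec_result, and the loop handles eight elements per iteration (two vec_madd calls), with vec_perm realigning misaligned loads. A plain-C sketch of the same accumulation pattern (the function name dot8 and the sum names s0..s3 are mine, assuming len is a multiple of 8) would be:

```c
/* Scalar sketch of the unrolled loop above: the four partial sums
 * mirror the four lanes of vec_result, and the 8-wide step mirrors
 * the two vec_madd calls per iteration. */
static float dot8(const float *a, const float *b, int len)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int i;
    for (i = 0; i < len; i += 8) {
        /* first vec_madd: elements i .. i+3 */
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
        /* second vec_madd: elements i+4 .. i+7 */
        s0 += a[i + 4] * b[i + 4];
        s1 += a[i + 5] * b[i + 5];
        s2 += a[i + 6] * b[i + 6];
        s3 += a[i + 7] * b[i + 7];
    }
    return s0 + s1 + s2 + s3;
}
```

Keeping four independent sums is what lets the vector unit (or a superscalar FPU) overlap the multiply-adds instead of serializing on one accumulator.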
2004 Aug 06
6
[PATCH] Make SSE Run Time option.
...for (i = 0; i < len; i += 8) {
    a += 4;
    LSQa = vec_ld(0, a);
    vec_a = vec_perm(MSQa, LSQa, maska);
    b += 4;
    LSQb = vec_ld(0, b);
    vec_b = vec_perm(MSQb, LSQb, maskb);
    vec_result = vec_madd(vec_a, vec_b, vec_result);
    a += 4;
    MSQa = vec_ld(0, a);
    vec_a = vec_perm(LSQa, MSQa, maska);
    b += 4;
    MSQb = vec_ld(0, b);
    vec_b = vec_perm(LSQb, MSQb, maskb);
    vec_result = vec_m...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc,
>I'm still not sure I get it. On an Athlon XP, I can do something like
>"mulps xmm0, xmm1", which means that the xmm registers are indeed
>supported. Besides, without the xmm registers, you can't use much of
>SSE.
On the Athlon XP 2400+ that we have in our QA lab (Win2000), running
that code generates an Illegal Instruction error. In addition,
2004 Aug 06
0
[PATCH] Make SSE Run Time option.
...Try this:
vector float a0 = vec_ld( 0, a );
vector float a1 = vec_ld( 15, a );
vector float b0 = vec_ld( 0, b );
vector float b1 = vec_ld( 15, b );
a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) );
a0 = vec_add( a0, vec_sld( a0, a0, 8 ) );
a0 = vec_add( a0, vec_sld( a0, a0, 4 ) );
vec_ste( a0, 0, &sum );
return sum;
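The two vec_add/vec_sld lines are a log2 horizontal sum: rotating the vector by 8 bytes adds lanes two apart, rotating by 4 bytes then adds adjacent lanes, leaving the total in every lane, and vec_ste stores lane 0. A plain-C sketch of that reduction (the function name hsum4 is mine):

```c
/* Scalar sketch of the vec_sld/vec_add reduction above, treating the
 * vector as a 4-float array. vec_sld(a0, a0, 8) rotates by two lanes,
 * vec_sld(a0, a0, 4) rotates by one lane. */
static float hsum4(const float v[4])
{
    float t[4], r[4];
    int i;
    /* a0 = vec_add(a0, vec_sld(a0, a0, 8)): lane i += lane (i+2)%4 */
    for (i = 0; i < 4; i++)
        t[i] = v[i] + v[(i + 2) % 4];
    /* a0 = vec_add(a0, vec_sld(a0, a0, 4)): lane i += lane (i+1)%4 */
    for (i = 0; i < 4; i++)
        r[i] = t[i] + t[(i + 1) % 4];
    /* vec_ste(a0, 0, &sum) stores lane 0 */
    return r[0];
}
```

After the two steps every lane holds the same total, which is why storing any single lane yields the dot product.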
Please note that dot products of simple vector floats are usually faster
in the scalar units. Th...