Displaying 5 results from an estimated 5 matches for "vec_lvsl".
2005 Dec 02
run time assembler patch for altivec, sse + bug fixes
...SQa, LSQa, MSQb, LSQb;
__vector unsigned char maska, maskb;
__vector float vec_a, vec_b;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned))
{
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i+=8)
    {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b...
2004 Aug 06
[PATCH] Make SSE Run Time option.
...__vector unsigned char maska, maskb;
__vector float vec_a, vec_b;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned)) {
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i+=8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b +=...
2004 Aug 06
[PATCH] Make SSE Run Time option.
...Likewise, all that branching is probably going to cause more trouble than
it saves. Try this:
vector float a0 = vec_ld( 0, a );
vector float a1 = vec_ld( 15, a );
vector float b0 = vec_ld( 0, b );
vector float b1 = vec_ld( 15, b );
a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ;
a0 = vec_add( a0, vec_sld( a0, a0, 8 ) );
a0 = vec_add( a0, vec_sld( a0, a0, 4 ) );
vec_ste( a0, 0, &sum );
return sum;
Please note...
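The unaligned-load idiom in the snippet above (two vec_ld calls at offsets 0 and 15, merged by vec_perm with a vec_lvsl mask) can be modelled in portable C for readers without a PowerPC at hand. This is only a scalar sketch, assuming big-endian AltiVec semantics; the vec16 type, the model_* helpers and load_unaligned are hypothetical names for illustration and are not part of altivec.h. Memory is represented as a byte array whose index 0 is taken to be 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte AltiVec register. */
typedef struct { uint8_t b[16]; } vec16;

/* Models vec_ld: the low 4 address bits are ignored, so the load
   returns the aligned 16-byte block that contains addr.
   Assumption: mem[0] sits on a 16-byte boundary. */
static vec16 model_vec_ld(const uint8_t *mem, size_t mem_len, size_t addr)
{
    size_t start = addr & ~(size_t)15;
    vec16 v;
    assert(start + 16 <= mem_len);   /* whole block must be in bounds */
    memcpy(v.b, mem + start, 16);
    return v;
}

/* Models vec_lvsl: bytes sh, sh+1, ..., sh+15 with sh = addr & 15. */
static vec16 model_vec_lvsl(size_t addr)
{
    vec16 v;
    for (int i = 0; i < 16; i++)
        v.b[i] = (uint8_t)((addr & 15) + i);
    return v;
}

/* Models vec_perm: the low 5 bits of each mask byte index into the
   32-byte concatenation of x followed by y. */
static vec16 model_vec_perm(vec16 x, vec16 y, vec16 mask)
{
    vec16 v;
    for (int i = 0; i < 16; i++) {
        unsigned idx = mask.b[i] & 31u;
        v.b[i] = idx < 16 ? x.b[idx] : y.b[idx - 16];
    }
    return v;
}

/* The idiom itself: load the block holding the first byte and the
   block holding the last byte, then shift the wanted 16 bytes out. */
static vec16 load_unaligned(const uint8_t *mem, size_t mem_len, size_t addr)
{
    vec16 msq = model_vec_ld(mem, mem_len, addr);
    vec16 lsq = model_vec_ld(mem, mem_len, addr + 15);
    return model_vec_perm(msq, lsq, model_vec_lvsl(addr));
}
```

Using offset 15 rather than 16 for the second load means that when the address is already aligned, both loads hit the same block and nothing past the data is touched, which is why the snippet needs no alignment branch.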
2004 Aug 06
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc,
>I'm still not sure I get it. On an Athlon XP, I can do something like
>"mulps xmm0, xmm1", which means that the xmm registers are indeed
>supported. Besides, without the xmm registers, you can't use much of
>SSE.
On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run
that code it generates an Illegal Instruction Error. In addition,
2005 Jan 29
A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote:
> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and