Displaying 5 results from an estimated 5 matches for "vec_ld".
2005 Dec 02 · 0 replies · run time assembler patch for altivec, sse + bug fixes
...__vector float vec_a, vec_b;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned))
{
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i += 8)
    {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b);
        vec_b = vec_perm(MSQb, LSQb, maskb);...
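The lvsl/perm idiom in that excerpt can be sketched in portable C — an illustration of the technique, not the patch's code. Two aligned 16-byte loads bracket the unaligned address, and a byte select (what vec_perm does with the vec_lvsl mask) concatenates the right bytes:

```c
#include <stdint.h>
#include <string.h>

/* Portable sketch of the AltiVec unaligned-load idiom:
 * fetch the two 16-byte aligned blocks that straddle an arbitrary
 * address (vec_ld), then pick out the 16 bytes starting at that
 * address (vec_perm driven by a vec_lvsl mask). */
static void load_unaligned16(const uint8_t *p, uint8_t out[16])
{
    uintptr_t addr = (uintptr_t)p;
    const uint8_t *lo = (const uint8_t *)(addr & ~(uintptr_t)15);        /* like vec_ld(0, p)  */
    const uint8_t *hi = (const uint8_t *)((addr + 15) & ~(uintptr_t)15); /* like vec_ld(15, p) */
    unsigned shift = (unsigned)(addr & 15);   /* the offset vec_lvsl(0, p) encodes */
    uint8_t msq[16], lsq[16];
    memcpy(msq, lo, 16);
    memcpy(lsq, hi, 16);
    for (int i = 0; i < 16; i++)              /* the vec_perm step */
        out[i] = (shift + i < 16) ? msq[shift + i] : lsq[shift + i - 16];
}
```

Note that when p is already 16-byte aligned, hi equals lo, so the sketch (like vec_ld itself) never reads past the block actually containing the data.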
2004 Aug 06 · 2 replies · [PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc,
>I'm still not sure I get it. On an Athlon XP, I can do something like
>"mulps xmm0, xmm1", which means that the xmm registers are indeed
>supported. Besides, without the xmm registers, you can't use much of
>SSE.
On the Athlon XP 2400+ that we have in our QA lab (Win2000), running
that code generates an Illegal Instruction error. In addition,
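The "run time option" being discussed is a dispatch pattern: probe the CPU once at startup, then route calls through a function pointer so no unsupported instruction is ever executed. A minimal sketch under stated assumptions — cpu_has_sse() is a placeholder (a real build would query CPUID), and the SSE implementation itself is elided:

```c
/* Sketch of run-time SIMD dispatch: choose an implementation once,
 * based on a CPU-feature probe, instead of branching on every call. */
typedef float (*inner_product_fn)(const float *, const float *, int);

static float inner_product_scalar(const float *a, const float *b, int len)
{
    float sum = 0.0f;
    for (int i = 0; i < len; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Placeholder probe; a real build would execute CPUID here. */
static int cpu_has_sse(void) { return 0; }

static inner_product_fn select_inner_product(void)
{
    if (cpu_has_sse())
        return inner_product_scalar; /* would return the SSE version */
    return inner_product_scalar;     /* safe fallback */
}
```

The point of the pattern is exactly the Athlon/Win2000 report above: the SSE code path is never reached on a CPU whose probe fails, so it cannot fault.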
2004 Aug 06 · 6 replies · [PATCH] Make SSE Run Time option.
...;
__vector float vec_result;
vec_result = (__vector float)vec_splat_u8(0);
if ((!a_aligned) && (!b_aligned)) {
    // This (unfortunately) is the common case.
    maska = vec_lvsl(0, a);
    maskb = vec_lvsl(0, b);
    MSQa = vec_ld(0, a);
    MSQb = vec_ld(0, b);
    for (i = 0; i < len; i += 8) {
        a += 4;
        LSQa = vec_ld(0, a);
        vec_a = vec_perm(MSQa, LSQa, maska);
        b += 4;
        LSQb = vec_ld(0, b);
        vec_b = vec_p...
2005 Jan 29 · 4 replies · A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote:
> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and
2004 Aug 06 · 0 replies · [PATCH] Make SSE Run Time option.
...el SSE2 instructions
> #define CPU_MODE_ALTIVEC 64 // PowerPC Altivec support.
You may wish to save space for PNI.
http://cedar.intel.com/media/pdf/PNI_LEGAL3.pdf
Likewise, all that branching is probably going to cause more trouble than
it saves. Try this:
vector float a0 = vec_ld( 0, a );
vector float a1 = vec_ld( 15, a );
vector float b0 = vec_ld( 0, b );
vector float b1 = vec_ld( 15, b );
a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
a0 = vec_madd( a0, b0, (vector float) vec_splat_u3...
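The vec_ld(15, a) in that branchless version works because vec_ld truncates (address + offset) down to a 16-byte boundary: offset 15 fetches the block holding the last byte of the unaligned 16-byte span, and when a is already aligned it simply re-reads the same block instead of over-reading the next one. A small sketch of that address arithmetic (illustrative scalar C, not AltiVec code):

```c
#include <stdint.h>

/* Which 16-byte-aligned block does vec_ld(off, p) fetch?
 * vec_ld truncates (p + off) down to a 16-byte boundary. */
static uintptr_t vec_ld_block(uintptr_t p, int off)
{
    return (p + (uintptr_t)off) & ~(uintptr_t)15;
}
```

This is why the unconditional perm is safe for every alignment, which is what lets the suggestion above drop the four-way a_aligned/b_aligned branching entirely.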