Displaying 6 results from an estimated 6 matches for "vec_perm".
2018 Jul 10
9
[PATCH 0/7] PowerPC64 performance improvements
The following series adds initial vector support for PowerPC64.
On POWER9, flac --best is about 3.3x faster.
Amitay Isaacs (2):
Add m4 macro to check for C __attribute__ features
Check if compiler supports target attribute on ppc64
Anton Blanchard (5):
configure.ac: Remove SPE detection code
configure.ac: Add VSX enable/disable
configure.ac: Fix FLAC__CPU_PPC on little endian, and add
2005 Dec 02
0
run time assembler patch for altivec, sse + bug fixes
...{
// This (unfortunately) is the common case.
maska = vec_lvsl(0, a);
maskb = vec_lvsl(0, b);
MSQa = vec_ld(0, a);
MSQb = vec_ld(0, b);
for (i = 0; i < len; i+=8)
{
    a += 4;
    LSQa = vec_ld(0, a);
    vec_a = vec_perm(MSQa, LSQa, maska);
    b += 4;
    LSQb = vec_ld(0, b);
    vec_b = vec_perm(MSQb, LSQb, maskb);
    vec_result = vec_madd(vec_a, vec_b, vec_result);
    a += 4;
    MSQa = vec_ld(0, a);
    vec_a = vec_perm(LSQa, MSQa, maska);...
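
The loop quoted above relies on the usual AltiVec idiom for reading from an unaligned address: two aligned vec_ld loads, a shift mask from vec_lvsl, and a vec_perm that picks 16 contiguous bytes out of the pair. As a rough illustration only, here is a portable C model of that idiom at the byte level. The names vec16, model_ld, model_lvsl, model_perm and model_unaligned_load are hypothetical and are not part of altivec.h. The model assumes big-endian AltiVec semantics and expresses addresses as offsets into a buffer. It also fetches the second block at offset + 15, a variant that stays inside the current block when the address is already aligned, whereas the quoted loop advances the pointer by four floats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte AltiVec register. */
typedef struct { uint8_t b[16]; } vec16;

/* Models vec_ld: the low 4 bits of the address are ignored, so the
 * load always returns the aligned 16-byte block containing it. */
vec16 model_ld(const uint8_t *base, size_t offset)
{
    vec16 v;
    memcpy(v.b, base + (offset & ~(size_t)15), 16);
    return v;
}

/* Models vec_lvsl: byte i of the mask is (address & 15) + i. */
vec16 model_lvsl(size_t offset)
{
    vec16 m;
    for (int i = 0; i < 16; i++)
        m.b[i] = (uint8_t)((offset & 15) + (size_t)i);
    return m;
}

/* Models vec_perm: each mask byte (low 5 bits) indexes into the
 * 32-byte concatenation of a followed by b. */
vec16 model_perm(vec16 a, vec16 b, vec16 mask)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        unsigned idx = mask.b[i] & 31u;
        r.b[i] = (idx < 16) ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* Unaligned 16-byte load built from two aligned loads and a permute. */
vec16 model_unaligned_load(const uint8_t *base, size_t offset)
{
    assert(base != NULL);
    vec16 msq = model_ld(base, offset);       /* block holding the first byte */
    vec16 lsq = model_ld(base, offset + 15);  /* block holding the last byte  */
    return model_perm(msq, lsq, model_lvsl(offset));
}
```

With a buffer whose byte i holds the value i, calling model_unaligned_load(buf, 5) should yield the bytes 5 through 20, which is the behaviour the vec_perm calls in the snippet depend on.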
2004 Aug 06
6
[PATCH] Make SSE Run Time option.
...) is the common case.
maska = vec_lvsl(0, a);
maskb = vec_lvsl(0, b);
MSQa = vec_ld(0, a);
MSQb = vec_ld(0, b);
for (i = 0; i < len; i+=8) {
    a += 4;
    LSQa = vec_ld(0, a);
    vec_a = vec_perm(MSQa, LSQa, maska);
    b += 4;
    LSQb = vec_ld(0, b);
    vec_b = vec_perm(MSQb, LSQb, maskb);
    vec_result = vec_madd(vec_a, vec_b, vec_result);
    a += 4;
    MSQa = vec_ld(0, a);
    vec_a = vec_p...
2004 Aug 06
2
[PATCH] Make SSE Run Time option. Add Win32 SSE code
Jean-Marc,
>I'm still not sure I get it. On an Athlon XP, I can do something like
>"mulps xmm0, xmm1", which means that the xmm registers are indeed
>supported. Besides, without the xmm registers, you can't use much of
>SSE.
On the Athlon XP 2400+ that we have in our QA lab (Win2000), if you run
that code it generates an Illegal Instruction Error. In addition,
2005 Jan 29
4
A couple of points about flac 1.1.1 on ppc/linux/altivec
On Thu, 27 Jan 2005, John Steele Scott wrote:
> That looks fine to me as well. However, the best solution is something which
> Luca suggested a few months ago, which is to use the functions defined in
> altivec.h. These are C functions which map directly to Altivec machine
> instructions. I am willing to help out, but I don't find the current lpc_asm.s
> very easy to follow, and
2004 Aug 06
0
[PATCH] Make SSE Run Time option.
...pdf/PNI_LEGAL3.pdf
Likewise, all that branching is probably going to cause more trouble than
it saves. Try this:
vector float a0 = vec_ld( 0, a );
vector float a1 = vec_ld( 15, a );
vector float b0 = vec_ld( 0, b );
vector float b1 = vec_ld( 15, b );
a0 = vec_perm( a0, a1, vec_lvsl( 0, a ) );
b0 = vec_perm( b0, b1, vec_lvsl( 0, b ) );
a0 = vec_madd( a0, b0, (vector float) vec_splat_u32(0) ) ;
a0 = vec_add( a0, vec_sld( a0, a0, 8 ) );
a0 = vec_add( a0, vec_sld( a0, a0, 4 ) );
vec_ste( a0, 0, &sum );
return...
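
The two vec_sld/vec_add pairs in the snippet above perform a horizontal sum: rotating the vector by 8 bytes and adding folds lanes 0+2 and 1+3 together, and rotating by 4 bytes and adding leaves the full total in every lane, so vec_ste can store any one element. A minimal scalar sketch of that reduction in portable C follows. The names vec4f, model_rotate, model_add and model_dot4 are hypothetical and only model the lane movement; they are not AltiVec intrinsics, and the sketch ignores alignment handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an AltiVec "vector float" (4 lanes). */
typedef struct { float f[4]; } vec4f;

/* Models vec_sld(v, v, 4 * lanes): rotate the lanes left. */
vec4f model_rotate(vec4f v, int lanes)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = v.f[(i + lanes) & 3];
    return r;
}

/* Models vec_add: lane-wise addition. */
vec4f model_add(vec4f x, vec4f y)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = x.f[i] + y.f[i];
    return r;
}

/* Four-element inner product using the rotate-and-add reduction. */
float model_dot4(const float *a, const float *b)
{
    assert(a != NULL && b != NULL);

    vec4f p;
    for (int i = 0; i < 4; i++)
        p.f[i] = a[i] * b[i];             /* vec_madd with a zero addend */

    p = model_add(p, model_rotate(p, 2)); /* rotate by 8 bytes, then add */
    p = model_add(p, model_rotate(p, 1)); /* rotate by 4 bytes, then add */

    return p.f[0];                        /* every lane now holds the sum */
}
```

For a = {1, 2, 3, 4} and b = {5, 6, 7, 8} the products are {5, 12, 21, 32}; the first fold gives {26, 44, 26, 44} and the second gives 70 in every lane.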