thr3ads.net - search: "vspltish"

Displaying 3 results from an estimated 3 matches for "vspltish".

2004 Sep 10

Altivec Optimizations

Hi, I have been playing with Altivec, and I rewrote a couple of the routines in assembly. Looking at the archives, I noticed that there may already be some effort on this. Anyways... Right now, I have two routines working. They need to be cleaned up, made relocatable, and documented; otherwise, they seem to work fairly well. I see an overall ~27% speed improvement when encoding with the

2004 Oct 06

flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

Sadly, the latest optimization completely broke everything. The asm code isn't gas-compliant, the libFLAC linker script has a typo, and disabling the asm optimization and/or AltiVec won't give a correct build anyway. Instant fixes for the asm stuff: run sed -i -e"s:;:\#:" on lpc_asm.s; to load an address, instead of addis+ori you could use lis and la, and PLEASE use the @l(register)
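The snippet is cut off at this point. As a rough illustration (not taken from the post) of why the two address-loading sequences need different 16-bit halves: `ori` treats its immediate as unsigned, while `la` is a form of `addi` and sign-extends it, so the high half has to be adjusted when bit 15 of the low half is set (the `@ha`/`@l` pair). The helper names and the sample address below are hypothetical, and the sketch only models the 32-bit arithmetic, not the assembler.

```python
MASK32 = 0xFFFFFFFF


def split_for_ori(addr):
    """Halves for lis/addis followed by ori: plain high and low 16 bits."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF


def split_for_la(addr):
    """Halves for lis followed by la: adjusted high half (@ha) and low half (@l)."""
    lo = addr & 0xFFFF
    # la sign-extends the low half, so add 0x8000 before taking the high half
    # to compensate when bit 15 of the low half is set.
    ha = ((addr + 0x8000) >> 16) & 0xFFFF
    return ha, lo


def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def emulate_lis_ori(hi, lo):
    """Model lis rD,hi ; ori rD,rD,lo (the low half is OR-ed in unsigned)."""
    return ((hi << 16) | lo) & MASK32


def emulate_lis_la(hi, lo):
    """Model lis rD,hi ; la rD,lo(rD) (the low half is added sign-extended)."""
    return ((hi << 16) + sign_extend16(lo)) & MASK32


addr = 0x1234ABCD  # hypothetical address with bit 15 of the low half set

assert emulate_lis_ori(*split_for_ori(addr)) == addr
assert emulate_lis_la(*split_for_la(addr)) == addr
# Feeding the unadjusted high half to the la sequence lands 0x10000 too low.
assert emulate_lis_la(*split_for_ori(addr)) == addr - 0x10000
```

For addresses whose low half has bit 15 clear, both splits give the same pair, which is why a mismatch can go unnoticed until a symbol lands in the upper part of a 64 KiB block.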

2004 Sep 10

altivec lpc_restore_signal

...d v6,v6,v18
    addis r31,0,hi16(L1301)
    ori r31,r31,lo16(L1301)
    b L1199
L1107:
    addi r5,r5,16
    lvx v19,0,r5
    vperm v7,v7,v19,v17
    addi r11,r11,-16
    lvx v19,0,r11
    vperm v15,v19,v15,v16
    vand v7,v7,v18
    addis r31,0,hi16(L1300)
    ori r31,r31,lo16(L1300)
L1199:
    mtctr r31
    ; set up invariant vectors
    vspltish v16,0 ; v16: zero vector
    li r10,-12
    lvsr v17,r10,r8 ; v17: result shift vector
    lvsl v18,r10,r3 ; v18: residual shift back vector
    li r10,-4
    stw r7,-4(r9)
    lvewx v19,r10,r9 ; v19: lp_quantization vector
L1200:
    vmulosh v20,v0,v8 ; v20: sum vector
    bcctr 20,0
L1300:
    vmulosh v21,v7,v15
    vsldo...
