Displaying 3 results from an estimated 3 matches for "vspltish".
2004 Sep 10
4
Altivec Optimizations
Hi,
I have been playing with Altivec, and I rewrote a couple of the routines
in assembly. Looking at the archives, I noticed that there may already
be some effort on this. Anyways...
Right now, I have two routines working. They need to be cleaned up,
made relocatable, and documented; otherwise, they seem to work fairly
well. I see an overall ~27% speed improvement when encoding with the
2004 Oct 06
3
flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)
Sadly, the latest optimization completely broke everything.
The asm code isn't gas-compliant, and the libFLAC linker script has a typo;
disabling the asm optimization and/or AltiVec doesn't give a correct build
either.
Instant fixes for the asm stuff:
run sed -i -e"s:;:\#:" on lpc_asm.s
To load an address, instead of addis+ori you could use
lis and la, and PLEASE use the @l(register)
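
To illustrate the address-loading suggestion above, here is a minimal Python sketch of the 16-bit split arithmetic behind the two instruction pairs. It assumes the usual semantics: ori ORs in a zero-extended low half, while la (an addi form) adds a sign-extended low half, so the high half has to be the "adjusted" one (what gas writes as @ha). The function names are made up for this illustration and are not part of any assembler or of the original post.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register


def hi16(addr):
    """Plain upper 16 bits, as used with addis + ori."""
    return (addr >> 16) & 0xFFFF


def lo16(addr):
    """Lower 16 bits."""
    return addr & 0xFFFF


def ha16(addr):
    """Upper 16 bits adjusted for a sign-extended low half (the @ha idea)."""
    return ((addr + 0x8000) >> 16) & 0xFFFF


def sign_extend16(value):
    """Interpret a 16-bit value as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value


def addis_ori(addr):
    """Model: addis rD,0,hi16(sym) ; ori rD,rD,lo16(sym)."""
    reg = (hi16(addr) << 16) & MASK32
    return reg | lo16(addr)  # ori zero-extends its immediate


def lis_la(addr):
    """Model: lis rD,sym@ha ; la rD,sym@l(rD)."""
    reg = (ha16(addr) << 16) & MASK32
    return (reg + sign_extend16(lo16(addr))) & MASK32  # la sign-extends


if __name__ == "__main__":
    for addr in (0x1234ABCD, 0x00018000, 0x00001234):
        print(hex(addr), hex(addis_ori(addr)), hex(lis_la(addr)))
```

Both pairs rebuild the same 32-bit address; the only difference is which high half is paired with which low-half instruction.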
2004 Sep 10
1
altivec lpc_restore_signal
...d v6,v6,v18
addis r31,0,hi16(L1301)
ori r31,r31,lo16(L1301)
b L1199
L1107:
addi r5,r5,16
lvx v19,0,r5
vperm v7,v7,v19,v17
addi r11,r11,-16
lvx v19,0,r11
vperm v15,v19,v15,v16
vand v7,v7,v18
addis r31,0,hi16(L1300)
ori r31,r31,lo16(L1300)
L1199:
mtctr r31
; set up invariant vectors
vspltish v16,0 ; v16: zero vector
li r10,-12
lvsr v17,r10,r8 ; v17: result shift vector
lvsl v18,r10,r3 ; v18: residual shift back vector
li r10,-4
stw r7,-4(r9)
lvewx v19,r10,r9 ; v19: lp_quantization vector
L1200:
vmulosh v20,v0,v8 ; v20: sum vector
bcctr 20,0
L1300:
vmulosh v21,v7,v15
vsldo...
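
For orientation, the scalar computation that an lpc_restore_signal routine vectorizes can be sketched as below. This is a minimal Python sketch, assuming the usual FLAC LPC reconstruction (prediction sum over previous samples, arithmetic right shift by lp_quantization, then adding the residual). The function signature and the warmup parameter are illustrative only and are not taken from the post or from the assembly above.

```python
def lpc_restore_signal(residual, qlp_coeff, lp_quantization, warmup):
    """Scalar reference sketch of LPC signal restoration.

    warmup holds the `order` samples that precede the block, oldest first
    (hypothetical parameter; the C routine reads them through data[-1]...).
    """
    order = len(qlp_coeff)
    if len(warmup) != order:
        raise ValueError("need exactly one warm-up sample per coefficient")

    data = list(warmup)
    for r in residual:
        total = 0
        for j, coeff in enumerate(qlp_coeff):
            # coefficient j applies to the sample j+1 positions back
            total += coeff * data[-1 - j]
        # Python's >> on ints is an arithmetic (sign-preserving) shift
        data.append(r + (total >> lp_quantization))
    return data[order:]


if __name__ == "__main__":
    # Order-2 predictor 2*x[n-1] - x[n-2] with zero residual continues a ramp.
    print(lpc_restore_signal([0, 0, 0], [2, -1], 0, [1, 2]))
```

In the vector version, the comments on v17, v18 and v19 ("result shift", "residual shift back", "lp_quantization") appear to correspond to the shift and residual steps of this loop, though the truncated listing does not show them in full.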