thr3ads.net - flac dev - [Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode) [Oct 2004]

If this information is useful, please help other people find it:
Share via:

Luca Barbato

2004-Oct-06 09:34 UTC

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

Sadly the latest optimization broke completely everything.

The asm code isn't gas compliant. the libFLAC linker script has a typo, 
disabling the asm optimization and/or altivec won't let a correct build 
anyway.

Instant fixes for the asm stuff:

sed -i -e"s:;:\#:" on the  lpc_asm.s
to load address instead of addis+ori you could use
lis and la and PLEASE use the @l(register) and @ha macros instead of the 
not gas supported hi/lo16().

eg:

lis 31,LABEL@ha
la 31,LABEL@l(31)


append use -mregnames as as options (remove the other)

(the option there isn't present in the as manual, and I could just try 
to figure what's supposed to do)

there are -rpath $(libdir) when -rpath,$(libdir) should be used

in src/libFLAC/Makefile
libFLAC.la: $(libFLAC_la_OBJECTS) $(libFLAC_la_DEPENDENCIES)
         $(LINK) -rpath $(libdir) $(libFLAC_la_LDFLAGS)
		$(libFLAC_la_OBJECTS) $(libFLAC_la_LIBADD) $(LIBS)


the final build still doesn't link correctly (the 2 altivec asm function 
  aren't properly linked in, I haven't the time to investigate the issue
yet=/)


More verbose report about the no altivec bug:

If I disable altivec optimization (with --disable-asm-optimizations 
--disable-altivec)

the result is:

stream_decoder.c: In function `FLAC__stream_decoder_init':
stream_decoder.c:296: error: `FLAC__lpc_restore_signal' undeclared 
(first use in this function)
stream_decoder.c:296: error: (Each undeclared identifier is reported 
only once
stream_decoder.c:296: error: for each function it appears in.)
stream_decoder.c:297: error: `FLAC__lpc_restore_signal_wide' undeclared 
(first use in this function)
make[4]: *** [stream_decoder.lo] Error 1
make[4]: Leaving directory 
`/var/tmp/portage/flac-1.1.1/work/flac-1.1.1/src/libFLAC'

Could you please at least make sure that disabling altivec optimization 
won't break the build?
That's a bit higher whish:
could you please write the altivec optimization using C intrinsics? That 
way they should work with no/little modification even on ppc 970 in 
ppc64 mode.

Regards

lu

PS:put my address in CC to the replies.

-- 
Luca Barbato
Developer
Gentoo Linux				http://www.gentoo.org/~lu_zero

Josh Coalson

2004-Oct-06 09:47 UTC

head link

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

thanks for the feedback, but it would really help if you supply
a patch (diff -c), I didn't understand all the changes you
described.

someone reported a problem with src/libFLAC/include/private/lpc.h
that was fixed in CVS and may fix the problem building with asm
disabled

http://cvs.sourceforge.net/viewcvs.py/*checkout*/flac/flac/src/libFLAC/include/private/lpc.h?rev=1.24

Josh

--- Luca Barbato <lu_zero@gentoo.org> wrote:
> Sadly the latest optimization broke completely everything.
> 
> The asm code isn't gas compliant. the libFLAC linker script has a
> typo, 
> disabling the asm optimization and/or altivec won't let a correct
> build 
> anyway.
> 
> Instant fixes for the asm stuff:
> 
> sed -i -e"s:;:\#:" on the  lpc_asm.s
> to load address instead of addis+ori you could use
> lis and la and PLEASE use the @l(register) and @ha macros instead of
> the 
> not gas supported hi/lo16().
> 
> eg:
> 
> lis 31,LABEL@ha
> la 31,LABEL@l(31)
> 
> 
> append use -mregnames as as options (remove the other)
> 
> (the option there isn't present in the as manual, and I could just
> try 
> to figure what's supposed to do)
> 
> there are -rpath $(libdir) when -rpath,$(libdir) should be used
> 
> in src/libFLAC/Makefile
> libFLAC.la: $(libFLAC_la_OBJECTS) $(libFLAC_la_DEPENDENCIES)
>          $(LINK) -rpath $(libdir) $(libFLAC_la_LDFLAGS)
> 		$(libFLAC_la_OBJECTS) $(libFLAC_la_LIBADD) $(LIBS)
> 
> 
> the final build still doesn't link correctly (the 2 altivec asm
> function 
>   aren't properly linked in, I haven't the time to investigate the
> issue 
> yet=/)
> 
> 
> More verbose report about the no altivec bug:
> 
> If I disable altivec optimization (with --disable-asm-optimizations 
> --disable-altivec)
> 
> the result is:
> 
> stream_decoder.c: In function `FLAC__stream_decoder_init':
> stream_decoder.c:296: error: `FLAC__lpc_restore_signal' undeclared 
> (first use in this function)
> stream_decoder.c:296: error: (Each undeclared identifier is reported 
> only once
> stream_decoder.c:296: error: for each function it appears in.)
> stream_decoder.c:297: error: `FLAC__lpc_restore_signal_wide'
> undeclared 
> (first use in this function)
> make[4]: *** [stream_decoder.lo] Error 1
> make[4]: Leaving directory 
> `/var/tmp/portage/flac-1.1.1/work/flac-1.1.1/src/libFLAC'
> 
> Could you please at least make sure that disabling altivec
> optimization 
> won't break the build?
> That's a bit higher whish:
> could you please write the altivec optimization using C intrinsics?
> That 
> way they should work with no/little modification even on ppc 970 in 
> ppc64 mode.
> 
> Regards
> 
> lu
> 
> PS:put my address in CC to the replies.
> 
> -- 
> Luca Barbato
> Developer
> Gentoo Linux				http://www.gentoo.org/~lu_zero


		
_______________________________
Do you Yahoo!?
Declare Yourself - Register online to vote today!
http://vote.yahoo.com

Josh Coalson

2004-Oct-09 13:46 UTC

head link

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

--- Luca Barbato <lu_zero@gentoo.org> wrote:> The asm code isn't gas compliant.
went back and looked at this again.

the lpc_asm.s in FLAC 1.1.1 compiles fine with OS X's cc, but
not with gas.  the patched one you sent doesn't compile with
the native cc.

is there a common syntax that both support?  or do we have to
have two versions of every PPC asm file?  I want the code to be
able to be built with the native OS X developer tools.

Josh



		
_______________________________
Do you Yahoo!?
Declare Yourself - Register online to vote today!
http://vote.yahoo.com

Luca Barbato

2004-Oct-15 05:10 UTC

head link

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

Josh Coalson wrote:> thanks for the feedback, but it would really help if you supply
> a patch (diff -c), I didn't understand all the changes you
> described.
I hope it helps.

lu

-- 
Luca Barbato
Developer
Gentoo Linux				http://www.gentoo.org/~lu_zero


-------------- next part --------------
*** /tmp/lpc_asm.s	Wed Oct  6 14:06:11 2004
--- src/libFLAC/ppc/lpc_asm.s	Tue Jul 27 21:32:05 2004
***************
*** 1,93 ****
! #  libFLAC - Free Lossless Audio Codec library
! #  Copyright (C) 2004  Josh Coalson
! #
! #  Redistribution and use in source and binary forms, with or without
! #  modification, are permitted provided that the following conditions
! #  are met:
! #
! #  - Redistributions of source code must retain the above copyright
! #  notice, this list of conditions and the following disclaimer.
! #
! #  - Redistributions in binary form must reproduce the above copyright
! #  notice, this list of conditions and the following disclaimer in the
! #  documentation and/or other materials provided with the distribution.
! #
! #  - Neither the name of the Xiph.org Foundation nor the names of its
! #  contributors may be used to endorse or promote products derived from
! #  this software without specific prior written permission.
! #
! #  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
! #  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
! #  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
! #  A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR
! #  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
! #  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
! #  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
! #  PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
! #  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
! #  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
! #  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  
  .text
  	.align 2
  .globl _FLAC__lpc_restore_signal_asm_ppc_altivec_16
- .type _FLAC__lpc_restore_signal_asm_ppc_altivec_16, @function
- 
  .globl _FLAC__lpc_restore_signal_asm_ppc_altivec_16_order8
- .type _FLAC__lpc_restore_signal_asm_ppc_altivec_16_order8, @function
  
  _FLAC__lpc_restore_signal_asm_ppc_altivec_16:
! #	r3: residual[]
! #	r4: data_len
! #	r5: qlp_coeff[]
! #	r6: order
! #	r7: lp_quantization
! #	r8: data[]
! 
! # see src/libFLAC/lpc.c:FLAC__lpc_restore_signal()
! # these is a PowerPC/Altivec assembly version which requires bps<=16 (or
actual
! # bps<=15 for mid-side coding, since that uses an extra bit)
! 
! # these should be fast; the inner loop is unrolled (it takes no more than
! # 3*(order%4) instructions, all of which are arithmetic), and all of the
! # coefficients and all relevant history stay in registers, so the outer loop
! # has only one load from memory (the residual)
  
! # I have not yet run this through simg4, so there may be some avoidable
stalls,
! # and there may be a somewhat more clever way to do the outer loop
  
! # the branch mechanism may prevent dynamic loading; I still need to examine
! # this issue, and there may be a more elegant method
  
  	stmw r31,-4(r1)
  
  	addi r9,r1,-28
  	li r31,0xf
! 	andc r9,r9,r31 # for quadword-aligned stack data
  
! 	slwi r6,r6,2 # adjust for word size
  	slwi r4,r4,2
! 	add r4,r4,r8 # r4 = data+data_len
  
! 	mfspr r0,256 # cache old vrsave
! 	addis r31,0,0xffff
! 	ori r31,r31,0xfc00
! 	mtspr 256,r31 # declare VRs in vrsave
  
! 	cmplw cr0,r8,r4 # i<data_len
  	bc 4,0,L1400
  
! 	# load coefficients into v0-v7 and initial history into v8-v15
  	li r31,0xf
! 	and r31,r8,r31 # r31: data%4
  	li r11,16
! 	subf r31,r31,r11 # r31: 4-(data%4)
! 	slwi r31,r31,3 # convert to bits for vsro
  	li r10,-4
  	stw r31,-4(r9)
  	lvewx v0,r10,r9
  	vspltisb v18,-1
! 	vsro v18,v18,v0 # v18: mask vector
  
  	li r31,0x8
  	lvsl v0,0,r31
--- 1,90 ----
! ;  libFLAC - Free Lossless Audio Codec library
! ;  Copyright (C) 2004  Josh Coalson
! ;
! ;  Redistribution and use in source and binary forms, with or without
! ;  modification, are permitted provided that the following conditions
! ;  are met:
! ;
! ;  - Redistributions of source code must retain the above copyright
! ;  notice, this list of conditions and the following disclaimer.
! ;
! ;  - Redistributions in binary form must reproduce the above copyright
! ;  notice, this list of conditions and the following disclaimer in the
! ;  documentation and/or other materials provided with the distribution.
! ;
! ;  - Neither the name of the Xiph.org Foundation nor the names of its
! ;  contributors may be used to endorse or promote products derived from
! ;  this software without specific prior written permission.
! ;
! ;  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
! ;  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
! ;  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
! ;  A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR
! ;  CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
! ;  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
! ;  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
! ;  PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
! ;  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
! ;  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
! ;  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  
  .text
  	.align 2
  .globl _FLAC__lpc_restore_signal_asm_ppc_altivec_16
  .globl _FLAC__lpc_restore_signal_asm_ppc_altivec_16_order8
  
  _FLAC__lpc_restore_signal_asm_ppc_altivec_16:
! ;	r3: residual[]
! ;	r4: data_len
! ;	r5: qlp_coeff[]
! ;	r6: order
! ;	r7: lp_quantization
! ;	r8: data[]
! 
! ; see src/libFLAC/lpc.c:FLAC__lpc_restore_signal()
! ; these is a PowerPC/Altivec assembly version which requires bps<=16 (or
actual
! ; bps<=15 for mid-side coding, since that uses an extra bit)
! 
! ; these should be fast; the inner loop is unrolled (it takes no more than
! ; 3*(order%4) instructions, all of which are arithmetic), and all of the
! ; coefficients and all relevant history stay in registers, so the outer loop
! ; has only one load from memory (the residual)
  
! ; I have not yet run this through simg4, so there may be some avoidable
stalls,
! ; and there may be a somewhat more clever way to do the outer loop
  
! ; the branch mechanism may prevent dynamic loading; I still need to examine
! ; this issue, and there may be a more elegant method
  
  	stmw r31,-4(r1)
  
  	addi r9,r1,-28
  	li r31,0xf
! 	andc r9,r9,r31 ; for quadword-aligned stack data
  
! 	slwi r6,r6,2 ; adjust for word size
  	slwi r4,r4,2
! 	add r4,r4,r8 ; r4 = data+data_len
  
! 	mfspr r0,256 ; cache old vrsave
! 	addis r31,0,hi16(0xfffffc00)
! 	ori r31,r31,lo16(0xfffffc00)
! 	mtspr 256,r31 ; declare VRs in vrsave
  
! 	cmplw cr0,r8,r4 ; i<data_len
  	bc 4,0,L1400
  
! 	; load coefficients into v0-v7 and initial history into v8-v15
  	li r31,0xf
! 	and r31,r8,r31 ; r31: data%4
  	li r11,16
! 	subf r31,r31,r11 ; r31: 4-(data%4)
! 	slwi r31,r31,3 ; convert to bits for vsro
  	li r10,-4
  	stw r31,-4(r9)
  	lvewx v0,r10,r9
  	vspltisb v18,-1
! 	vsro v18,v18,v0 ; v18: mask vector
  
  	li r31,0x8
  	lvsl v0,0,r31
***************
*** 97,110 ****
  	vspltisb v2,0
  	vspltisb v3,-1
  	vmrglw v2,v2,v3
! 	vsel v0,v1,v0,v2 # v0: reversal permutation vector
  
  	add r10,r5,r6
! 	lvsl v17,0,r5 # v17: coefficient alignment permutation vector
! 	vperm v17,v17,v17,v0 # v17: reversal coefficient alignment permutation vector
  
  	mr r11,r8
! 	lvsl v16,0,r11 # v16: history alignment permutation vector
  
  	lvx v0,0,r5
  	addi r5,r5,16
--- 94,107 ----
  	vspltisb v2,0
  	vspltisb v3,-1
  	vmrglw v2,v2,v3
! 	vsel v0,v1,v0,v2 ; v0: reversal permutation vector
  
  	add r10,r5,r6
! 	lvsl v17,0,r5 ; v17: coefficient alignment permutation vector
! 	vperm v17,v17,v17,v0 ; v17: reversal coefficient alignment permutation vector
  
  	mr r11,r8
! 	lvsl v16,0,r11 ; v16: history alignment permutation vector
  
  	lvx v0,0,r5
  	addi r5,r5,16
***************
*** 117,124 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1101
  	vand v0,v0,v18
! 	addis r31,0,L1307@ha
! 	ori r31,r31,L1307@l
  	b L1199
  
  L1101:
--- 114,121 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1101
  	vand v0,v0,v18
! 	addis r31,0,hi16(L1307)
! 	ori r31,r31,lo16(L1307)
  	b L1199
  
  L1101:
***************
*** 131,138 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1102
  	vand v1,v1,v18
! 	addis r31,0,L1306@ha
! 	ori r31,r31,L1306@l
  	b L1199
  
  L1102:
--- 128,135 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1102
  	vand v1,v1,v18
! 	addis r31,0,hi16(L1306)
! 	ori r31,r31,lo16(L1306)
  	b L1199
  
  L1102:
***************
*** 145,152 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1103
  	vand v2,v2,v18
! 	lis r31,L1305@ha
! 	la r31,L1305@l(r31)
  	b L1199
  
  L1103:
--- 142,149 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1103
  	vand v2,v2,v18
! 	addis r31,0,hi16(L1305)
! 	ori r31,r31,lo16(L1305)
  	b L1199
  
  L1103:
***************
*** 159,166 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1104
  	vand v3,v3,v18
! 	lis r31,L1304@ha
! 	la r31,L1304@l(r31)
  	b L1199
  
  L1104:
--- 156,163 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1104
  	vand v3,v3,v18
! 	addis r31,0,hi16(L1304)
! 	ori r31,r31,lo16(L1304)
  	b L1199
  
  L1104:
***************
*** 173,180 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1105
  	vand v4,v4,v18
! 	lis r31,L1303@ha
! 	la r31,L1303@l(r31)
  	b L1199
  
  L1105:
--- 170,177 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1105
  	vand v4,v4,v18
! 	addis r31,0,hi16(L1303)
! 	ori r31,r31,lo16(L1303)
  	b L1199
  
  L1105:
***************
*** 187,194 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1106
  	vand v5,v5,v18
! 	lis r31,L1302@ha
! 	la r31,L1302@l(r31)
  	b L1199
  
  L1106:
--- 184,191 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1106
  	vand v5,v5,v18
! 	addis r31,0,hi16(L1302)
! 	ori r31,r31,lo16(L1302)
  	b L1199
  
  L1106:
***************
*** 201,208 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L1107
  	vand v6,v6,v18
! 	lis r31,L1301@ha
! 	la r31,L1301@l(r31)
  	b L1199
  
  L1107:
--- 198,205 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L1107
  	vand v6,v6,v18
! 	addis r31,0,hi16(L1301)
! 	ori r31,r31,lo16(L1301)
  	b L1199
  
  L1107:
***************
*** 213,242 ****
  	lvx v19,0,r11
  	vperm v15,v19,v15,v16
  	vand v7,v7,v18
! 	lis r31,L1300@ha
! 	la r31,L1300@l(r31)
  
  L1199:
  	mtctr r31
  
! 	# set up invariant vectors
! 	vspltish v16,0 # v16: zero vector
  
  	li r10,-12
! 	lvsr v17,r10,r8 # v17: result shift vector
! 	lvsl v18,r10,r3 # v18: residual shift back vector
  
  	li r10,-4
  	stw r7,-4(r9)
! 	lvewx v19,r10,r9 # v19: lp_quantization vector
  
  L1200:
! 	vmulosh v20,v0,v8 # v20: sum vector
  	bcctr 20,0
  
  L1300:
  	vmulosh v21,v7,v15
! 	vsldoi v15,v15,v14,4 # increment history
  	vaddsws v20,v20,v21
  
  L1301:
--- 210,239 ----
  	lvx v19,0,r11
  	vperm v15,v19,v15,v16
  	vand v7,v7,v18
! 	addis r31,0,hi16(L1300)
! 	ori r31,r31,lo16(L1300)
  
  L1199:
  	mtctr r31
  
! 	; set up invariant vectors
! 	vspltish v16,0 ; v16: zero vector
  
  	li r10,-12
! 	lvsr v17,r10,r8 ; v17: result shift vector
! 	lvsl v18,r10,r3 ; v18: residual shift back vector
  
  	li r10,-4
  	stw r7,-4(r9)
! 	lvewx v19,r10,r9 ; v19: lp_quantization vector
  
  L1200:
! 	vmulosh v20,v0,v8 ; v20: sum vector
  	bcctr 20,0
  
  L1300:
  	vmulosh v21,v7,v15
! 	vsldoi v15,v15,v14,4 ; increment history
  	vaddsws v20,v20,v21
  
  L1301:
***************
*** 270,342 ****
  	vaddsws v20,v20,v21
  
  L1307:
! 	vsumsws v20,v20,v16 # v20[3]: sum
! 	vsraw v20,v20,v19 # v20[3]: sum >> lp_quantization
  
! 	lvewx v21,0,r3 # v21[n]: *residual
! 	vperm v21,v21,v21,v18 # v21[3]: *residual
! 	vaddsws v20,v21,v20 # v20[3]: *residual + (sum >> lp_quantization)
! 	vsldoi v18,v18,v18,4 # increment shift vector
  
! 	vperm v21,v20,v20,v17 # v21[n]: shift for storage
! 	vsldoi v17,v17,v17,12 # increment shift vector
  	stvewx v21,0,r8
  
  	vsldoi v20,v20,v20,12
! 	vsldoi v8,v8,v20,4 # insert value onto history
  
  	addi r3,r3,4
  	addi r8,r8,4
! 	cmplw cr0,r8,r4 # i<data_len
  	bc 12,0,L1200
  
  L1400:
! 	mtspr 256,r0 # restore old vrsave
  	lmw r31,-4(r1)
  	blr
  
  _FLAC__lpc_restore_signal_asm_ppc_altivec_16_order8:
! #	r3: residual[]
! #	r4: data_len
! #	r5: qlp_coeff[]
! #	r6: order
! #	r7: lp_quantization
! #	r8: data[]
! 
! # see _FLAC__lpc_restore_signal_asm_ppc_altivec_16() above
! # this version assumes order<=8; it uses fewer vector registers, which
should
! # save time in context switches, and has less code, which may improve
! # instruction caching
  
  	stmw r31,-4(r1)
  
  	addi r9,r1,-28
  	li r31,0xf
! 	andc r9,r9,r31 # for quadword-aligned stack data
  
! 	slwi r6,r6,2 # adjust for word size
  	slwi r4,r4,2
! 	add r4,r4,r8 # r4 = data+data_len
  
! 	mfspr r0,256 # cache old vrsave
! 	addis r31,0,0xffc0
! 	ori r31,r31,0x0000
! 	mtspr 256,r31 # declare VRs in vrsave
  
! 	cmplw cr0,r8,r4 # i<data_len
  	bc 4,0,L2400
  
! 	# load coefficients into v0-v1 and initial history into v2-v3
  	li r31,0xf
! 	and r31,r8,r31 # r31: data%4
  	li r11,16
! 	subf r31,r31,r11 # r31: 4-(data%4)
! 	slwi r31,r31,3 # convert to bits for vsro
  	li r10,-4
  	stw r31,-4(r9)
  	lvewx v0,r10,r9
  	vspltisb v6,-1
! 	vsro v6,v6,v0 # v6: mask vector
  
  	li r31,0x8
  	lvsl v0,0,r31
--- 267,339 ----
  	vaddsws v20,v20,v21
  
  L1307:
! 	vsumsws v20,v20,v16 ; v20[3]: sum
! 	vsraw v20,v20,v19 ; v20[3]: sum >> lp_quantization
  
! 	lvewx v21,0,r3 ; v21[n]: *residual
! 	vperm v21,v21,v21,v18 ; v21[3]: *residual
! 	vaddsws v20,v21,v20 ; v20[3]: *residual + (sum >> lp_quantization)
! 	vsldoi v18,v18,v18,4 ; increment shift vector
  
! 	vperm v21,v20,v20,v17 ; v21[n]: shift for storage
! 	vsldoi v17,v17,v17,12 ; increment shift vector
  	stvewx v21,0,r8
  
  	vsldoi v20,v20,v20,12
! 	vsldoi v8,v8,v20,4 ; insert value onto history
  
  	addi r3,r3,4
  	addi r8,r8,4
! 	cmplw cr0,r8,r4 ; i<data_len
  	bc 12,0,L1200
  
  L1400:
! 	mtspr 256,r0 ; restore old vrsave
  	lmw r31,-4(r1)
  	blr
  
  _FLAC__lpc_restore_signal_asm_ppc_altivec_16_order8:
! ;	r3: residual[]
! ;	r4: data_len
! ;	r5: qlp_coeff[]
! ;	r6: order
! ;	r7: lp_quantization
! ;	r8: data[]
! 
! ; see _FLAC__lpc_restore_signal_asm_ppc_altivec_16() above
! ; this version assumes order<=8; it uses fewer vector registers, which
should
! ; save time in context switches, and has less code, which may improve
! ; instruction caching
  
  	stmw r31,-4(r1)
  
  	addi r9,r1,-28
  	li r31,0xf
! 	andc r9,r9,r31 ; for quadword-aligned stack data
  
! 	slwi r6,r6,2 ; adjust for word size
  	slwi r4,r4,2
! 	add r4,r4,r8 ; r4 = data+data_len
  
! 	mfspr r0,256 ; cache old vrsave
! 	addis r31,0,hi16(0xffc00000)
! 	ori r31,r31,lo16(0xffc00000)
! 	mtspr 256,r31 ; declare VRs in vrsave
  
! 	cmplw cr0,r8,r4 ; i<data_len
  	bc 4,0,L2400
  
! 	; load coefficients into v0-v1 and initial history into v2-v3
  	li r31,0xf
! 	and r31,r8,r31 ; r31: data%4
  	li r11,16
! 	subf r31,r31,r11 ; r31: 4-(data%4)
! 	slwi r31,r31,3 ; convert to bits for vsro
  	li r10,-4
  	stw r31,-4(r9)
  	lvewx v0,r10,r9
  	vspltisb v6,-1
! 	vsro v6,v6,v0 ; v6: mask vector
  
  	li r31,0x8
  	lvsl v0,0,r31
***************
*** 346,359 ****
  	vspltisb v2,0
  	vspltisb v3,-1
  	vmrglw v2,v2,v3
! 	vsel v0,v1,v0,v2 # v0: reversal permutation vector
  
  	add r10,r5,r6
! 	lvsl v5,0,r5 # v5: coefficient alignment permutation vector
! 	vperm v5,v5,v5,v0 # v5: reversal coefficient alignment permutation vector
  
  	mr r11,r8
! 	lvsl v4,0,r11 # v4: history alignment permutation vector
  
  	lvx v0,0,r5
  	addi r5,r5,16
--- 343,356 ----
  	vspltisb v2,0
  	vspltisb v3,-1
  	vmrglw v2,v2,v3
! 	vsel v0,v1,v0,v2 ; v0: reversal permutation vector
  
  	add r10,r5,r6
! 	lvsl v5,0,r5 ; v5: coefficient alignment permutation vector
! 	vperm v5,v5,v5,v0 ; v5: reversal coefficient alignment permutation vector
  
  	mr r11,r8
! 	lvsl v4,0,r11 ; v4: history alignment permutation vector
  
  	lvx v0,0,r5
  	addi r5,r5,16
***************
*** 366,373 ****
  	cmplw cr0,r5,r10
  	bc 12,0,L2101
  	vand v0,v0,v6
! 	lis r31,L2301@ha
! 	la r31,L2301@l(r31)
  	b L2199
  
  L2101:
--- 363,370 ----
  	cmplw cr0,r5,r10
  	bc 12,0,L2101
  	vand v0,v0,v6
! 	addis r31,0,hi16(L2301)
! 	ori r31,r31,lo16(L2301)
  	b L2199
  
  L2101:
***************
*** 378,402 ****
  	lvx v7,0,r11
  	vperm v3,v7,v3,v4
  	vand v1,v1,v6
! 	lis r31,L2300@ha
! 	la r31,L2300@l(r31)
  
  L2199:
  	mtctr r31
  
! 	# set up invariant vectors
! 	vspltish v4,0 # v4: zero vector
  
  	li r10,-12
! 	lvsr v5,r10,r8 # v5: result shift vector
! 	lvsl v6,r10,r3 # v6: residual shift back vector
  
  	li r10,-4
  	stw r7,-4(r9)
! 	lvewx v7,r10,r9 # v7: lp_quantization vector
  
  L2200:
! 	vmulosh v8,v0,v2 # v8: sum vector
  	bcctr 20,0
  
  L2300:
--- 375,399 ----
  	lvx v7,0,r11
  	vperm v3,v7,v3,v4
  	vand v1,v1,v6
! 	addis r31,0,hi16(L2300)
! 	ori r31,r31,lo16(L2300)
  
  L2199:
  	mtctr r31
  
! 	; set up invariant vectors
! 	vspltish v4,0 ; v4: zero vector
  
  	li r10,-12
! 	lvsr v5,r10,r8 ; v5: result shift vector
! 	lvsl v6,r10,r3 ; v6: residual shift back vector
  
  	li r10,-4
  	stw r7,-4(r9)
! 	lvewx v7,r10,r9 ; v7: lp_quantization vector
  
  L2200:
! 	vmulosh v8,v0,v2 ; v8: sum vector
  	bcctr 20,0
  
  L2300:
***************
*** 405,431 ****
  	vaddsws v8,v8,v9
  
  L2301:
! 	vsumsws v8,v8,v4 # v8[3]: sum
! 	vsraw v8,v8,v7 # v8[3]: sum >> lp_quantization
  
! 	lvewx v9,0,r3 # v9[n]: *residual
! 	vperm v9,v9,v9,v6 # v9[3]: *residual
! 	vaddsws v8,v9,v8 # v8[3]: *residual + (sum >> lp_quantization)
! 	vsldoi v6,v6,v6,4 # increment shift vector
  
! 	vperm v9,v8,v8,v5 # v9[n]: shift for storage
! 	vsldoi v5,v5,v5,12 # increment shift vector
  	stvewx v9,0,r8
  
  	vsldoi v8,v8,v8,12
! 	vsldoi v2,v2,v8,4 # insert value onto history
  
  	addi r3,r3,4
  	addi r8,r8,4
! 	cmplw cr0,r8,r4 # i<data_len
  	bc 12,0,L2200
  
  L2400:
! 	mtspr 256,r0 # restore old vrsave
  	lmw r31,-4(r1)
  	blr
--- 402,428 ----
  	vaddsws v8,v8,v9
  
  L2301:
! 	vsumsws v8,v8,v4 ; v8[3]: sum
! 	vsraw v8,v8,v7 ; v8[3]: sum >> lp_quantization
  
! 	lvewx v9,0,r3 ; v9[n]: *residual
! 	vperm v9,v9,v9,v6 ; v9[3]: *residual
! 	vaddsws v8,v9,v8 ; v8[3]: *residual + (sum >> lp_quantization)
! 	vsldoi v6,v6,v6,4 ; increment shift vector
  
! 	vperm v9,v8,v8,v5 ; v9[n]: shift for storage
! 	vsldoi v5,v5,v5,12 ; increment shift vector
  	stvewx v9,0,r8
  
  	vsldoi v8,v8,v8,12
! 	vsldoi v2,v2,v8,4 ; insert value onto history
  
  	addi r3,r3,4
  	addi r8,r8,4
! 	cmplw cr0,r8,r4 ; i<data_len
  	bc 12,0,L2200
  
  L2400:
! 	mtspr 256,r0 ; restore old vrsave
  	lmw r31,-4(r1)
  	blr

Reasonably Related Threads

Search for more possibly parallel threads

flac dev - Oct 2004 - flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

[Flac-dev] flac-1.1.1 completely broken on linux/ppc and on macosx if built with the standard toolchain (not xcode)

Reasonably Related Threads