thr3ads.net - flac dev - [Flac-dev] More Altivec/PPC Stuff... [Oct 2004]

Chris Csanady

2004-Oct-30 06:41 UTC

[Flac-dev] More Altivec/PPC Stuff...

Sorry that it has been a while since the last AltiVec patch.  I noticed
something interesting along the way, so it remains unfinished...

On the PPC, even with the AltiVec optimizations, almost a quarter of the
time is spent in FLAC__stream_encoder_process().  I finally discovered
that this is because of all the integer-to-float conversions.  Aside from
being exceptionally slow on the G4, they cause a ton of load/store rejects
on the 970, making matters even worse there.

Since single-precision float conversion is much more efficient in AltiVec,
I have hacked the FLAC__lpc_compute_autocorrelation_altivec() function to
take the integer signal directly, so the real (float) signal is never
computed at all.  Is this ok?  It doesn't seem to affect anything else,
though I admit it is ugly...
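
A minimal sketch of the idea in plain scalar C, for readers without the
patch at hand.  It is not the AltiVec code and not the libFLAC API: the
function name and signature are hypothetical, and it assumes no windowing
is applied.  It only shows the autocorrelation reading integer samples and
converting them inside the loop, so no separate float copy of the signal
has to be built first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libFLAC.
 * Computes autoc[lag] for lag = 0 .. max_lag-1 straight from integer
 * samples; each sample is converted to float where it is used instead
 * of being read from a precomputed float copy of the signal. */
void autocorrelation_from_int32(const int32_t *signal, size_t len,
                                unsigned max_lag, float *autoc)
{
    assert(max_lag > 0 && max_lag <= len);

    for (unsigned lag = 0; lag < max_lag; lag++) {
        float sum = 0.0f;
        /* Sum of products of each sample with the one `lag` before it. */
        for (size_t i = lag; i < len; i++)
            sum += (float)signal[i] * (float)signal[i - lag];
        autoc[lag] = sum;
    }
}
```

In a vectorized version the conversion would be done on whole vectors of
samples at once, which is where the saving described above comes from.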

Anyways, the overall improvement is about 5%  at -8, and 15% at  
defaults.  In both cases, with
this hack, the altivec version is now about 45% faster.

What's left of a default encode is shown below. :)  It seems that most  
of the remaining time
is consumed by the rice coding...

	25.7%	25.7%	flac	FLAC__bitbuffer_write_raw_uint32
	11.0%	11.0%	flac	FLAC__bitbuffer_write_rice_signed
	10.8%	10.8%	flac	FLAC__MD5Accumulate	
	6.9%	6.9%	flac	set_partitioned_rice_	
	6.9%	6.9%	flac	FLAC__stream_encoder_process
	6.8%	6.8%	flac	find_best_partition_order_	
	5.6%	5.6%	flac	FLAC__MD5Transform	
	4.8%	4.8%	flac	FLAC__fixed_compute_best_predictor_altivec
	4.2%	4.2%	flac	format_input	
	2.9%	2.9%	flac	FLAC__lpc_compute_autocorrelation_altivec
	2.0%	2.0%	flac	FLAC__fixed_compute_residual_altivec
	1.9%	1.9%	flac	FLAC__crc16	
	1.8%	1.8%	mach_kernel	ml_set_interrupts_enabled
	1.5%	1.5%	flac	FLAC__lpc_compute_residual_from_qlp_coefficients_altivec
	1.2%	1.2%	flac	FLAC__lpc_compute_residual_from_qlp_coefficients_16bit_altivec

For fun, I wrote a fast signed Rice implementation, though I have yet  
to adapt it to the bitbuffer.
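
For reference, a minimal sketch of what signed Rice coding does, written
as the plain bit-at-a-time form rather than the fast version mentioned
above.  The BitSink type and all function names are hypothetical and are
not the libFLAC bitbuffer; the layout assumed here is the quotient as a
run of zero bits, a terminating one bit, then the k low bits, MSB first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit writer (MSB first); not the libFLAC bitbuffer. */
typedef struct {
    uint8_t *buf;      /* caller-provided, zero-initialised storage */
    size_t   bit_pos;  /* number of bits written so far */
} BitSink;

void sink_put_bit(BitSink *s, unsigned bit)
{
    if (bit)
        s->buf[s->bit_pos >> 3] |= (uint8_t)(0x80u >> (s->bit_pos & 7));
    s->bit_pos++;
}

/* Fold a signed value to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4. */
uint32_t rice_fold(int32_t val)
{
    uint32_t doubled = (uint32_t)val << 1;
    return (val < 0) ? ~doubled : doubled;
}

/* Write one signed value with Rice parameter k (the caller must make
 * sure the buffer is large enough for the quotient run). */
void rice_write_signed(BitSink *s, int32_t val, unsigned k)
{
    assert(k < 32);

    uint32_t u = rice_fold(val);
    uint32_t q = u >> k;              /* quotient, sent in unary */

    while (q--)
        sink_put_bit(s, 0);
    sink_put_bit(s, 1);               /* unary terminator */

    for (unsigned i = k; i-- > 0; )   /* k low bits, MSB first */
        sink_put_bit(s, (u >> i) & 1u);
}
```

A faster version would build the whole codeword in a register and emit it
with one or two word-sized writes, which is why the time spent in the raw
bit-writing routine matters so much in the profile.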

Also, for those interested, I came across a very nice arithmetic coding  
implementation at:

	http://www.cipr.rpi.edu/~said/FastAC.html

With a very crude adaptive model, it comes fairly close to the partitioned
Rice scheme in compression, and I'm betting it would be considerably
faster and a lot simpler.  Perhaps it is worth some more investigation; it
really is elegant compared to the others I've seen.  (Unfortunately, it is
written in the hideous language that is C++, but thankfully it uses a
fairly reasonable subset of it.)

Chris

Chris Csanady

2004-Oct-30 06:55 UTC

[Flac-dev] Re: More Altivec/PPC Stuff...

On 2004/10/30, at 8:40, Chris Csanady wrote:
> Anyways, the overall improvement is about 5%  at -8, and 15% at 
> defaults.  In both cases, with
> this hack, the altivec version is now about 45% faster.

Well, I think I have done it again; let me clarify that a bit.  A typical
encode takes 45% less time, i.e. 0.55 of the original time, which works
out to roughly 1 / 0.55 = 1.8 times the speed.  So I probably should have
said 80% faster. :)  It almost halves the encoding time...

Chris
