thr3ads.net - Vorbis dev - [Vorbis-dev] Butterflies in mdct.c [Jan 2005]

If this information is useful, please help other people find it:
Share via:

petshome@atlas.cz

2005-Jan-26 08:50 UTC

[Vorbis-dev] Butterflies in mdct.c

In mdct.c there's some functions including some-point butterfly. In 32-point
and 16-point there are calling of smaller-point function everytime twice on each
half of data. When I looked on it I found that's just linear algebra. So it
can be rewritten to matrix multiplication. Some one can say: there's
optimization on in register working. But imagine there's one calling
32-point, twice 16-point and 4x 8-point. Thanks to matrix associative
multiplications it can be only 1 matrix multiplication which can be optimized
and the result won't (I hardly suppose) be worse than current. Even sum of
four same numbers in different functions could be done now only with left shift.
I'm lack of time but won't be. Still it's not easy to create those
matrixes from source code (especially 32-point). Would anyone send me those
matrixes or manual how to create them? I'll do the multiplications and
hand-optimization (I know alorithm to create application but this will be faster
by hand). Or did anyone try this? Can anyone prove that this can't be
faster? Even there's possibility to use MMX when working with matrixes.

Is there any other such time critical piece of code that needs optimization?
Send me a pointer and I'll look at.

Monty

2005-Jan-27 13:14 UTC

head link

[Vorbis-dev] Butterflies in mdct.c

On Wed, Jan 26, 2005 at 05:49:15PM +0100, petshome@atlas.cz wrote:
> In mdct.c there's some functions including some-point butterfly. In32-point and 16-point there are calling of smaller-point function
everytime twice on each half of data. When I looked on it I found
that's just linear algebra. So it can be rewritten to matrix
multiplication.

This takes an O(n log2(n)) operation and makes it O(n^2); That's not
an optimization.  Matrix multiplication is a powerful concept in
theory and nearly always the last resort in practice.  Also, the
smallest butterflies are better than O(n log2 (n)) because of identity
cases.
>Can anyone prove that this can't be faster? Even there's possibility
>to use MMX when working with matrixes.
It's unlikely enough that I don't want to take it seriously unless
someone has an example in hand of it running faster.

Monty

Sebastian Gesemann

2005-Feb-09 16:56 UTC

head link

[Vorbis-dev] Butterflies in mdct.c

petshome@atlas.cz wrote:
> In mdct.c there's some functions including some-point butterfly. In
> 32-point and 16-point there are calling of smaller-point function
> everytime twice on each half of data. When I looked on it I found
> that's just linear algebra. So it can be rewritten to matrix
> multiplication. Some one can say: there's optimization on in register
> working. But imagine there's one calling 32-point, twice 16-point and
> 4x 8-point. Thanks to matrix associative multiplications it can be
> only 1 matrix multiplication which can be optimized and the result
> won't (I hardly suppose) be worse than current.  Even sum of four same
First: I'm not aware of the libVorbis code but i *am* aware of how the 
MDCT works so here's my comment:

Yes, you are right. All the small transforms that work on subset of 
components can be combined into one big matrix transform. But the fun 
part is: It'l be slower. (The 16 point thingy isn't calculated twice on 
the *same* subet of components). Recall that a naive algorithm for a 
matrix vector multiplication (NxN matrix) takes O(N^2) multiplications 
and additions. In some lucky cases though (like the MDCT, DCT or FFT) 
this matrix can be factored into O(log_2(N)) matrixes which have each 
only O(N) non-zero components (sparse). A matrix vector multiplication 
with such a sparse matrix is extremely efficient and requires (in this 
case) O(n) multiplications and additions. So, if you apply each 
"sub-transform" separately you only need O(N * log_2 N) operations 
instead of of O(N^2).

I'm not sure but I guess the libVorbis MDCT implementation follows the 
algorithm of the mentioned paper "The use of multirate filter banks for 
coding of high quality digital audio". I personally prefer to reduce the 
MDCT in O(N) time to an FFT which also can be computed in O(N * log_2 
N). Someone on this mailing list reported a speed-up by modifying the 
Tremor code to calculate the MDCT via an FFT so I guess this method 
should be preferred. I think he sent a patch to this list. That's all i 
know - hope it helps.

Sebastian

Seemingly Similar Threads

Search for more maybe matching threads

Vorbis dev - Jan 2005 - Butterflies in mdct.c

[Vorbis-dev] Butterflies in mdct.c

[Vorbis-dev] Butterflies in mdct.c

[Vorbis-dev] Butterflies in mdct.c

Seemingly Similar Threads