In mdct.c there's some functions including some-point butterfly. In 32-point and 16-point there are calling of smaller-point function everytime twice on each half of data. When I looked on it I found that's just linear algebra. So it can be rewritten to matrix multiplication. Some one can say: there's optimization on in register working. But imagine there's one calling 32-point, twice 16-point and 4x 8-point. Thanks to matrix associative multiplications it can be only 1 matrix multiplication which can be optimized and the result won't (I hardly suppose) be worse than current. Even sum of four same numbers in different functions could be done now only with left shift. I'm lack of time but won't be. Still it's not easy to create those matrixes from source code (especially 32-point). Would anyone send me those matrixes or manual how to create them? I'll do the multiplications and hand-optimization (I know alorithm to create application but this will be faster by hand). Or did anyone try this? Can anyone prove that this can't be faster? Even there's possibility to use MMX when working with matrixes. Is there any other such time critical piece of code that needs optimization? Send me a pointer and I'll look at.
On Wed, Jan 26, 2005 at 05:49:15PM +0100, petshome@atlas.cz wrote:> In mdct.c there's some functions including some-point butterfly. In32-point and 16-point there are calling of smaller-point function everytime twice on each half of data. When I looked on it I found that's just linear algebra. So it can be rewritten to matrix multiplication. This takes an O(n log2(n)) operation and makes it O(n^2); That's not an optimization. Matrix multiplication is a powerful concept in theory and nearly always the last resort in practice. Also, the smallest butterflies are better than O(n log2 (n)) because of identity cases.>Can anyone prove that this can't be faster? Even there's possibility >to use MMX when working with matrixes.It's unlikely enough that I don't want to take it seriously unless someone has an example in hand of it running faster. Monty
petshome@atlas.cz wrote:> In mdct.c there's some functions including some-point butterfly. In > 32-point and 16-point there are calling of smaller-point function > everytime twice on each half of data. When I looked on it I found > that's just linear algebra. So it can be rewritten to matrix > multiplication. Some one can say: there's optimization on in register > working. But imagine there's one calling 32-point, twice 16-point and > 4x 8-point. Thanks to matrix associative multiplications it can be > only 1 matrix multiplication which can be optimized and the result > won't (I hardly suppose) be worse than current. Even sum of four sameFirst: I'm not aware of the libVorbis code but i *am* aware of how the MDCT works so here's my comment: Yes, you are right. All the small transforms that work on subset of components can be combined into one big matrix transform. But the fun part is: It'l be slower. (The 16 point thingy isn't calculated twice on the *same* subet of components). Recall that a naive algorithm for a matrix vector multiplication (NxN matrix) takes O(N^2) multiplications and additions. In some lucky cases though (like the MDCT, DCT or FFT) this matrix can be factored into O(log_2(N)) matrixes which have each only O(N) non-zero components (sparse). A matrix vector multiplication with such a sparse matrix is extremely efficient and requires (in this case) O(n) multiplications and additions. So, if you apply each "sub-transform" separately you only need O(N * log_2 N) operations instead of of O(N^2). I'm not sure but I guess the libVorbis MDCT implementation follows the algorithm of the mentioned paper "The use of multirate filter banks for coding of high quality digital audio". I personally prefer to reduce the MDCT in O(N) time to an FFT which also can be computed in O(N * log_2 N). Someone on this mailing list reported a speed-up by modifying the Tremor code to calculate the MDCT via an FFT so I guess this method should be preferred. I think he sent a patch to this list. That's all i know - hope it helps. Sebastian