Can anybody point me at any resources that would explain how to optimize mdct_backward for a CPU with a fused multiply-accumulate unit?

From what I understand from responses to my older postings, Tremor's mdct_backward could be rewritten to take advantage of a muladd.

My target machine can do either two-wide 32x32 + Accum(64) -> Accum(64) integer muladd, or eight-wide 16x16 + Accum(32) -> Accum(32) integer muladd, or four-wide single-precision floating-point muladd.

The Tremor code seems to be much cleaner and more portable than the stock version for consoles (no double-precision math routines, compiles more or less out of the box on a C++ compiler), but I can afford an int-to-float if necessary.

What values of 'n' does mdct_backward typically get called with? Should it be pretty simple to guarantee proper alignment of the input buffers to a 16-byte boundary? Can I get away with 16x16 multiplies without too much audio degradation?

I also would be better off without a big sincos LUT, as pointed out by Segher Boessenkool back in March.

Thanks again. Just to show that I'm not a total leech, here's a slightly faster (at least on the PS2) version of bitrev12 that doesn't use any LUTs (thanks to http://aggregate.org/MAGIC/):

STIN int bitrev12(int x){
  x = ((x & 0xaaa) >> 1) | ((x & 0x555) << 1);   /* swap adjacent bits */
  x = ((x & 0xccc) >> 2) | ((x & 0x333) << 2);   /* swap 2-bit pairs within each nibble */
  x = ((x & 0xf00) >> 8) | (x & 0x0f0) | ((x & 0x00f) << 8);   /* swap the outer nibbles */
  return x;
}

-Dave

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
On Wednesday 21 May 2003 08:12, David Etherton wrote:
> Can anybody point me at any resources that would explain how to optimize
> mdct_backward for a cpu with a fused multiply-accumulate unit?

MDCT optimisation is not my area of expertise, but I'll give some other advice anyway...

> From what I understand from responses to my older postings, Tremor's
> mdct_backward could be rewritten to take advantage of a muladd.
>
> My target machine can do either two-wide 32x32 + Accum(64) -> Accum(64)
> integer muladd or eight-wide 16x16 + Accum(32) -> Accum(32) integer muladd
> or four-wide single-precision floating-point muladd.
>
> The tremor code seems to be much cleaner and more portable than the stock
> version for consoles (no double-precision math routines, compiles more or
> less out-of-the-box on a C++ compiler) but I can afford an int-to-float if
> necessary.

Well... it's _only_ a decoder, whereas the stock version includes the encoder. That naturally makes it a lot simpler: most of the complexity lies in the encoder (for example, the decoder needs no double-precision floats).

> What values of 'n' does mdct_backward typically get called with? Should it
> be pretty simple to guarantee proper alignment of the input buffers to a
> 16-byte boundary? Can I get away with 16x16 multiplies without too much
> audio degradation?

I think powers of 2 from 64 to 8192 are allowed, and the most common will be 128 (or 256) and 2048 (or 4096 at very low bitrates). I'd have to check those, though. Alignment should be simple enough to guarantee. 16x16 multiplies probably won't give acceptable audio quality.

> I also would be better off without a big sincos lut as pointed out by
> Segher Boessenkool back in March.

If this is because of the memory usage of the LUTs, you may be interested in the 'lowmem-branch' branch of Tremor in CVS. I'm told (I haven't tried it myself) that it uses about an order of magnitude less memory (heap+stack).
That comes at the cost of somewhat higher CPU usage (10-15%?), but that might be a worthwhile tradeoff on a console.

Mike
On Tue, 20 May 2003, David Etherton wrote:
> Can anybody point me at any resources that would explain how to optimize
> mdct_backward for a cpu with a fused multiply-accumulate unit?

This was discussed on this list some time ago.

> From what I understand from responses to my older postings, Tremor's
> mdct_backward could be rewritten to take advantage of a muladd.

Well, in fact you could start with the current Tremor code, where the XPROD macros can easily be redefined specifically for your target. Look at the ARM version, for example, which uses the 32x32->64 MAC instruction.

> Can I get away with 16x16 multiplies without too much
> audio degradation?

No. See the definitions of MULT32() and MULT31() for the _LOW_ACCURACY_ case. I had to make the multiplication unbalanced (roughly 24x8), since a 16x16 multiply would not give acceptable audio quality.

> Thanks again. Just to show that I'm not a total leech, here's a slightly
> faster (at least on the PS2) version of bitrev12 that doesn't use any luts
> (thanks to http://aggregate.org/MAGIC/)
>
> STIN int bitrev12(int x){
>   x = ((x & 0xaaa) >> 1) | ((x & 0x555) << 1);
>   x = ((x & 0xccc) >> 2) | ((x & 0x333) << 2);
>   x = ((x & 0xf00) >> 8) | (x & 0x0f0) | ((x & 0x00f) << 8);
>   return x;
> }

Yeah, that's the generic cookbook method. This, however, sucks big time on ARM, where an immediate can be encoded directly within an opcode only if it fits in 8 bits (optionally rotated by an even amount), so masks like 0xaaa have to be built in a register first. The current version was optimized for ARM.

Nicolas