Vorbis does not appear to use any SIMD instructions.

A short look around in the source code indicates that it would be
possible and might even yield big performance improvements.  Why has
nobody done it yet?

I am currently trying to learn to use these instructions and would be
willing to rewrite a few functions in SIMD instructions, if I understand
how to vectorize them and if they make a difference performance-wise.

I recently got myself a VIA C3 chip because it can be used fanlessly,
but the downside is that just playing a Vorbis stream takes about 10%
CPU time.  It would be beneficial if that could be sped up a little.

So my question at this point is: which functions need more performance
the worst.  I find that splitting the packages makes profiling more of a
hassle, but I will do it, if there are no good profiling data.

Felix

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
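As a rough illustration of the kind of rewrite Felix is asking about, here is a minimal C sketch that multiplies a block of samples by a window, four floats at a time with SSE intrinsics, falling back to a scalar loop when SSE is not available. The function name apply_window and the choice of loop are assumptions made for illustration; this is not code from libvorbis, and whether such a loop is a real hotspot would have to be confirmed by profiling.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: packed single-precision operations */
#endif

/* Hypothetical helper: pcm[i] *= window[i] for i in [0, n). */
void apply_window(float *pcm, const float *window, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Vector part: process four floats per iteration.
       Unaligned loads/stores are used so the caller need not align buffers. */
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(pcm + i);
        __m128 w = _mm_loadu_ps(window + i);
        _mm_storeu_ps(pcm + i, _mm_mul_ps(p, w));
    }
#endif

    /* Scalar tail (or the whole loop when SSE is not compiled in). */
    for (; i < n; i++)
        pcm[i] *= window[i];
}
```

The scalar tail keeps the function correct for lengths that are not a multiple of four, and the same source still builds on targets without SSE.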
Felix von Leitner wrote:
>
> Vorbis does not appear to use any SIMD instructions.
>
> A short look around in the source code indicates that it would be
> possible and might even yield big performance improvements.  Why has
> nobody done it yet?

Because libvorbis is just reference code; there are way bigger
performance improvements to be made by writing a performance-optimized
decoder/encoder and simd'ing _that_.

> So my question at this point is: which functions need more performance
> the worst.  I find that splitting the packages makes profiling more of a
> hassle, but I will do it, if there are no good profiling data.

Trig transforms; and for the decoder, Huffman decode; and for the
encoder, all of the psy model stuff.

Segher
Hello,

just some comments:

1.
SSE2 uses 64-bit floats, Vorbis uses 32-bit...
If we use 64-bit precision instead of 32, the result will be different.
I've made an encoder/decoder with 64-bit precision
(I've changed float(s) to double(s) almost everywhere),
but the sound-result has changed, the tone was a little
different (not correct).

2.
The new (v1.0) huffman/codebook decoding is not
really good if we want to make a fast assembly code (normal x86).
It's easier to optimize the older (RC3) routines...
(i.e. we can do nothing with the bitreverse() function in asm)

Attila
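For readers unfamiliar with the bitreverse() step mentioned in point 2, here is a minimal C sketch of a 32-bit bit reversal using the common swap-halves technique, next to a naive one-bit-at-a-time loop for cross-checking. The names bitreverse32 and bitreverse32_naive are hypothetical and chosen for illustration; this is not claimed to be the libvorbis implementation.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word by swapping progressively
   smaller groups: halves, bytes, nibbles, bit pairs, single bits. */
uint32_t bitreverse32(uint32_t x)
{
    x = ((x >> 16) & 0x0000ffffu) | ((x << 16) & 0xffff0000u);
    x = ((x >>  8) & 0x00ff00ffu) | ((x <<  8) & 0xff00ff00u);
    x = ((x >>  4) & 0x0f0f0f0fu) | ((x <<  4) & 0xf0f0f0f0u);
    x = ((x >>  2) & 0x33333333u) | ((x <<  2) & 0xccccccccu);
    x = ((x >>  1) & 0x55555555u) | ((x <<  1) & 0xaaaaaaaau);
    return x;
}

/* Reference version: move one bit per iteration from the low end
   of x to the low end of the result, shifting the result left. */
uint32_t bitreverse32_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

The swap-based form needs only five shift/mask/or steps and no branches, so it is the kind of routine a C compiler already handles well.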
> 1.
> SSE2 uses 64-bit floats, Vorbis uses 32-bit...
> If we use 64-bit precision instead of 32, the result will be different.

You get slightly different rounding.  It doesn't matter one bit
(actually... it matters considerably less than one bit).  No aspect of
Vorbis for which an arithmetic result is roundoff-sensitive uses
floats.

> I've made an encoder/decoder with 64-bit precision
> (I've changed float(s) to double(s) almost everywhere),
> but the sound-result has changed, the tone was a little
> different (not correct).

Then you have a bug, or you're imagining the problem.  Your ears have
given you a hint, now go use some tools that can tell you for certain.

> 2.
> The new (v1.0) huffman/codebook decoding is not
> really good if we want to make a fast assembly code (normal x86).
> It's easier to optimize the older (RC3) routines...

They're 100% equivalent code.  There's no part of Vorbis where you're
required to preserve a specific algorithm, just preserve the
equivalency.  It's what optimization is all about.  If rc3 was easier
to optimize, then use that version of the algorithm.

I expect that even with rc3 in assembly, the 1.0 version wins on
performance.  Given that you're worried about bitreverse() being hard
to do in ASM, I also believe you have a flawed understanding about
where cycles are going anyway.

Segher is right: The next few rounds of Vorbis tuning should have
nothing to do with ASM and everything to do with improving algorithmic
efficiency.  Memory use patterns are a good initial target.
Otherwise, you'll get yourself into a position where you have a
difficult-to-maintain assembly Vorbis decoder that took a long time to
write being outperformed by a carefully tuned pure-C version that took
less time to make.  It's happened to GOGO and LAME, where the C tuning
outperformed the original translation into assembly, and the assembly
took ten times as long.

Monty
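One way to check a float-versus-double change objectively, in the spirit of "go use some tools", is to decode the same file with both builds and compare the PCM output sample by sample. The following is a minimal C sketch, assuming both outputs are already available as float arrays of equal length in the range [-1.0, 1.0]. The type pcm_diff and the functions compare_pcm and diff_in_16bit_lsb are hypothetical names invented for this example, not part of any Vorbis library.

```c
#include <stddef.h>

/* Summary of the sample-wise difference between two PCM buffers. */
typedef struct {
    double max_abs_diff;  /* largest |a[i] - b[i]| */
    double mean_sq_err;   /* average of (a[i] - b[i])^2, 0 if n == 0 */
} pcm_diff;

/* Compare two buffers of n samples each. */
pcm_diff compare_pcm(const float *a, const float *b, size_t n)
{
    pcm_diff r = {0.0, 0.0};
    double sum_sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        double d = (double)a[i] - (double)b[i];
        if (d < 0.0)
            d = -d;                      /* absolute value without math.h */
        if (d > r.max_abs_diff)
            r.max_abs_diff = d;
        sum_sq += d * d;
    }
    if (n > 0)
        r.mean_sq_err = sum_sq / (double)n;
    return r;
}

/* Express a difference on a [-1, 1] scale in units of one 16-bit step. */
double diff_in_16bit_lsb(double diff)
{
    return diff * 32768.0;
}
```

A peak difference well below 1.0 in 16-bit LSB units would suggest the change is only rounding noise; a much larger one would point toward a bug rather than a precision effect.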
These were my observations only... maybe they do not match yours...
I'll check my optimizations again, but I think (I hope) I haven't
missed anything (or the bug is in my ear).
And everybody likes to follow his own way, and fortunately Vorbis is
open source (thank you for it).

btw. If I change float(s) to double(s) in
vorbis_lsp_to_curve() : the p and q local variables
and
mdct.c/mdct.h : REG_TYPE
I hear more high-frequency content when playing beta3/beta4 files.
But it's true that the Vorbis 1.0 (RCx) routines are not so
precision-sensitive (only the mdct part).

I attach my decoder-side-only Ogg Vorbis library
(I removed all encoder-side functions to reduce code size (190k->55k)).
There are some asm routines and a lot of x86-specific code,
but maybe some routines/optimizations are useful in your
official/original code too (e.g. codebook.c, floor1.c).
Check it, and if you think it is worthwhile, use it; if not, don't. :)
But keep up the good work!

best regards
Attila

btw. When will we get a new Vorbis encoder?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ogg_mpx.zip
Type: application/x-zip-compressed
Size: 53822 bytes
Desc: ogg_mpx.zip
Url : http://lists.xiph.org/pipermail/vorbis-dev/attachments/20030206/a63ee326/ogg_mpx-0001.bin