Hi!

I've spent the last few nights digging into the Vorbis source and working
to implement a vorbis_synthesis_pcmout_int() function that kicks out
interleaved int16_t pcm data.

I think it's important to have this function available to make the
job for people using the codec a little easier. This function abstracts
out the conversion to int16_t and removes the extra overhead of
moving the pcm data over the processor's data bus just to do the int16_t
conversion after the Vorbis synthesis.

I've written that function and I've created a Linux command line player
that reads a .ogg from stdin and writes the pcm samples to the OSS
sound drivers using the vorbis_synthesis_pcmout_int() function.
The Linux command line player (that's giving it too much credit, really)
is, just like the encode/decode example, limited to 44 kHz files.

The patch for these features can be found here:

    http://moon.eorbit.net/~robert/pcm16.patch

There is one slight problem with the vorbis_synthesis_pcmout_int()
function that I hope we can all solve as a team. I personally failed
to solve this problem, but I think it is important to get this function
into the codebase before the first release. I trust that once the format
and the interface settle there will be more focus on optimization, and at
that time we should be able to solve the problem, if not before.

The problem occurs in block.c line 666 (!) and line 669 of the
patched sources. These two lines are the ones that I chose for
doing the double to int16_t conversion. For my test case this solution
was actually a bit slower than the normal vorbis_synthesis_pcmout() call,
when the original goal of this function was to make the decoder more
efficient.

The cause is that with the synthesizer spitting out int16_t data it was
actually doing slightly more than twice as many float multiplies as the
code that does the conversion as the very last step. This is due to the
nature of the synthesizer, since it may make multiple passes over the pcm
data as it overlaps/adds two windows together.

Finding a better place to do the int16_t conversion is the key to solving
this problem. One solution that I looked into was having the mdct kick out
int16_t values, but without having access to and understanding the original
paper that the mdct code was based on I only made things worse.

I should really go and pay some attention to my girlfriend now...

--ruaok         Freezerburn! All else is only icing. -- Soul Coughing

Robert Kaye -- robert@moon.eorbit.net  http://moon.eorbit.net/~robert

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
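For comparison with the approach described in the message above, here is a minimal sketch of doing the int16_t conversion as the very last step, once overlap/add has finished, so that each output sample is multiplied exactly once. The helper name interleave_to_int16 and its parameters are hypothetical and are not part of libvorbis or of the patch; the sketch assumes planar double samples indexed as pcm[channel][sample] with a nominal range of -1.0 to 1.0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libvorbis function): interleave planar
   double samples into 16-bit PCM after synthesis is complete.
   Each sample is scaled once, clipped, then truncated toward zero. */
static void interleave_to_int16(double **pcm, int channels,
                                size_t samples, int16_t *out)
{
    for (size_t i = 0; i < samples; i++) {
        for (int ch = 0; ch < channels; ch++) {
            double v = pcm[ch][i] * 32767.0;

            /* Clip so that out-of-range values do not wrap around. */
            if (v > 32767.0)
                v = 32767.0;
            if (v < -32768.0)
                v = -32768.0;

            out[i * (size_t)channels + (size_t)ch] = (int16_t)v;
        }
    }
}
```

Because this loop touches only finished samples, its cost does not depend on how many passes the synthesizer makes over the overlap region.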
robert@moon.eorbit.net wrote:
> Finding a better place to do the int16_t conversion is the key to solving
> this problem. One solution that I looked into was having the mdct kick out
> int16_t values, but without having access to and understanding the original
> paper that the mdct code was based on I only made things worse.

Hi,

I am using WinCVS and don't want to get cvs working on my Linux
machines right now, and don't have patch on my win32 machine. So
I am reading straight from the patch file. I assume this is the code
you are talking about:

+      for(i=beginSl,t=beginSl;i<endSl;i++,t+=vi->channels)
+        pcm_int[t+j]+=vb->pcm[j][i]*32767.;
+      /* the remaining section */
+      for(;i<sizeW;i++,t+=vi->channels)
+        pcm_int[t+j]=vb->pcm[j][i]*32767.;

Here is my question: would it help to have a faster way to convert
from double to integer?

Being a game developer I possess some arcane knowledge about how to
do wacky things with IEEE-754 floating point numbers. A standard trick
that we do in games is to manipulate this format directly to squeeze out
the values we want. This does two things: (a) it eliminates floating
point multiplies, which are more expensive than adds on some processors,
and (b) it eliminates the _ftol or equivalent code that the compiler
sticks in when you go from any float size to integer. (This stuff is there
to enforce the IEEE rounding semantics which, most of the time you are
converting float to int, you don't really care about... and it is this
stuff that usually takes all the time.)

For example here is some code that converts from a double to a
machine-word-sized integer (without scaling the double)... uhh this is
actually C++ so it has a reference and stuff, but it can be done without
that:

----------------------------------------
/* 1.5 * 2^36 */
static const double fix32_conv_factor = ((double)0x10000000) * 256.0 * 1.5;
/* 1.5 * 2^52 */
static const double int_conv_factor = (fix32_conv_factor * (double)0x10000);

/* Index of the low-order 32-bit word of a double: 0 on x86. */
const int LOW_WORD_OFFSET = 0;

/* Assumes a 32-bit long and |d| small enough to fit in it. */
inline long iDOUBLE_TO_INT(double d) {
    d += int_conv_factor;
    const long *const &num = (long *)&d + LOW_WORD_OFFSET;
    return *num;
}
----------------------------------------

The value of LOW_WORD_OFFSET changes depending on the architecture
(on a sparc, you want it to be 1).

Anyway, to scale the number, you add a different value to the double in
the first line of iDOUBLE_TO_INT. So for example, if you want to
multiply the double by 65536 (that is, get a 16.16 fixed-point result),
just use 'fix32_conv_factor' instead of 'int_conv_factor':

----------------------------------------
inline long iDOUBLE_TO_INT(double d) {
    d += fix32_conv_factor;
    const long *const &num = (long *)&d + LOW_WORD_OFFSET;
    return *num;
}
----------------------------------------

Assuming that your compiler (gcc?) is generating rounding code for
your cast, which it probably is, this is going to be hella faster.

If you want to know why this actually works, I have a book in
progress that talks about this stuff; I can forward you the draft
chapters of that.

   -Jonathan.
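The trick above can also be sketched in plain C without the pointer cast. This is an illustration under stated assumptions, not the code from the message: it assumes IEEE-754 64-bit doubles, the default round-to-nearest mode, and inputs small enough for the result to fit in 32 bits. The names MAGIC_INT, MAGIC_FIX16, low_word_after_add, double_to_int and double_to_fix16 are made up for this example. Adding 1.5 * 2^52 leaves the rounded integer in the low 32 mantissa bits; adding 1.5 * 2^36 leaves the value scaled by 2^16 = 65536 there.

```c
#include <stdint.h>
#include <string.h>

/* 1.5 * 2^52: one unit in the last place is 1.0 at this magnitude. */
static const double MAGIC_INT = 6755399441055744.0;
/* 1.5 * 2^36: one unit in the last place is 2^-16 at this magnitude. */
static const double MAGIC_FIX16 = 103079215104.0;

/* Add the magic constant, then return the low 32 bits of the result's
   bit pattern. memcpy avoids the aliasing problems of a pointer cast,
   and reading through a 64-bit integer avoids a word-order constant.
   The final cast assumes two's complement wraparound. */
static int32_t low_word_after_add(double d, double magic)
{
    uint64_t bits;

    d += magic;
    memcpy(&bits, &d, sizeof bits);
    return (int32_t)(uint32_t)bits;
}

/* Nearest integer to d; valid for |d| < 2^31. */
static int32_t double_to_int(double d)
{
    return low_word_after_add(d, MAGIC_INT);
}

/* d as 16.16 fixed point (d * 65536, rounded); valid for |d| < 2^15. */
static int32_t double_to_fix16(double d)
{
    return low_word_after_add(d, MAGIC_FIX16);
}
```

The factor 1.5 rather than 1.0 is what makes negative inputs work: it keeps the sum inside the same power-of-two range on both sides of zero, so the low bits come out as a two's complement value. Note that the result is rounded to nearest by the addition itself, not truncated.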
At 01:41 PM 4/19/00 -0700, you wrote:
>Hi!
>
>I've spent the last few nights digging into the Vorbis source and working
>to implement a vorbis_synthesis_pcmout_int() function that kicks out
>interleaved int16_t pcm data.
>
>I think it's important to have this function available to make the
>job for people using the codec a little easier. This function abstracts
>out the conversion to int16_t and removes the extra overhead of
>moving the pcm data over the processor's data bus just to do the int16_t
>conversion after the Vorbis synthesis.

vorbisfile does this. It's also an order of magnitude easier to use - a
minimal example requires precisely three functions. It leaves the
conversion to 16 bit ints to later (after vorbis_synthesis_pcmout()) - but
realistically, that data will most likely be sitting around in L2 cache,
so there isn't significant extra overhead.

Right now, it's pretty slow. A minor modification to a single line makes
it roughly 10% faster (for fully decoding a single bitstream, ~4 minutes
long), so I'll clean that up (it's all in ov_read()) and commit it. Well,
after making sure it DOES give the same results (which it should).

>
>I've written that function and I've created a Linux command line player
>that reads a .ogg from stdin and writes the pcm samples to the OSS
>sound drivers using the vorbis_synthesis_pcmout_int() function.
>The Linux command line player (that's giving it too much credit, really)
>is, just like the encode/decode example, limited to 44 kHz files.

Thanks, this player might be useful for when I can't be bothered firing up
xmms. I'll try it out later.

>
>The patch for these features can be found here:
>
>    http://moon.eorbit.net/~robert/pcm16.patch
>
>There is one slight problem with the vorbis_synthesis_pcmout_int()
>function that I hope we can all solve as a team. I personally failed
>to solve this problem, but I think it is important to get this function
>into the codebase before the first release. I trust that once the format
>and the interface settle there will be more focus on optimization, and at
>that time we should be able to solve the problem, if not before.
>
>The problem occurs in block.c line 666 (!) and line 669 of the
>patched sources. These two lines are the ones that I chose for
>doing the double to int16_t conversion. For my test case this solution
>was actually a bit slower than the normal vorbis_synthesis_pcmout() call,
>when the original goal of this function was to make the decoder more
>efficient.

Well, I think this is probably the wrong place to do it - though it might
be advantageous to have something akin to the sample conversion routines in
vorbisfile moved into libvorbis itself - it's definitely best done AFTER
the rest of the decoding, in my opinion.

>
>The cause is that with the synthesizer spitting out int16_t data it was
>actually doing slightly more than twice as many float multiplies as the
>code that does the conversion as the very last step. This is due to the
>nature of the synthesizer, since it may make multiple passes over the pcm
>data as it overlaps/adds two windows together.
>
>Finding a better place to do the int16_t conversion is the key to solving
>this problem. One solution that I looked into was having the mdct kick out
>int16_t values, but without having access to and understanding the original
>paper that the mdct code was based on I only made things worse.
>
>I should really go and pay some attention to my girlfriend now...
>
>
>--ruaok         Freezerburn! All else is only icing. -- Soul Coughing
>
>Robert Kaye -- robert@moon.eorbit.net  http://moon.eorbit.net/~robert
Jonathan Blow (jon@bolt-action.com) wrote:
> Being a game developer I possess some arcane knowledge about how to
> do wacky things with IEEE-754 floating point numbers. A standard trick
> that we do in games is to manipulate this format directly to squeeze out
> the values we want. This does two things: (a) it eliminates floating
> point multiplies, which are more expensive than adds on some processors,
> and (b) it eliminates the _ftol or equivalent code that the compiler
> sticks in when you go from any float size to integer. (This stuff is there
> to enforce the IEEE rounding semantics which, most of the time you are
> converting float to int, you don't really care about... and it is this
> stuff that usually takes all the time.)

I think correct rounding IS important.
I noticed some code (not in Vorbis) uses simple assignments:

    int i;
    float a;

    i = a;

This is wrong for signed audio data because it rounds towards zero, while
rounding to nearest would be correct.

The error is small, but the whole idea behind Vorbis is to have
_good_quality_ at low space usage, right?

Dithering might be considered too, for that matter.

Regards
David Balazic
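To make the rounding point concrete, here is a small sketch contrasting the plain assignment with one common round-to-nearest idiom. The function names sample_truncate and sample_round are hypothetical, and the rounding shown (halves rounded away from zero) is only one of several reasonable choices; it assumes the scaled sample already fits in a long.

```c
/* Plain C conversion: discards the fraction, so every value in the
   open interval (-1.0, 1.0) becomes 0. The zero step is twice as wide
   as every other step, which distorts signed audio around silence. */
static long sample_truncate(double x)
{
    return (long)x;
}

/* Round to nearest, halves away from zero: shift by half a step in
   the direction of the sign, then let the cast drop the fraction. */
static long sample_round(double x)
{
    return (long)(x < 0.0 ? x - 0.5 : x + 0.5);
}
```

With truncation, 0.7 and -0.7 both map to 0; with rounding they map to 1 and -1, and the worst-case error drops from just under one step to half a step.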