So, it turns out (and another implementation actually explicitly mentions it) that LSP->LPC computation using the FIR algorithm is very sensitive to noise (it is an iterative algorithm) and really, really requires doubles [we're not kidding]. This was complicating things for folks pursuing fixed-point implementations, and it was also a potential source of bugs if FP optimizations got out of hand. This may be the cause of the Win32 encoder 'blorps' (which also seem to affect PhatNoize).

(Note that the LSP problem, when it exists identically in the encoder and decoder, will seem to almost disappear.)

In any case, the new LSP code is committed and should head off all of the above possible problems. As a useful side effect, LPC IIR filters and the iFFT are now completely eliminated from the decode side, and decode with current streams is about 10% faster.

Monty

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
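To make the precision concern above concrete, here is a minimal sketch of the textbook LSP-to-LPC polynomial expansion, run once in doubles and once with every intermediate result rounded to single precision. It is written in Python for brevity and is not the Vorbis code: the names `lsp_to_lpc`, `poly_mul`, and `to_f32`, the sample LSP values, and the choice of expansion method are all assumptions made for illustration.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def poly_mul(a, b, rnd):
    """Multiply two polynomials in z^-1, rounding every partial result with rnd."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = rnd(out[i + j] + rnd(ai * bj))
    return out


def lsp_to_lpc(lsp, rnd=float):
    """Expand sorted LSP frequencies (radians, even count) into A(z) coefficients.

    rnd is applied after each arithmetic step; the default (float) leaves
    doubles untouched, while to_f32 emulates single-precision arithmetic.
    """
    p = [1.0, 1.0]    # symmetric part, with its fixed root at z = -1
    q = [1.0, -1.0]   # antisymmetric part, with its fixed root at z = +1
    for i, w in enumerate(lsp):
        section = [1.0, rnd(-2.0 * math.cos(w)), 1.0]
        if i % 2 == 0:
            p = poly_mul(p, section, rnd)
        else:
            q = poly_mul(q, section, rnd)
    # A(z) = (P(z) + Q(z)) / 2; the highest-order terms cancel, so drop the last.
    return [rnd(0.5 * (pc + qc)) for pc, qc in zip(p, q)][:-1]


# Hypothetical order-8 LSP set, just to have something to compare.
lsp = [0.3, 0.6, 1.1, 1.5, 2.0, 2.4, 2.7, 3.0]
a64 = lsp_to_lpc(lsp)
a32 = lsp_to_lpc(lsp, rnd=to_f32)
print("largest double/single coefficient difference:",
      max(abs(x - y) for x, y in zip(a64, a32)))
```

The printed difference depends on the LSP set and the filter order. The point of the sketch is only that the rounding mode is the single thing that changes between the two runs, so any gap between them is purely a precision effect.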
Monty wrote:
> So, it turns out (and another implementation actually explicitly mentions it)
> that LSP->LPC computation using the FIR algorithm is very sensitive to noise
> (iterative algorithm) and really really requires doubles [we're not kidding].
> This was complicating things for folks pursuing fixed point implementations,
> and also was a potential source for bugs if FP optimizations got out of hand.
> This may be the cause of the Win32 encoder 'blorps' (which also seem to affect
> PhatNoize).
>
> (Note that the LSP problem, when it exists identically in the encoder and
> decoder will seem to almost disappear).
>
> In any case, the new LSP code is committed and should head off all of the
> above possible problems. As a useful side effect, LPC IIR filters and the
> iFFT are now completely eliminated from the decode side and decode with
> current streams is about 10% faster.

If I remember correctly, Tony Million (of Sonique/nad) used complex numbers (that could only be computed in small chunks) to extract even more precision than doubles for this reason in several versions of his nad MP3 decoder. (At the time, the CPU overhead was horrible, but it might be acceptable now.)
A note to Segher: I've had to work extensively on the new lsp_to_curve function... as an end result, I'm preparing to commit code that eliminates all calls to cos(), exp(), and sqrt() in the decoder (as well as optionally taking all of the floor decoding to fixed point). So worry about the MDCT; I've got a large swath of the rest covered ;-)

Monty
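One common way to drop run-time cos() calls is a precomputed table with linear interpolation. The sketch below shows that general technique in Python. It is an assumption about how such a replacement could look, not a description of the code being committed; `TABLE_SIZE`, `COS_TABLE`, and `cos_lookup` are hypothetical names.

```python
import math

TABLE_SIZE = 128  # hypothetical size; larger tables trade memory for accuracy
STEP = math.pi / TABLE_SIZE
# Built once up front (in C this would be a static constant array), so no
# cos() call is needed at decode time. The extra entries let interpolation
# at x == pi safely read index i + 1.
COS_TABLE = [math.cos(i * STEP) for i in range(TABLE_SIZE + 2)]


def cos_lookup(x):
    """Approximate cos(x) for 0 <= x <= pi by linear interpolation in COS_TABLE."""
    pos = x / STEP
    i = int(pos)          # table index just below x
    frac = pos - i        # fractional distance to the next entry
    return COS_TABLE[i] + frac * (COS_TABLE[i + 1] - COS_TABLE[i])


# Measure the worst-case error over a fine grid of inputs in [0, pi].
worst = max(abs(cos_lookup(k * math.pi / 1000) - math.cos(k * math.pi / 1000))
            for k in range(1001))
print("worst absolute error:", worst)
```

For linear interpolation the error is bounded by roughly STEP squared over 8, so doubling the table size cuts the worst-case error by about a factor of four. The same table-plus-interpolation approach extends to exp() and sqrt() once their arguments are reduced to a bounded range.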
Segher Boessenkool <segher@eastsite.nl> writes:
> Or does anyone know a good cycle-level profiling tool that can
> measure cache misses (branch predictions would be nice as well?)

Cacheprof, by Julian Seward, the author of bzip2: http://www.cacheprof.org/

"Cacheprof will run your program, simulating a cache of your choice, and will annotate each line of source code with the number of memory references and the number of cache misses caused by that line. It will also print summaries per-procedure, and for the program as a whole. Cacheprof works on PCs running Linux, in conjunction with the GNU toolchain. It's designed to be as non-disruptive as possible. You don't need to modify or reinstall existing gcc/g++/g77's. Usage is very simple: place the command cacheprof in front of all compile commands, for example: cacheprof gcc -O -o myprog myprog.c."