Hey everyone.

I've been designing my own audio codec with extremely strict decode-performance constraints (including a fixed block size), which led me to attempting a number of unorthodox things to squeeze as much quality as possible. One surprising thing I discovered just earlier today was an extremely cheap method of reducing pre-echo during transients, without using short blocks (and still using a 50% overlap MDCT). Since I figured this might be pretty important even /with/ them (due to better frequency resolution), I decided to send a message here.

The basic idea is: Transients generally add a sinusoidal shape to the frequency-domain coefficients, which is what makes them so hard to code at low bitrates, and why some codecs even implement frequency-domain linear prediction. But since that would impact performance a fair bit for my codec, I instead decided to take a lesson from wavelets and used a simple sum/difference on every pair of coefficients (sort-of emulating a Haar wavelet). ie. (excluding normalization factors)

    a = X[i*2], b = X[i*2+1]
    X[i*2] = a + b
    X[i*2+1] = a - b

I tested with a very long block size to really emphasize the transient error diffusion, and the pre-echo seems to be cut down by half with no other effort at all. Adding another layer of this transform (ie. X[i*4]+/-X[i*4+2]) doesn't seem to improve things much, though. Perhaps a more frequency-selective wavelet (eg. LeGall5/3 or CDF9/7) would improve things a bit more, but in honesty I don't really have the motivation to test any further than this.

I've included some examples of this modification in action (please excuse the particular taste in music; I just find it very useful for this kind of testing). It's coded at 32kbps stereo @ 44.1kHz, though quality is heavily degraded from using a larger block size than what I've optimized it for (and degrades further with the 2x filter, due to even worse separation between the non-zero and zero bands making the run-length coding perform very badly).

I'm not sure how useful this will be, given the CELT layer's band folding, but I hope it's useful for the community at large anyway.

-- Ruben

Attachments:
  32kbps_2xFilter.flac (audio/flac, 1190399 bytes): <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0003.flac>
  32kbps_NoFilter.flac (audio/flac, 1181118 bytes): <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0004.flac>
  32kbps_1xFilter.flac (audio/flac, 1177537 bytes): <http://lists.xiph.org/pipermail/opus/attachments/20190216/418f0775/attachment-0005.flac>
Timothy B. Terriberry
2019-May-27 21:06 UTC
[opus] Potential transient pre-echo reduction filter
Aikku wrote:
> linear prediction. But since that would impact performance a fair bit
> for my codec, I instead decided to take a lesson from wavelets and used
> a simple sum/difference on every pair of coefficients (sort-of emulating
> a Haar wavelet). ie. (excluding normalization factors)
>
> a = X[i*2], b = X[i*2+1]
> X[i*2] = a + b
> X[i*2+1] = a - b

You might be interested in looking at RFC 6716 Section 4.3.4.5, and the corresponding source code in celt/bands.c's quant_band(), which calls haar1().
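
For readers following along, the pairwise sum/difference quoted above can be sketched in C roughly as follows. This is only a minimal illustration and not code from the original post or from libopus: the name pair_sum_diff, the float coefficient type, and the 1/sqrt(2) scaling (standing in for the normalization factors the post leaves out) are all assumptions.

```c
#include <stddef.h>

/* Assumed normalization: 1/sqrt(2). With this scale the butterfly is
 * orthonormal, so applying it twice gives back the input (up to
 * floating-point rounding). */
#define PAIR_SCALE 0.70710678118654752f

/* Hypothetical helper: replaces each pair (X[2i], X[2i+1]) of frequency
 * coefficients with its scaled sum and difference, in place.
 * n is the number of coefficients; if n is odd, the last one is left
 * untouched. */
static void pair_sum_diff(float *X, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        float a = X[i];
        float b = X[i + 1];
        X[i]     = PAIR_SCALE * (a + b);
        X[i + 1] = PAIR_SCALE * (a - b);
    }
}
```

Under these assumptions an encoder would call pair_sum_diff() on the MDCT coefficients before quantization, and a decoder would call the same function on the dequantized coefficients to undo it, since the scaled transform is its own inverse.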