Hi all, I'm having trouble reconciling the coarse energy predictor's z-transform in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm pretty new to DSP theory and dealing with z-transforms, but I'm interested in learning (as well as compression), so I thought I'd study this filter. But I just can't seem to get it to match my understanding of the code; it's likely I've made a few mistakes, and any help/guidance would be greatly appreciated! Note that this is a bit difficult to describe without proper typesetting, so I've prepared some pdf notes (as well as lyx source) and attached them to this email, as well as a pdf render. In case that doesn't reach you, they're also available on my dropbox: pdf: https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0 lyx: https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0 Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version follows. Thanks for your time/help, Jake --- I'm having trouble reconciling the coarse energy predictor implementation in the libopus source code and the 2D z-transform description in the paper[0]. I've simplified the source code (in unquant_coarse_energy in quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode: void unquant_coarse_energy(float *e, int bands) { float alpha = /* ... */; float beta = /* ... */; float p = 0.0f; for (int b = 0; b < bands; b++) { float q = /* read from bitstream */; e[i] = alpha * e[i] + p + q; p = p + q - beta * q; } } According to the paper, the 2D z-transform should be: A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} First off, to state what I think is obvious: the domain of this filter should be a 2D ?energy plane? with the \ell-dimension representing frames and the b-dimension representing bands, and the range should be the prediction (actual band energy - q[\ell,b] , the residual). As a predictor, the filter must be causal. Finally, according to the code above, the energy is always 0 for b<0 (\ell<0, b\geq bands, and \ell\geq frames are not specified nor useful). Assuming this filter is separable, we first have the \ell -dimension predictor: A(z_{\ell})=1-\alpha z_{\ell}^{-1} At first, I thought this was clearly embodied by alpha * e[i] above. However, the z-transform implies that it should actually be (1 - alpha) * e[i], so already we seem to be missing another e[i] term somewhere (not to mention alpha having the wrong sign). The b-dimension predictor seems even more problematic: A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} This matches what's listed in the CELT blog post[2], and is equivalent to: Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b}) The equivalent difference equation is: y[b]=x[b]-x[b-1]+\beta y[b-1] And substituting names from the C code, we should get something like: prev[b]=q[b]-q[b-1]+\beta prev[b-1] Now, it should be mentioned that I actually asked about this recently in the DSP stack exchange[3] (after first emailing Jean-Marc Valin directly, but I seem to have scared him off with another wall of text similar to this one), and a helpful user there was able to clarify many things. We actually arrived at the same difference equation in the end, even though we got there a bit of a different way (one which actually included both dimensions from the original 2D z -transform), which suggests that my analysis above is correct. However, we still didn't figure out the last bit: reconciling it with the C code; it appears to differ. If I forget about the above and just read the C code, we should get: prev[b]=prev[b-1]+q[b]-\beta q[b] The equivalent z-transform for this difference equation would be: A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}} This suggests that the actual predictor description might instead be: A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}} However, that still ignores the apparently-missing e[i] term from the \ell-dimension. So, what am I missing? One thing that I glossed over above that the first predictor dimenson (\ell) appears to be applied to the band energy directly (as expected), whereas the second predictor dimension (b) appears to be applied to the residual q. Since q can be expressed in terms of the energy and the predictor, I tried several different interpretations and substitutions in various domains in order to describe a predictor in with the 2D ? energy plane? as the domain and the prediction as the range, and got some crazy z-transforms that don't look correct; here's a few just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}} A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}} So, at this point I'm kindof running in circles, and I think I may have done something wrong; at least I'd like to think that's a lot more likely than the paper/RFC/libopus code were out of sync somehow! [0]: https://arxiv.org/abs/1602.04845 [1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html [2]: https://jmvalin.dreamwidth.org/12000.html [3]: https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20210627/a4d92acf/attachment-0001.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: predictor-confusion-2.pdf Type: application/pdf Size: 81142 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20210627/a4d92acf/attachment-0001.pdf> -------------- next part -------------- A non-text attachment was scrubbed... Name: predictor-confusion-2.lyx Type: application/lyx Size: 10572 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20210627/a4d92acf/attachment-0001.bin>
Hi again all, I've made some progress with these notes since the last email. Particularly, I misunderstood how 2D (separable) filters should be cascaded, and that this is captured in the original paper's z-transform. I also discovered that I could arrive at the correct 2D z-transform from the C code if I accept that the range of the predictor is the *residual*, not the energy minus the residual, as I expected. This seems a little strange to me, and is the last missing link in the puzzle. Any tips/help/explanations would be appreciated to clear this up! Once again, notes in pdf/lyx format are attached, as well as on my dropbox: pdf: https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0 lyx: https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0 and plaintext below. Thanks again, Jake --- Opus Coarse Energy Predictor Notes (Round 3) Jake Taylor (yupferris at gmail.com), 29 June 2021 I'm having trouble reconciling the coarse energy predictor implementation in the libopus source code and the 2D z-transform description in the paper[0]. At this point, I've gotten quite close, but am still having trouble with the last bits. And help/guidance would be appreciated! I've simplified the source code (in unquant_coarse_energy in quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code: void unquant_coarse_energy(float *e, int bands) { float alpha = /* ... */; float beta = /* ... */; float prev = 0.0f; for (int b = 0; b < bands; b++) { float r = /* read from bitstream */; float q = alpha * e[i] + prev; e[i] = q + r; prev = prev + (1 - beta) * r; } } According to the paper, the 2D z-transform should be: A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} First, to state what I think is obvious: the domain of this filter should be a 2D ?energy plane? with the \ell-dimension representing frames and the b-dimension representing bands, and the range should be the prediction (actual band energy minus r[\ell,b] , the residual). As a predictor, the filter must be causal. Finally, according to the code above, the energy is always 0 for b<0 (b\geq bands, \ell<0, and \ell\geq frames are neither specified nor useful). As outlined in the CELT blog post[2], this z-transform describes two (separable) cascaded predictors: A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b}) The first is the \ell-dimension predictor whose domain is indeed the 2D energy plane (albeit a single ?row? of it corresponding to a particular band in isolation over multiple frames): P(z_{\ell})=1-\alpha z_{\ell}^{-1} If we write the range explicitly: P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell}) This corresponds to the following difference equation: p[\ell]=e[\ell]-\alpha e[\ell-1] The range of this predictor is not the final prediction, but an intermediate p. Note that this equation includes the current band's energy e[\ell], which is somewhat surprising for a predictor, as e[\ell] will not be known until it can be derived after r is decoded in the decoder. For each \ell (?columns? of the energy plane), successive elements of p are then fed into the b-dimension predictor, whose range is the final prediction: Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} The equivalent difference equation is: q[b]=p[b]-p[b-1]+\beta q[b-1] What remains now is to match these cascaded predictors with the C code, which is not trivial because it's not immediately apparent that the above derived difference equations are present in the code. There are likely several ways to do this; two high-level approaches stand out to me: either ?lower? the expected z ?transforms to difference equations and refactor until we get matching code, or ?lift? the code to z-transforms and refactor until those match. I find it easier to manipulate terms in the z-domain, so let's try the latter approach. We'll first describe prev in terms of the known signals. Looking at the C code, we know the following: q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b] Since q is the output of the predictor, we can rewrite it in terms of e and r: q[\ell,b]=e[\ell,b]-r[\ell,b] This is also given by the C code. Substituting this in the previous equation yields: prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b] Or, equivalently: prev[\ell,b]=p[\ell,b]-r[\ell,b] Trivially, the corresponding z-transform is: Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b}) So, prev apparently represents the difference between the output of the \ell-dimension predictor p and the residual r. Interestingly, q, the expected range of the b-dimension predictor, has completely disappeared! This is odd, as without this term, it's going to be difficult to reach a z-transform with the correct range. However, ignoring this for now, we can still make some kind of progress. The difference equation governing prev is the following (again, from the C code): prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b] Note that the output of the difference equation is indexed with b+1 because it represents the value of prev after it has been updated for this loop iteration. This is actually an important distinction, because the code that modifies prev uses the r[\ell,b] , i.e. the current residual, and these two signals should have the correct phase with respect to one another. We can also express this as: prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1] These two definitions are, of course, equivalent. The z-transform of this difference equation is: Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b}) The final step involves substituting occurrences of prev with equivalent expressions in terms of P and R: P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b}) This can be simplified to: \frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} While this filter has almost the expected definition with the correct domain p, its range is r, not q! This is somewhat unsurprising, as the q terms cancelled out of our equations above. However, I find it really strange that the output of the predictor is actually the residual, and not the prediction itself. One would think that, because the residual is not yet known to the decoder until dequantization, the predictor would represent the resulting energy minus that residual. I suppose I could be happy accepting that, but it would still be nice to get a few more pairs of eyes on this to make sure I haven't made any mistakes, and perhaps to provide additional insight on why the predictor is factored this way. [0] https://arxiv.org/abs/1602.0484 [1] https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html [2] https://jmvalin.dreamwidth.org/12000.html On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <yupferris at gmail.com> wrote:> Hi all, > > I'm having trouble reconciling the coarse energy predictor's z-transform > in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm > pretty new to DSP theory and dealing with z-transforms, but I'm interested > in learning (as well as compression), so I thought I'd study this filter. > But I just can't seem to get it to match my understanding of the code; it's > likely I've made a few mistakes, and any help/guidance would be greatly > appreciated! > > Note that this is a bit difficult to describe without proper typesetting, > so I've prepared some pdf notes (as well as lyx source) and attached them > to this email, as well as a pdf render. In case that doesn't reach you, > they're also available on my dropbox: > pdf: > https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0 > lyx: > https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0 > > Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version follows. > > Thanks for your time/help, > Jake > > --- > > I'm having trouble reconciling the coarse energy predictor > implementation in the libopus source code and the 2D z-transform > description in the paper[0]. > > I've simplified the source code (in unquant_coarse_energy in > quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode: > > void unquant_coarse_energy(float *e, int bands) { > float alpha = /* ... */; > float beta = /* ... */; > float p = 0.0f; > for (int b = 0; b < bands; b++) { > float q = /* read from bitstream */; > e[i] = alpha * e[i] + p + q; > p = p + q - beta * q; > } > } > > According to the paper, the 2D z-transform should be: > > A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta > z_{b}^{-1}} > > First off, to state what I think is obvious: the domain of this > filter should be a 2D ?energy plane? with the \ell-dimension > representing frames and the b-dimension representing bands, and > the range should be the prediction (actual band energy - q[\ell,b] > , the residual). As a predictor, the filter must be causal. > Finally, according to the code above, the energy is always 0 for b<0 > (\ell<0, b\geq bands, and \ell\geq frames are not specified nor > useful). > > Assuming this filter is separable, we first have the \ell > -dimension predictor: > > A(z_{\ell})=1-\alpha z_{\ell}^{-1} > > At first, I thought this was clearly embodied by alpha * e[i] > above. However, the z-transform implies that it should actually > be (1 - alpha) * e[i], so already we seem to be missing another e[i] > term somewhere (not to mention alpha having the wrong sign). > > The b-dimension predictor seems even more problematic: > > A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} > > This matches what's listed in the CELT blog post[2], and is equivalent to: > > Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b}) > > The equivalent difference equation is: > > y[b]=x[b]-x[b-1]+\beta y[b-1] > > And substituting names from the C code, we should get something > like: > > prev[b]=q[b]-q[b-1]+\beta prev[b-1] > > Now, it should be mentioned that I actually asked about this > recently in the DSP stack exchange[3] (after first emailing Jean-Marc > Valin directly, but I seem to > have scared him off with another wall of text similar to this > one), and a helpful user there was able to clarify many things. > We actually arrived at the same difference equation in the end, > even though we got there a bit of a different way (one which > actually included both dimensions from the original 2D z > -transform), which suggests that my analysis above is correct. > > However, we still didn't figure out the last bit: reconciling it > with the C code; it appears to differ. If I forget about the > above and just read the C code, we should get: > > prev[b]=prev[b-1]+q[b]-\beta q[b] > > The equivalent z-transform for this difference equation would be: > > A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}} > > This suggests that the actual predictor description might instead > be: > > A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}} > > However, that still ignores the apparently-missing e[i] term from > the \ell-dimension. > > So, what am I missing? One thing that I glossed over above that > the first predictor dimenson (\ell) appears to be applied to the > band energy directly (as expected), whereas the second predictor > dimension (b) appears to be applied to the residual q. Since q > can be expressed in terms of the energy and the predictor, I > tried several different interpretations and substitutions in > various domains in order to describe a predictor in with the 2D ? > energy plane? as the domain and the prediction as the range, and > got some crazy z-transforms that don't look correct; here's a few > just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha > z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}} > > A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha > z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}} > > So, at this point I'm kindof running in circles, and I think I > may have done something wrong; at least I'd like to think that's > a lot more likely than the paper/RFC/libopus code were out of > sync somehow! > > [0]: https://arxiv.org/abs/1602.04845 > [1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html > [2]: https://jmvalin.dreamwidth.org/12000.html > [3]: > https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: predictor-confusion-3.lyx Type: application/lyx Size: 16363 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: predictor-confusion-3.pdf Type: application/pdf Size: 90881 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.pdf>
Hi again all, I've made a bit more progress on this now, and I believe with some certainty that the 2D z-transform for the coarse energy predictor listed in the paper[0]/RFC[1] does *not* actually match what the libopus code is doing. It would be good to get some extra eyes on my math to (dis)prove this. Once again, notes in pdf/lyx format are attached, as well as on my dropbox: pdf: https://www.dropbox.com/s/l4tzyvq1mne8jan/predictor-confusion-4.pdf?dl=0 lyx: https://www.dropbox.com/s/map3slkbnhyctui/predictor-confusion-4.lyx?dl=0 and plaintext below. Thanks again, Jake --- Opus Coarse Energy Predictor Notes (Round 4) Jake Taylor (yupferris at gmail.com), 1 July 2021 I'm having trouble reconciling the 2D coarse energy predictor's z -transform description in the paper[0]/RFC[1] and the implementation in the libopus source code. At this point, I believe the z-transform does not actually describe the behavior of the filter in the code. According to the paper/RFC, the 2D z-transform should be: A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} I've simplified the source code (in unquant_coarse_energy in quant_bands.c in libopus 1.3.1[2]) to the following C (pseudo)code: void unquant_coarse_energy(float *e, int bands) { float alpha = /* ... */; float beta = /* ... */; float prev = 0.0f; for (int b = 0; b < bands; b++) { float r = /* read from bitstream */; e[i] = alpha * e[i] + prev + r; prev = prev + (1 - beta) * r; } } The goal is simple: extract difference equations from the code, derive the corresponding 1D z-transforms, and then simplify/merge into a single 2D z-transform which should hopefully match what the paper/RFC list. The code is governed by the following difference equations: e[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b-1]+r[\ell,b] prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b] The corresponding z-transforms are: E(z_{\ell},z_{b})=\alpha z_{\ell}^{-1}E(z_{\ell},z_{b})+z_{b}^{-1}Prev(z_{\ell},z_{b})+R(z_{\ell},z_{b}) Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)R(z_{\ell},z_{b}) We expect the domain of the predictor to be e (the 2D ?energy plane?) and the range of the predictor q to be: q[\ell,b]=e[\ell,b]-r[\ell,b] Equivalently, in the z-domain: Q(z_{\ell},z_{b})=E(z_{\ell},z_{b})-R(z_{\ell},z_{b}) These three z-transforms should be all we need to reach the 2D z -transform from the paper/RFC. First, let's eliminate the R terms using the definition for Q: R(z_{\ell},z_{b})=E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}) This leads us to: E(z_{\ell},z_{b})=\alpha z_{\ell}^{-1}E(z_{\ell},z_{b})+z_{b}^{-1}Prev(z_{\ell},z_{b})+E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}) Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})) Let's simplify each individually a bit: -\alpha z_{\ell}^{-1}E(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})-Q(z_{\ell},z_{b}) Prev(z_{\ell},z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}))>From here, we can substitute the Prev term in the former z-transform: -\alpha z_{\ell}^{-1}E(z_{\ell},z_{b})=z_{b}^{-1}(\frac{1-\beta}{1-z_{b}^{-1}}(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})))-Q(z_{\ell},z_{b}) And simplifying that yields: Q(z_{\ell},z_{b})=\frac{\alpha z_{\ell}^{-1}(1-z_{b}^{-1})+z_{b}^{-1}(1-\beta)}{1-\beta z_{b}^{-1}}E(z_{\ell},z_{b}) Q(z_{\ell},z_{b})=\frac{\alpha z_{\ell}^{-1}(1-z_{b}^{-1})+z_{b}^{-1}(1-\beta)}{1-\beta z_{b}^{-1}}E(z_{\ell},z_{b}) It's difficult to factor this in a better way; even using Wolfram Alpha[3]. But, barring mathematical mistakes I might have made, one thing is clear: this is not the same as what's listed in the paper/RFC. [0]: https://arxiv.org/abs/1602.04845 [1]: https://datatracker.ietf.org/doc/html/rfc6716#section-4.3.2 [2]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html [3]: https://www.wolframalpha.com/input/?i=alpha*l%281-b%29%2Bb%281-beta%29 On Tue, Jun 29, 2021 at 7:32 PM Jake Taylor <yupferris at gmail.com> wrote:> Hi again all, > > I've made some progress with these notes since the last email. > Particularly, I misunderstood how 2D (separable) filters should be > cascaded, and that this is captured in the original paper's z-transform. I > also discovered that I could arrive at the correct 2D z-transform from the > C code if I accept that the range of the predictor is the *residual*, not > the energy minus the residual, as I expected. This seems a little strange > to me, and is the last missing link in the puzzle. Any > tips/help/explanations would be appreciated to clear this up! > > Once again, notes in pdf/lyx format are attached, as well as on my dropbox: > pdf: > https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0 > lyx: > https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0 > > and plaintext below. > > Thanks again, > Jake > > --- > > Opus Coarse Energy Predictor Notes (Round 3) > > Jake Taylor (yupferris at gmail.com), 29 June 2021 > > I'm having trouble reconciling the coarse energy predictor > implementation in the libopus source code and the 2D z-transform > description in the paper[0]. At this point, I've gotten quite close, but > am still having > trouble with the last bits. And help/guidance would be > appreciated! > > I've simplified the source code (in unquant_coarse_energy in > quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code: > > void unquant_coarse_energy(float *e, int bands) { > float alpha = /* ... */; > float beta = /* ... */; > float prev = 0.0f; > for (int b = 0; b < bands; b++) { > float r = /* read from bitstream */; > float q = alpha * e[i] + prev; > e[i] = q + r; > prev = prev + (1 - beta) * r; > } > } > > According to the paper, the 2D z-transform should be: > > A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta > z_{b}^{-1}} > > First, to state what I think is obvious: the domain of this > filter should be a 2D ?energy plane? with the \ell-dimension > representing frames and the b-dimension representing bands, and > the range should be the prediction (actual band energy minus r[\ell,b] > , the residual). As a predictor, the filter must be causal. > Finally, according to the code above, the energy is always 0 for b<0 > (b\geq bands, \ell<0, and \ell\geq frames are neither specified > nor useful). > > As outlined in the CELT blog post[2], this z-transform describes two > (separable) cascaded > predictors: > > A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b}) > > The first is the \ell-dimension predictor whose domain is indeed > the 2D energy plane (albeit a single ?row? of it corresponding to > a particular band in isolation over multiple frames): > > P(z_{\ell})=1-\alpha z_{\ell}^{-1} > > If we write the range explicitly: > > P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell}) > > This corresponds to the following difference equation: > > p[\ell]=e[\ell]-\alpha e[\ell-1] > > The range of this predictor is not the final prediction, but an > intermediate p. Note that this equation includes the current > band's energy e[\ell], which is somewhat surprising for a > predictor, as e[\ell] will not be known until it can be derived > after r is decoded in the decoder. > > For each \ell (?columns? of the energy plane), successive > elements of p are then fed into the b-dimension predictor, whose > range is the final prediction: > > Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} > > The equivalent difference equation is: > > q[b]=p[b]-p[b-1]+\beta q[b-1] > > What remains now is to match these cascaded predictors with the C > code, which is not trivial because it's not immediately apparent > that the above derived difference equations are present in the > code. There are likely several ways to do this; two high-level > approaches stand out to me: either ?lower? the expected z > ?transforms to difference equations and refactor until we get > matching code, or ?lift? the code to z-transforms and refactor > until those match. > > I find it easier to manipulate terms in the z-domain, so let's > try the latter approach. We'll first describe prev in terms of > the known signals. Looking at the C code, we know the following: > > q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b] > > Since q is the output of the predictor, we can rewrite it in > terms of e and r: > > q[\ell,b]=e[\ell,b]-r[\ell,b] > > This is also given by the C code. Substituting this in the > previous equation yields: > > prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b] > > Or, equivalently: > > prev[\ell,b]=p[\ell,b]-r[\ell,b] > > Trivially, the corresponding z-transform is: > > Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b}) > > So, prev apparently represents the difference between the output > of the \ell-dimension predictor p and the residual r. > Interestingly, q, the expected range of the b-dimension > predictor, has completely disappeared! This is odd, as without > this term, it's going to be difficult to reach a z-transform with > the correct range. However, ignoring this for now, we can still > make some kind of progress. > > The difference equation governing prev is the following (again, > from the C code): > > prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b] > > Note that the output of the difference equation is indexed with b+1 > because it represents the value of prev after it has been > updated for this loop iteration. This is actually an important > distinction, because the code that modifies prev uses the r[\ell,b] > , i.e. the current residual, and these two signals should have > the correct phase with respect to one another. We can also > express this as: > > prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1] > > These two definitions are, of course, equivalent. The z-transform > of this difference equation is: > > > Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b}) > > The final step involves substituting occurrences of prev with > equivalent expressions in terms of P and R: > > > P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b}) > > This can be simplified to: > > \frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta > z_{b}^{-1}} > > While this filter has almost the expected definition with the > correct domain p, its range is r, not q! This is somewhat > unsurprising, as the q terms cancelled out of our equations > above. However, I find it really strange that the output of the > predictor is actually the residual, and not the prediction > itself. One would think that, because the residual is not yet > known to the decoder until dequantization, the predictor would > represent the resulting energy minus that residual. I suppose I > could be happy accepting that, but it would still be nice to get > a few more pairs of eyes on this to make sure I haven't made any > mistakes, and perhaps to provide additional insight on why the > predictor is factored this way. > > [0] https://arxiv.org/abs/1602.0484 > [1] https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html > [2] https://jmvalin.dreamwidth.org/12000.html > > On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <yupferris at gmail.com> wrote: > >> Hi all, >> >> I'm having trouble reconciling the coarse energy predictor's z-transform >> in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm >> pretty new to DSP theory and dealing with z-transforms, but I'm interested >> in learning (as well as compression), so I thought I'd study this filter. >> But I just can't seem to get it to match my understanding of the code; it's >> likely I've made a few mistakes, and any help/guidance would be greatly >> appreciated! >> >> Note that this is a bit difficult to describe without proper typesetting, >> so I've prepared some pdf notes (as well as lyx source) and attached them >> to this email, as well as a pdf render. In case that doesn't reach you, >> they're also available on my dropbox: >> pdf: >> https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0 >> lyx: >> https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0 >> >> Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version >> follows. >> >> Thanks for your time/help, >> Jake >> >> --- >> >> I'm having trouble reconciling the coarse energy predictor >> implementation in the libopus source code and the 2D z-transform >> description in the paper[0]. >> >> I've simplified the source code (in unquant_coarse_energy in >> quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode: >> >> void unquant_coarse_energy(float *e, int bands) { >> float alpha = /* ... */; >> float beta = /* ... */; >> float p = 0.0f; >> for (int b = 0; b < bands; b++) { >> float q = /* read from bitstream */; >> e[i] = alpha * e[i] + p + q; >> p = p + q - beta * q; >> } >> } >> >> According to the paper, the 2D z-transform should be: >> >> A(z_{\ell},z_{b})=(1-\alpha >> z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} >> >> First off, to state what I think is obvious: the domain of this >> filter should be a 2D ?energy plane? with the \ell-dimension >> representing frames and the b-dimension representing bands, and >> the range should be the prediction (actual band energy - q[\ell,b] >> , the residual). As a predictor, the filter must be causal. >> Finally, according to the code above, the energy is always 0 for b<0 >> (\ell<0, b\geq bands, and \ell\geq frames are not specified nor >> useful). >> >> Assuming this filter is separable, we first have the \ell >> -dimension predictor: >> >> A(z_{\ell})=1-\alpha z_{\ell}^{-1} >> >> At first, I thought this was clearly embodied by alpha * e[i] >> above. However, the z-transform implies that it should actually >> be (1 - alpha) * e[i], so already we seem to be missing another e[i] >> term somewhere (not to mention alpha having the wrong sign). >> >> The b-dimension predictor seems even more problematic: >> >> A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}} >> >> This matches what's listed in the CELT blog post[2], and is equivalent to: >> >> Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b}) >> >> The equivalent difference equation is: >> >> y[b]=x[b]-x[b-1]+\beta y[b-1] >> >> And substituting names from the C code, we should get something >> like: >> >> prev[b]=q[b]-q[b-1]+\beta prev[b-1] >> >> Now, it should be mentioned that I actually asked about this >> recently in the DSP stack exchange[3] (after first emailing Jean-Marc >> Valin directly, but I seem to >> have scared him off with another wall of text similar to this >> one), and a helpful user there was able to clarify many things. >> We actually arrived at the same difference equation in the end, >> even though we got there a bit of a different way (one which >> actually included both dimensions from the original 2D z >> -transform), which suggests that my analysis above is correct. >> >> However, we still didn't figure out the last bit: reconciling it >> with the C code; it appears to differ. If I forget about the >> above and just read the C code, we should get: >> >> prev[b]=prev[b-1]+q[b]-\beta q[b] >> >> The equivalent z-transform for this difference equation would be: >> >> A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}} >> >> This suggests that the actual predictor description might instead >> be: >> >> A(z_{\ell},z_{b})=(1-\alpha >> z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}} >> >> However, that still ignores the apparently-missing e[i] term from >> the \ell-dimension. >> >> So, what am I missing? One thing that I glossed over above that >> the first predictor dimenson (\ell) appears to be applied to the >> band energy directly (as expected), whereas the second predictor >> dimension (b) appears to be applied to the residual q. Since q >> can be expressed in terms of the energy and the predictor, I >> tried several different interpretations and substitutions in >> various domains in order to describe a predictor in with the 2D ? >> energy plane? as the domain and the prediction as the range, and >> got some crazy z-transforms that don't look correct; here's a few >> just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha >> z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}} >> >> A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha >> z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}} >> >> So, at this point I'm kindof running in circles, and I think I >> may have done something wrong; at least I'd like to think that's >> a lot more likely than the paper/RFC/libopus code were out of >> sync somehow! >> >> [0]: https://arxiv.org/abs/1602.04845 >> [1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html >> [2]: https://jmvalin.dreamwidth.org/12000.html >> [3]: >> https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: predictor-confusion-4.lyx Type: application/octet-stream Size: 8510 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: predictor-confusion-4.pdf Type: application/pdf Size: 76873 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.pdf>