Hi all,
I'm having trouble reconciling the coarse energy predictor's z-transform
in
the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm pretty
new to DSP theory and dealing with z-transforms, but I'm interested in
learning (as well as compression), so I thought I'd study this filter. But
I just can't seem to get it to match my understanding of the code; it's
likely I've made a few mistakes, and any help/guidance would be greatly
appreciated!
Note that this is a bit difficult to describe without proper typesetting,
so I've prepared some pdf notes (as well as lyx source) and attached them
to this email, as well as a pdf render. In case that doesn't reach you,
they're also available on my dropbox:
pdf:
https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0
lyx:
https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0
Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version follows.
Thanks for your time/help,
Jake
---
I'm having trouble reconciling the coarse energy predictor
implementation in the libopus source code and the 2D z-transform
description in the paper[0].
I've simplified the source code (in unquant_coarse_energy in
quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:
void unquant_coarse_energy(float *e, int bands) {
float alpha = /* ... */;
float beta = /* ... */;
float p = 0.0f;
for (int b = 0; b < bands; b++) {
float q = /* read from bitstream */;
e[i] = alpha * e[i] + p + q;
p = p + q - beta * q;
}
}
According to the paper, the 2D z-transform should be:
A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}
First off, to state what I think is obvious: the domain of this
filter should be a 2D ?energy plane? with the \ell-dimension
representing frames and the b-dimension representing bands, and
the range should be the prediction (actual band energy - q[\ell,b]
, the residual). As a predictor, the filter must be causal.
Finally, according to the code above, the energy is always 0 for b<0
(\ell<0, b\geq bands, and \ell\geq frames are not specified nor
useful).
Assuming this filter is separable, we first have the \ell
-dimension predictor:
A(z_{\ell})=1-\alpha z_{\ell}^{-1}
At first, I thought this was clearly embodied by alpha * e[i]
above. However, the z-transform implies that it should actually
be (1 - alpha) * e[i], so already we seem to be missing another e[i]
term somewhere (not to mention alpha having the wrong sign).
The b-dimension predictor seems even more problematic:
A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
This matches what's listed in the CELT blog post[2], and is equivalent to:
Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})
The equivalent difference equation is:
y[b]=x[b]-x[b-1]+\beta y[b-1]
And substituting names from the C code, we should get something
like:
prev[b]=q[b]-q[b-1]+\beta prev[b-1]
Now, it should be mentioned that I actually asked about this
recently in the DSP stack exchange[3] (after first emailing Jean-Marc Valin
directly, but I seem to
have scared him off with another wall of text similar to this
one), and a helpful user there was able to clarify many things.
We actually arrived at the same difference equation in the end,
even though we got there a bit of a different way (one which
actually included both dimensions from the original 2D z
-transform), which suggests that my analysis above is correct.
However, we still didn't figure out the last bit: reconciling it
with the C code; it appears to differ. If I forget about the
above and just read the C code, we should get:
prev[b]=prev[b-1]+q[b]-\beta q[b]
The equivalent z-transform for this difference equation would be:
A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}
This suggests that the actual predictor description might instead
be:
A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}
However, that still ignores the apparently-missing e[i] term from
the \ell-dimension.
So, what am I missing? One thing that I glossed over above that
the first predictor dimenson (\ell) appears to be applied to the
band energy directly (as expected), whereas the second predictor
dimension (b) appears to be applied to the residual q. Since q
can be expressed in terms of the energy and the predictor, I
tried several different interpretations and substitutions in
various domains in order to describe a predictor in with the 2D ?
energy plane? as the domain and the prediction as the range, and
got some crazy z-transforms that don't look correct; here's a few
just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha
z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}
A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha
z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}
So, at this point I'm kindof running in circles, and I think I
may have done something wrong; at least I'd like to think that's
a lot more likely than the paper/RFC/libopus code were out of
sync somehow!
[0]: https://arxiv.org/abs/1602.04845
[1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
[2]: https://jmvalin.dreamwidth.org/12000.html
[3]:
https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210627/a4d92acf/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-2.pdf
Type: application/pdf
Size: 81142 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210627/a4d92acf/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-2.lyx
Type: application/lyx
Size: 10572 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210627/a4d92acf/attachment-0001.bin>
Hi again all,
I've made some progress with these notes since the last email.
Particularly, I misunderstood how 2D (separable) filters should be
cascaded, and that this is captured in the original paper's z-transform. I
also discovered that I could arrive at the correct 2D z-transform from the
C code if I accept that the range of the predictor is the *residual*, not
the energy minus the residual, as I expected. This seems a little strange
to me, and is the last missing link in the puzzle. Any
tips/help/explanations would be appreciated to clear this up!
Once again, notes in pdf/lyx format are attached, as well as on my dropbox:
pdf:
https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0
lyx:
https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0
and plaintext below.
Thanks again,
Jake
---
Opus Coarse Energy Predictor Notes (Round 3)
Jake Taylor (yupferris at gmail.com), 29 June 2021
I'm having trouble reconciling the coarse energy predictor
implementation in the libopus source code and the 2D z-transform
description in the paper[0]. At this point, I've gotten quite close, but am
still having
trouble with the last bits. And help/guidance would be
appreciated!
I've simplified the source code (in unquant_coarse_energy in
quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code:
void unquant_coarse_energy(float *e, int bands) {
float alpha = /* ... */;
float beta = /* ... */;
float prev = 0.0f;
for (int b = 0; b < bands; b++) {
float r = /* read from bitstream */;
float q = alpha * e[i] + prev;
e[i] = q + r;
prev = prev + (1 - beta) * r;
}
}
According to the paper, the 2D z-transform should be:
A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}
First, to state what I think is obvious: the domain of this
filter should be a 2D ?energy plane? with the \ell-dimension
representing frames and the b-dimension representing bands, and
the range should be the prediction (actual band energy minus r[\ell,b]
, the residual). As a predictor, the filter must be causal.
Finally, according to the code above, the energy is always 0 for b<0
(b\geq bands, \ell<0, and \ell\geq frames are neither specified
nor useful).
As outlined in the CELT blog post[2], this z-transform describes two
(separable) cascaded
predictors:
A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b})
The first is the \ell-dimension predictor whose domain is indeed
the 2D energy plane (albeit a single ?row? of it corresponding to
a particular band in isolation over multiple frames):
P(z_{\ell})=1-\alpha z_{\ell}^{-1}
If we write the range explicitly:
P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell})
This corresponds to the following difference equation:
p[\ell]=e[\ell]-\alpha e[\ell-1]
The range of this predictor is not the final prediction, but an
intermediate p. Note that this equation includes the current
band's energy e[\ell], which is somewhat surprising for a
predictor, as e[\ell] will not be known until it can be derived
after r is decoded in the decoder.
For each \ell (?columns? of the energy plane), successive
elements of p are then fed into the b-dimension predictor, whose
range is the final prediction:
Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
The equivalent difference equation is:
q[b]=p[b]-p[b-1]+\beta q[b-1]
What remains now is to match these cascaded predictors with the C
code, which is not trivial because it's not immediately apparent
that the above derived difference equations are present in the
code. There are likely several ways to do this; two high-level
approaches stand out to me: either ?lower? the expected z
?transforms to difference equations and refactor until we get
matching code, or ?lift? the code to z-transforms and refactor
until those match.
I find it easier to manipulate terms in the z-domain, so let's
try the latter approach. We'll first describe prev in terms of
the known signals. Looking at the C code, we know the following:
q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b]
Since q is the output of the predictor, we can rewrite it in
terms of e and r:
q[\ell,b]=e[\ell,b]-r[\ell,b]
This is also given by the C code. Substituting this in the
previous equation yields:
prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b]
Or, equivalently:
prev[\ell,b]=p[\ell,b]-r[\ell,b]
Trivially, the corresponding z-transform is:
Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b})
So, prev apparently represents the difference between the output
of the \ell-dimension predictor p and the residual r.
Interestingly, q, the expected range of the b-dimension
predictor, has completely disappeared! This is odd, as without
this term, it's going to be difficult to reach a z-transform with
the correct range. However, ignoring this for now, we can still
make some kind of progress.
The difference equation governing prev is the following (again,
from the C code):
prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b]
Note that the output of the difference equation is indexed with b+1
because it represents the value of prev after it has been
updated for this loop iteration. This is actually an important
distinction, because the code that modifies prev uses the r[\ell,b]
, i.e. the current residual, and these two signals should have
the correct phase with respect to one another. We can also
express this as:
prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1]
These two definitions are, of course, equivalent. The z-transform
of this difference equation is:
Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})
The final step involves substituting occurrences of prev with
equivalent expressions in terms of P and R:
P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})
This can be simplified to:
\frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}
While this filter has almost the expected definition with the
correct domain p, its range is r, not q! This is somewhat
unsurprising, as the q terms cancelled out of our equations
above. However, I find it really strange that the output of the
predictor is actually the residual, and not the prediction
itself. One would think that, because the residual is not yet
known to the decoder until dequantization, the predictor would
represent the resulting energy minus that residual. I suppose I
could be happy accepting that, but it would still be nice to get
a few more pairs of eyes on this to make sure I haven't made any
mistakes, and perhaps to provide additional insight on why the
predictor is factored this way.
[0] https://arxiv.org/abs/1602.0484
[1] https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
[2] https://jmvalin.dreamwidth.org/12000.html
On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <yupferris at gmail.com>
wrote:
> Hi all,
>
> I'm having trouble reconciling the coarse energy predictor's
z-transform
> in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1]. I'm
> pretty new to DSP theory and dealing with z-transforms, but I'm
interested
> in learning (as well as compression), so I thought I'd study this
filter.
> But I just can't seem to get it to match my understanding of the code;
it's
> likely I've made a few mistakes, and any help/guidance would be greatly
> appreciated!
>
> Note that this is a bit difficult to describe without proper typesetting,
> so I've prepared some pdf notes (as well as lyx source) and attached
them
> to this email, as well as a pdf render. In case that doesn't reach you,
> they're also available on my dropbox:
> pdf:
> https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0
> lyx:
> https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0
>
> Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version
follows.
>
> Thanks for your time/help,
> Jake
>
> ---
>
> I'm having trouble reconciling the coarse energy predictor
> implementation in the libopus source code and the 2D z-transform
> description in the paper[0].
>
> I've simplified the source code (in unquant_coarse_energy in
> quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:
>
> void unquant_coarse_energy(float *e, int bands) {
> float alpha = /* ... */;
> float beta = /* ... */;
> float p = 0.0f;
> for (int b = 0; b < bands; b++) {
> float q = /* read from bitstream */;
> e[i] = alpha * e[i] + p + q;
> p = p + q - beta * q;
> }
> }
>
> According to the paper, the 2D z-transform should be:
>
> A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
> z_{b}^{-1}}
>
> First off, to state what I think is obvious: the domain of this
> filter should be a 2D ?energy plane? with the \ell-dimension
> representing frames and the b-dimension representing bands, and
> the range should be the prediction (actual band energy - q[\ell,b]
> , the residual). As a predictor, the filter must be causal.
> Finally, according to the code above, the energy is always 0 for b<0
> (\ell<0, b\geq bands, and \ell\geq frames are not specified nor
> useful).
>
> Assuming this filter is separable, we first have the \ell
> -dimension predictor:
>
> A(z_{\ell})=1-\alpha z_{\ell}^{-1}
>
> At first, I thought this was clearly embodied by alpha * e[i]
> above. However, the z-transform implies that it should actually
> be (1 - alpha) * e[i], so already we seem to be missing another e[i]
> term somewhere (not to mention alpha having the wrong sign).
>
> The b-dimension predictor seems even more problematic:
>
> A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>
> This matches what's listed in the CELT blog post[2], and is equivalent
to:
>
> Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})
>
> The equivalent difference equation is:
>
> y[b]=x[b]-x[b-1]+\beta y[b-1]
>
> And substituting names from the C code, we should get something
> like:
>
> prev[b]=q[b]-q[b-1]+\beta prev[b-1]
>
> Now, it should be mentioned that I actually asked about this
> recently in the DSP stack exchange[3] (after first emailing Jean-Marc
> Valin directly, but I seem to
> have scared him off with another wall of text similar to this
> one), and a helpful user there was able to clarify many things.
> We actually arrived at the same difference equation in the end,
> even though we got there a bit of a different way (one which
> actually included both dimensions from the original 2D z
> -transform), which suggests that my analysis above is correct.
>
> However, we still didn't figure out the last bit: reconciling it
> with the C code; it appears to differ. If I forget about the
> above and just read the C code, we should get:
>
> prev[b]=prev[b-1]+q[b]-\beta q[b]
>
> The equivalent z-transform for this difference equation would be:
>
> A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}
>
> This suggests that the actual predictor description might instead
> be:
>
> A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}
>
> However, that still ignores the apparently-missing e[i] term from
> the \ell-dimension.
>
> So, what am I missing? One thing that I glossed over above that
> the first predictor dimenson (\ell) appears to be applied to the
> band energy directly (as expected), whereas the second predictor
> dimension (b) appears to be applied to the residual q. Since q
> can be expressed in terms of the energy and the predictor, I
> tried several different interpretations and substitutions in
> various domains in order to describe a predictor in with the 2D ?
> energy plane? as the domain and the prediction as the range, and
> got some crazy z-transforms that don't look correct; here's a few
> just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha
> z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}
>
> A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha
> z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}
>
> So, at this point I'm kindof running in circles, and I think I
> may have done something wrong; at least I'd like to think that's
> a lot more likely than the paper/RFC/libopus code were out of
> sync somehow!
>
> [0]: https://arxiv.org/abs/1602.04845
> [1]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
> [2]: https://jmvalin.dreamwidth.org/12000.html
> [3]:
>
https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-3.lyx
Type: application/lyx
Size: 16363 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-3.pdf
Type: application/pdf
Size: 90881 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210629/325bf6b6/attachment-0001.pdf>
Hi again all,
I've made a bit more progress on this now, and I believe with some
certainty that the 2D z-transform for the coarse energy predictor listed in
the paper[0]/RFC[1] does *not* actually match what the libopus code is
doing. It would be good to get some extra eyes on my math to (dis)prove
this.
Once again, notes in pdf/lyx format are attached, as well as on my dropbox:
pdf:
https://www.dropbox.com/s/l4tzyvq1mne8jan/predictor-confusion-4.pdf?dl=0
lyx:
https://www.dropbox.com/s/map3slkbnhyctui/predictor-confusion-4.lyx?dl=0
and plaintext below.
Thanks again,
Jake
---
Opus Coarse Energy Predictor Notes (Round 4)
Jake Taylor (yupferris at gmail.com), 1 July 2021
I'm having trouble reconciling the 2D coarse energy predictor's z
-transform description in the paper[0]/RFC[1] and the implementation
in the libopus source code. At this point, I believe the z-transform
does not actually describe the behavior of the filter in the code.
According to the paper/RFC, the 2D z-transform should be:
A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
z_{b}^{-1}}
I've simplified the source code (in unquant_coarse_energy in
quant_bands.c in libopus 1.3.1[2]) to the following C (pseudo)code:
void unquant_coarse_energy(float *e, int bands) {
float alpha = /* ... */;
float beta = /* ... */;
float prev = 0.0f;
for (int b = 0; b < bands; b++) {
float r = /* read from bitstream */;
e[i] = alpha * e[i] + prev + r;
prev = prev + (1 - beta) * r;
}
}
The goal is simple: extract difference equations from the code,
derive the corresponding 1D z-transforms, and then simplify/merge
into a single 2D z-transform which should hopefully match what
the paper/RFC list.
The code is governed by the following difference equations:
e[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b-1]+r[\ell,b]
prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b]
The corresponding z-transforms are:
E(z_{\ell},z_{b})=\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})+z_{b}^{-1}Prev(z_{\ell},z_{b})+R(z_{\ell},z_{b})
Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)R(z_{\ell},z_{b})
We expect the domain of the predictor to be e (the 2D ?energy
plane?) and the range of the predictor q to be:
q[\ell,b]=e[\ell,b]-r[\ell,b]
Equivalently, in the z-domain:
Q(z_{\ell},z_{b})=E(z_{\ell},z_{b})-R(z_{\ell},z_{b})
These three z-transforms should be all we need to reach the 2D z
-transform from the paper/RFC.
First, let's eliminate the R terms using the definition for Q:
R(z_{\ell},z_{b})=E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})
This leads us to:
E(z_{\ell},z_{b})=\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})+z_{b}^{-1}Prev(z_{\ell},z_{b})+E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})
Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}))
Let's simplify each individually a bit:
-\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})-Q(z_{\ell},z_{b})
Prev(z_{\ell},z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b}))
>From here, we can substitute the Prev term in the former z
-transform:
-\alpha
z_{\ell}^{-1}E(z_{\ell},z_{b})=z_{b}^{-1}(\frac{1-\beta}{1-z_{b}^{-1}}(E(z_{\ell},z_{b})-Q(z_{\ell},z_{b})))-Q(z_{\ell},z_{b})
And simplifying that yields:
Q(z_{\ell},z_{b})=\frac{\alpha
z_{\ell}^{-1}(1-z_{b}^{-1})+z_{b}^{-1}(1-\beta)}{1-\beta
z_{b}^{-1}}E(z_{\ell},z_{b})
Q(z_{\ell},z_{b})=\frac{\alpha
z_{\ell}^{-1}(1-z_{b}^{-1})+z_{b}^{-1}(1-\beta)}{1-\beta
z_{b}^{-1}}E(z_{\ell},z_{b})
It's difficult to factor this in a better way; even using Wolfram
Alpha[3]. But, barring mathematical mistakes I might have made, one
thing is clear: this is not the same as what's listed in the
paper/RFC.
[0]: https://arxiv.org/abs/1602.04845
[1]: https://datatracker.ietf.org/doc/html/rfc6716#section-4.3.2
[2]: https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
[3]: https://www.wolframalpha.com/input/?i=alpha*l%281-b%29%2Bb%281-beta%29
On Tue, Jun 29, 2021 at 7:32 PM Jake Taylor <yupferris at gmail.com>
wrote:
> Hi again all,
>
> I've made some progress with these notes since the last email.
> Particularly, I misunderstood how 2D (separable) filters should be
> cascaded, and that this is captured in the original paper's
z-transform. I
> also discovered that I could arrive at the correct 2D z-transform from the
> C code if I accept that the range of the predictor is the *residual*, not
> the energy minus the residual, as I expected. This seems a little strange
> to me, and is the last missing link in the puzzle. Any
> tips/help/explanations would be appreciated to clear this up!
>
> Once again, notes in pdf/lyx format are attached, as well as on my dropbox:
> pdf:
> https://www.dropbox.com/s/6a3hw33w1zul7w3/predictor-confusion-3.pdf?dl=0
> lyx:
> https://www.dropbox.com/s/e57duheaiyb7qy4/predictor-confusion-3.lyx?dl=0
>
> and plaintext below.
>
> Thanks again,
> Jake
>
> ---
>
> Opus Coarse Energy Predictor Notes (Round 3)
>
> Jake Taylor (yupferris at gmail.com), 29 June 2021
>
> I'm having trouble reconciling the coarse energy predictor
> implementation in the libopus source code and the 2D z-transform
> description in the paper[0]. At this point, I've gotten quite close,
but
> am still having
> trouble with the last bits. And help/guidance would be
> appreciated!
>
> I've simplified the source code (in unquant_coarse_energy in
> quant_bands.c in libopus 1.3.1[1]) to the following C (pseudo)code:
>
> void unquant_coarse_energy(float *e, int bands) {
> float alpha = /* ... */;
> float beta = /* ... */;
> float prev = 0.0f;
> for (int b = 0; b < bands; b++) {
> float r = /* read from bitstream */;
> float q = alpha * e[i] + prev;
> e[i] = q + r;
> prev = prev + (1 - beta) * r;
> }
> }
>
> According to the paper, the 2D z-transform should be:
>
> A(z_{\ell},z_{b})=(1-\alpha z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta
> z_{b}^{-1}}
>
> First, to state what I think is obvious: the domain of this
> filter should be a 2D ?energy plane? with the \ell-dimension
> representing frames and the b-dimension representing bands, and
> the range should be the prediction (actual band energy minus r[\ell,b]
> , the residual). As a predictor, the filter must be causal.
> Finally, according to the code above, the energy is always 0 for b<0
> (b\geq bands, \ell<0, and \ell\geq frames are neither specified
> nor useful).
>
> As outlined in the CELT blog post[2], this z-transform describes two
> (separable) cascaded
> predictors:
>
> A(z_{\ell},z_{b})=P(z_{\ell})Q(z_{b})
>
> The first is the \ell-dimension predictor whose domain is indeed
> the 2D energy plane (albeit a single ?row? of it corresponding to
> a particular band in isolation over multiple frames):
>
> P(z_{\ell})=1-\alpha z_{\ell}^{-1}
>
> If we write the range explicitly:
>
> P(z_{\ell})=(1-\alpha z_{\ell}^{-1})E(z_{\ell})
>
> This corresponds to the following difference equation:
>
> p[\ell]=e[\ell]-\alpha e[\ell-1]
>
> The range of this predictor is not the final prediction, but an
> intermediate p. Note that this equation includes the current
> band's energy e[\ell], which is somewhat surprising for a
> predictor, as e[\ell] will not be known until it can be derived
> after r is decoded in the decoder.
>
> For each \ell (?columns? of the energy plane), successive
> elements of p are then fed into the b-dimension predictor, whose
> range is the final prediction:
>
> Q(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>
> The equivalent difference equation is:
>
> q[b]=p[b]-p[b-1]+\beta q[b-1]
>
> What remains now is to match these cascaded predictors with the C
> code, which is not trivial because it's not immediately apparent
> that the above derived difference equations are present in the
> code. There are likely several ways to do this; two high-level
> approaches stand out to me: either ?lower? the expected z
> ?transforms to difference equations and refactor until we get
> matching code, or ?lift? the code to z-transforms and refactor
> until those match.
>
> I find it easier to manipulate terms in the z-domain, so let's
> try the latter approach. We'll first describe prev in terms of
> the known signals. Looking at the C code, we know the following:
>
> q[\ell,b]=\alpha e[\ell-1,b]+prev[\ell,b]
>
> Since q is the output of the predictor, we can rewrite it in
> terms of e and r:
>
> q[\ell,b]=e[\ell,b]-r[\ell,b]
>
> This is also given by the C code. Substituting this in the
> previous equation yields:
>
> prev[\ell,b]=e[\ell,b]-\alpha e[\ell-1,b]-r[\ell,b]
>
> Or, equivalently:
>
> prev[\ell,b]=p[\ell,b]-r[\ell,b]
>
> Trivially, the corresponding z-transform is:
>
> Prev(z_{\ell},z_{b})=P(z_{\ell},z_{b})-R(z_{\ell},z_{b})
>
> So, prev apparently represents the difference between the output
> of the \ell-dimension predictor p and the residual r.
> Interestingly, q, the expected range of the b-dimension
> predictor, has completely disappeared! This is odd, as without
> this term, it's going to be difficult to reach a z-transform with
> the correct range. However, ignoring this for now, we can still
> make some kind of progress.
>
> The difference equation governing prev is the following (again,
> from the C code):
>
> prev[\ell,b+1]=prev[\ell,b]+(1-\beta)r[\ell,b]
>
> Note that the output of the difference equation is indexed with b+1
> because it represents the value of prev after it has been
> updated for this loop iteration. This is actually an important
> distinction, because the code that modifies prev uses the r[\ell,b]
> , i.e. the current residual, and these two signals should have
> the correct phase with respect to one another. We can also
> express this as:
>
> prev[\ell,b]=prev[\ell,b-1]+(1-\beta)r[\ell,b-1]
>
> These two definitions are, of course, equivalent. The z-transform
> of this difference equation is:
>
>
>
Prev(z_{\ell},z_{b})=z_{b}^{-1}Prev(z_{\ell},z_{b})+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})
>
> The final step involves substituting occurrences of prev with
> equivalent expressions in terms of P and R:
>
>
>
P(z_{\ell},z_{b})-R(z_{\ell},z_{b})=z_{b}^{-1}(P(z_{\ell},z_{b})-R(z_{\ell},z_{b}))+(1-\beta)z_{b}^{-1}R(z_{\ell},z_{b})
>
> This can be simplified to:
>
> \frac{R(z_{\ell},z_{b})}{P(z_{\ell},z_{b})}=\frac{1-z_{b}^{-1}}{1-\beta
> z_{b}^{-1}}
>
> While this filter has almost the expected definition with the
> correct domain p, its range is r, not q! This is somewhat
> unsurprising, as the q terms cancelled out of our equations
> above. However, I find it really strange that the output of the
> predictor is actually the residual, and not the prediction
> itself. One would think that, because the residual is not yet
> known to the decoder until dequantization, the predictor would
> represent the resulting energy minus that residual. I suppose I
> could be happy accepting that, but it would still be nice to get
> a few more pairs of eyes on this to make sure I haven't made any
> mistakes, and perhaps to provide additional insight on why the
> predictor is factored this way.
>
> [0] https://arxiv.org/abs/1602.0484
> [1] https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
> [2] https://jmvalin.dreamwidth.org/12000.html
>
> On Sun, Jun 27, 2021 at 11:29 PM Jake Taylor <yupferris at gmail.com>
wrote:
>
>> Hi all,
>>
>> I'm having trouble reconciling the coarse energy predictor's
z-transform
>> in the paper[0]/RFC and the corresponding code in libopus 1.3.1[1].
I'm
>> pretty new to DSP theory and dealing with z-transforms, but I'm
interested
>> in learning (as well as compression), so I thought I'd study this
filter.
>> But I just can't seem to get it to match my understanding of the
code; it's
>> likely I've made a few mistakes, and any help/guidance would be
greatly
>> appreciated!
>>
>> Note that this is a bit difficult to describe without proper
typesetting,
>> so I've prepared some pdf notes (as well as lyx source) and
attached them
>> to this email, as well as a pdf render. In case that doesn't reach
you,
>> they're also available on my dropbox:
>> pdf:
>>
https://www.dropbox.com/s/d3erbl9oc4r4wu7/predictor-confusion-2.pdf?dl=0
>> lyx:
>>
https://www.dropbox.com/s/9lxjliqfexe9vz0/predictor-confusion-2.lyx?dl=0
>>
>> Finally, if THAT doesn't work, plaintext-with-tex-mixed-in version
>> follows.
>>
>> Thanks for your time/help,
>> Jake
>>
>> ---
>>
>> I'm having trouble reconciling the coarse energy predictor
>> implementation in the libopus source code and the 2D z-transform
>> description in the paper[0].
>>
>> I've simplified the source code (in unquant_coarse_energy in
>> quant_bands.c in libopus 1.3.1[1]) to the following C-like pseudocode:
>>
>> void unquant_coarse_energy(float *e, int bands) {
>> float alpha = /* ... */;
>> float beta = /* ... */;
>> float p = 0.0f;
>> for (int b = 0; b < bands; b++) {
>> float q = /* read from bitstream */;
>> e[i] = alpha * e[i] + p + q;
>> p = p + q - beta * q;
>> }
>> }
>>
>> According to the paper, the 2D z-transform should be:
>>
>> A(z_{\ell},z_{b})=(1-\alpha
>> z_{\ell}^{-1})\cdot\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>>
>> First off, to state what I think is obvious: the domain of this
>> filter should be a 2D ?energy plane? with the \ell-dimension
>> representing frames and the b-dimension representing bands, and
>> the range should be the prediction (actual band energy - q[\ell,b]
>> , the residual). As a predictor, the filter must be causal.
>> Finally, according to the code above, the energy is always 0 for b<0
>> (\ell<0, b\geq bands, and \ell\geq frames are not specified nor
>> useful).
>>
>> Assuming this filter is separable, we first have the \ell
>> -dimension predictor:
>>
>> A(z_{\ell})=1-\alpha z_{\ell}^{-1}
>>
>> At first, I thought this was clearly embodied by alpha * e[i]
>> above. However, the z-transform implies that it should actually
>> be (1 - alpha) * e[i], so already we seem to be missing another e[i]
>> term somewhere (not to mention alpha having the wrong sign).
>>
>> The b-dimension predictor seems even more problematic:
>>
>> A(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}
>>
>> This matches what's listed in the CELT blog post[2], and is
equivalent to:
>>
>> Y(z_{b})=\frac{1-z_{b}^{-1}}{1-\beta z_{b}^{-1}}X(z_{b})
>>
>> The equivalent difference equation is:
>>
>> y[b]=x[b]-x[b-1]+\beta y[b-1]
>>
>> And substituting names from the C code, we should get something
>> like:
>>
>> prev[b]=q[b]-q[b-1]+\beta prev[b-1]
>>
>> Now, it should be mentioned that I actually asked about this
>> recently in the DSP stack exchange[3] (after first emailing Jean-Marc
>> Valin directly, but I seem to
>> have scared him off with another wall of text similar to this
>> one), and a helpful user there was able to clarify many things.
>> We actually arrived at the same difference equation in the end,
>> even though we got there a bit of a different way (one which
>> actually included both dimensions from the original 2D z
>> -transform), which suggests that my analysis above is correct.
>>
>> However, we still didn't figure out the last bit: reconciling it
>> with the C code; it appears to differ. If I forget about the
>> above and just read the C code, we should get:
>>
>> prev[b]=prev[b-1]+q[b]-\beta q[b]
>>
>> The equivalent z-transform for this difference equation would be:
>>
>> A(z_{b})=\frac{1-\beta}{1-z_{b}^{-1}}
>>
>> This suggests that the actual predictor description might instead
>> be:
>>
>> A(z_{\ell},z_{b})=(1-\alpha
>> z_{\ell}^{-1})\cdot\frac{1-\beta}{1-z_{b}^{-1}}
>>
>> However, that still ignores the apparently-missing e[i] term from
>> the \ell-dimension.
>>
>> So, what am I missing? One thing that I glossed over above that
>> the first predictor dimenson (\ell) appears to be applied to the
>> band energy directly (as expected), whereas the second predictor
>> dimension (b) appears to be applied to the residual q. Since q
>> can be expressed in terms of the energy and the predictor, I
>> tried several different interpretations and substitutions in
>> various domains in order to describe a predictor in with the 2D ?
>> energy plane? as the domain and the prediction as the range, and
>> got some crazy z-transforms that don't look correct; here's a
few
>> just for the curious:A(z_{b},z_{\ell})=\frac{1-\beta+\alpha
>> z_{\ell}^{-1}(1-z_{b}^{-1})}{\beta-z_{b}^{-1}}
>>
>> A(z_{b},z_{\ell})=\frac{1+\beta z_{b}^{-1}-\alpha
>> z_{\ell}^{-1}(1-z_{b}^{-1})}{(1+\beta)z_{b}^{-1}}
>>
>> So, at this point I'm kindof running in circles, and I think I
>> may have done something wrong; at least I'd like to think
that's
>> a lot more likely than the paper/RFC/libopus code were out of
>> sync somehow!
>>
>> [0]: https://arxiv.org/abs/1602.04845
>> [1]:
https://opus-codec.org/release/stable/2019/04/12/libopus-1_3_1.html
>> [2]: https://jmvalin.dreamwidth.org/12000.html
>> [3]:
>>
https://dsp.stackexchange.com/questions/75972/having-trouble-interpreting-z-transform-description-of-a-predictor-from-a-codec
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-4.lyx
Type: application/octet-stream
Size: 8510 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: predictor-confusion-4.pdf
Type: application/pdf
Size: 76873 bytes
Desc: not available
URL:
<http://lists.xiph.org/pipermail/opus/attachments/20210701/235c71a6/attachment-0001.pdf>