So you're the encoder. You get two vectors in (for some band), L and R. One thing you could do with this is compute M = L+R and S = L-R. (Yes, I know, this is not how the encoder actually works. Bear with me.) Then let m = normalize(M) and s = normalize(S). You transmit m, s, |L|, and |R|.

The decoder needs to find unknown positive constants a and b to compute

  L = a*m + b*s
  R = a*m - b*s

To find a and b, we use two constraints

  |L|^2 = |a*m + b*s|^2
  |R|^2 = |a*m - b*s|^2

That proceeds as follows:

  |a*m + b*s|^2 = a^2*|m|^2 + b^2*|s|^2 + 2*a*b*dot(m,s)
                = a^2 + b^2 + 2*a*b*dot(m,s)
                = |L|^2

  |a*m - b*s|^2 = a^2*|m|^2 + b^2*|s|^2 - 2*a*b*dot(m,s)
                = a^2 + b^2 - 2*a*b*dot(m,s)
                = |R|^2

We now compute the sum and difference:

  sum:        2*(a^2 + b^2) = |L|^2 + |R|^2
              a^2 + b^2 = (|L|^2 + |R|^2)/2

  difference: 4*a*b*dot(m,s) = |L|^2 - |R|^2
              a*b = (|L|^2 - |R|^2)/(4*dot(m,s))

Combining these equations again in two ways:

  a^2 + b^2 + 2*a*b = (|L|^2 + |R|^2)/2 + (|L|^2 - |R|^2)/(2*dot(m,s))
  a + b = sqrt((|L|^2 + |R|^2)/2 + (|L|^2 - |R|^2)/(2*dot(m,s)))

  a^2 + b^2 - 2*a*b = (|L|^2 + |R|^2)/2 - (|L|^2 - |R|^2)/(2*dot(m,s))
  a - b = sqrt((|L|^2 + |R|^2)/2 - (|L|^2 - |R|^2)/(2*dot(m,s)))

The remainder of the solution is left as an exercise.

Anyway, the point is: in principle, _if_ M = L+R, then you don't need to transmit theta. The solution, ultimately, is equivalent to

  theta = (1/2)arcsin(((|L|^2 - |R|^2)/(|L|^2 + |R|^2))*(1/dot(m,s)))

which follows from sin(2*theta) = 2*a*b/(a^2 + b^2) with a and b proportional to cos(theta) and sin(theta).

Apart from the interesting debate over whether to use M = L+R or M = normalize(L) + normalize(R), there's one other obvious issue. This calculation relies on computing dot(m,s). Since m and s are coded with error, the calculation of theta will also have error. Some quantization error in theta doesn't seem intrinsically unreasonable, but if m and s have enough error, the above procedure can derive a contradiction. For PVQ with a small number of pulses, it seems likely that dot(m,s) could be zero, even though |L| != |R|. The decoder then finds itself in a very awkward situation. I believe this can be remedied with a bit of edge-case handling, though.

--Ben
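The decoder-side calculation above can be sketched in a few lines of Python. This is only an illustration of the algebra, not how CELT actually works: the function names (`solve_gains`, `reconstruct`), the `eps` threshold, and the choice to return `None` in the awkward cases are all assumptions made for the example. It also assumes a >= b, which holds when L and R are positively correlated.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def normalize(u):
    n = norm(u)
    return [x / n for x in u]

def solve_gains(m, s, norm_l, norm_r, eps=1e-9):
    """Recover (a, b) such that L = a*m + b*s and R = a*m - b*s.

    Hypothetical helper: returns None when dot(m, s) is (nearly) zero
    or when the two constraints contradict each other.
    """
    d = dot(m, s)
    if abs(d) < eps:
        return None  # dot(m,s) == 0: a*b cannot be determined
    sum_sq = (norm_l ** 2 + norm_r ** 2) / 2.0        # a^2 + b^2
    two_ab = (norm_l ** 2 - norm_r ** 2) / (2.0 * d)  # 2*a*b
    plus, minus = sum_sq + two_ab, sum_sq - two_ab    # (a+b)^2, (a-b)^2
    if plus < 0.0 or minus < 0.0:
        return None  # coding error in m, s produced a contradiction
    a_plus_b = math.sqrt(plus)
    a_minus_b = math.sqrt(minus)  # assumes a >= b
    return (a_plus_b + a_minus_b) / 2.0, (a_plus_b - a_minus_b) / 2.0

def reconstruct(m, s, a, b):
    left = [a * x + b * y for x, y in zip(m, s)]
    right = [a * x - b * y for x, y in zip(m, s)]
    return left, right

if __name__ == "__main__":
    L = [3.0, 1.0, 2.0, 0.5]
    R = [2.0, 1.5, 1.0, 0.2]
    m = normalize([x + y for x, y in zip(L, R)])
    s = normalize([x - y for x, y in zip(L, R)])
    a, b = solve_gains(m, s, norm(L), norm(R))
    print(reconstruct(m, s, a, b))
```

With unquantized m and s the reconstruction matches L and R up to rounding. Once m and s carry coding error, the two `None` branches are where the edge-case handling mentioned above would have to go.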
Currently, the encoder does l = normalize(L), r = normalize(R), M = l+r, m = normalize(M), etc. The trick will not work in this case. Here's why:

  dot(m,s) = c*dot(M,S) for some constant c.

  dot(M,S) = dot(l+r,l-r)
           = dot(l,l) + dot(r,l) - dot(l,r) - dot(r,r)
           = 1 + dot(r,l) - dot(r,l) - 1
           = 0

so dot(m,s) is always zero. The trick described previously does not work when dot(m,s) = 0, so theta must be transferred explicitly.

However, this orthogonality is still useful. If m is encoded in N dimensions, then from this orthogonality s can be encoded in N-1 dimensions. That means that transmitting theta no longer represents an extra degree of freedom.

Maybe you're already doing this. I have no idea.

--Ben
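The orthogonality argument is easy to check numerically. The following is a small sketch, with a hypothetical helper name (`mid_side_dot`) and arbitrary random test vectors; it evaluates dot(M,S) for M = l+r and S = l-r, which differs from dot(m,s) only by the constant c.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [x / n for x in u]

def mid_side_dot(L, R):
    """Return dot(M, S) where M = l + r, S = l - r, and l, r are unit vectors."""
    l, r = normalize(L), normalize(R)
    M = [x + y for x, y in zip(l, r)]
    S = [x - y for x, y in zip(l, r)]
    return dot(M, S)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, arbitrary choice
    for _ in range(5):
        L = [rng.gauss(0.0, 1.0) for _ in range(8)]
        R = [rng.gauss(0.0, 1.0) for _ in range(8)]
        # Each value is zero up to floating-point rounding.
        print(mid_side_dot(L, R))
```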
Benjamin M. Schwartz wrote:
> However, this orthogonality is still useful. If m is encoded in N
> dimensions, then from this orthogonality s can be encoded in N-1
> dimensions. That means that transmitting theta no longer represents an
> extra degree of freedom.

This was one of the first things I proposed to Jean-Marc. Well, more accurately, I proposed to still encode s in N dimensions, but to use the orthogonality constraint to take the place of the spectral folding used to add a noise floor for mono. The reason is that it is very computationally expensive to determine a basis for s that is orthogonal to an arbitrary m (O(N^3)). We could have special cased N=3, but...

This approach turns out to harm quality. The quantization in m is too large, especially for HF bands, for this constraint to actually be useful, and it was doing more harm than good.
On Fri, Mar 20, 2009 at 1:01 AM, Benjamin M. Schwartz <bmschwar at fas.harvard.edu> wrote:
[snip]
> Apart from the interesting debate over whether to use M = L+R or M =
> normalize(L) + normalize(R), there's one other obvious issue. This

One of the most important concepts in the CELT design is that the correct energy in each band must be preserved for perceptual reasons.

It is fairly simple to demonstrate for yourself that the relevant hearing machinery driving this operates independently in each ear: Generate two test signals, either two tones or a noise and a tone, such that one masks the other. Play them together and you can't hear the masked tone; send the two signals (via headphones) to separate ears and you can hear the previously masked tone.

If the listener were using loudspeakers rather than headphones, this example wouldn't work due to cross-talk. As such, those kinds of low-level psychoacoustic effects must be evaluated on an ear-by-ear basis, since the listener may be using headphones.

If you compute M = L+R; then signal energy(M); quantize(normalize(M)); S=... the resulting L and R that the decoder recovers will not likely have well-preserved energy.

Of course, extra data could be sent to make sure that energy was preserved in this case... but that has the problems of sending extra data. (Nor would sending the L+R energy in addition to the M+S energy give us a way to select the bitrate for M/S.)
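A toy experiment illustrates the energy problem. The sketch below is not the CELT quantizer: `crude_quantize` is a made-up stand-in that rounds a vector onto a few integer pulses, the energies of M and S are assumed to be sent exactly, and the function names and test vectors are hypothetical. Even so, the per-channel energies the decoder recovers drift away from the originals.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def normalize(u):
    n = norm(u)
    return [x / n for x in u]

def crude_quantize(u, pulses):
    """Toy shape quantizer (assumption): round onto roughly `pulses` unit pulses.

    Assumes u is not the zero vector.
    """
    scale = pulses / sum(abs(x) for x in u)
    q = [float(round(x * scale)) for x in u]
    if not any(q):
        # Everything rounded to zero: keep a single pulse at the largest entry.
        i = max(range(len(u)), key=lambda j: abs(u[j]))
        q[i] = math.copysign(1.0, u[i])
    return normalize(q)

def decode_sum_difference(L, R, pulses):
    """Code M = L+R and S = L-R as (exact energy, quantized shape), then decode."""
    M = [x + y for x, y in zip(L, R)]
    S = [x - y for x, y in zip(L, R)]
    g_m, g_s = norm(M), norm(S)  # energies, assumed transmitted exactly
    m_q, s_q = crude_quantize(M, pulses), crude_quantize(S, pulses)
    L_hat = [(g_m * x + g_s * y) / 2.0 for x, y in zip(m_q, s_q)]
    R_hat = [(g_m * x - g_s * y) / 2.0 for x, y in zip(m_q, s_q)]
    return L_hat, R_hat

if __name__ == "__main__":
    L = [3.0, 1.0, 2.0, 0.5]
    R = [2.0, 1.5, 1.0, 0.2]
    L_hat, R_hat = decode_sum_difference(L, R, pulses=2)
    print("energy(L):", dot(L, L), "decoded:", dot(L_hat, L_hat))
    print("energy(R):", dot(R, R), "decoded:", dot(R_hat, R_hat))
```

In this sketch the mismatch comes entirely from dot(m,s) changing under quantization, since the M and S energies themselves are reproduced exactly.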