Hi
I am building an Opus decoder (Java) and I am currently working on the
SILK decoding. I am working through the spec and using the reference
implementation to check things as I go along. I am up to the SILK
frame reconstruction part (4.2.7.9) and I have a few questions. I'm
assuming there are 4 sub-frames in the 2nd part, just to look at the
widest possible case.
1) The spec says
"Although the reference implementation only includes a fixed-point
version of the remaining steps, this section describes them in terms
of a floating-point version for simplicity."
What are the main differences in the calculations? I see the clamping
is done to the range of signed 16-bit numbers. Is it just a scale
factor of 32767?
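To make question 1 concrete, here is a minimal Java sketch of the two
clamps being compared. The class and method names are hypothetical, and
the scale factor is left as a parameter because whether it should be
32767 or 32768 is exactly what is being asked; nothing here is taken
from the reference implementation.

```java
final class ClampSketch {
    // Floating-point style clamp to [-1.0, 1.0], as in the spec's description.
    static double clampUnit(double x) {
        return Math.max(-1.0, Math.min(1.0, x));
    }

    // Fixed-point style saturation to the signed 16-bit range.
    static int sat16(long v) {
        return (int) Math.max(-32768L, Math.min(32767L, v));
    }

    // ASSUMPTION: the two representations differ only by a constant scale.
    // The scale is a parameter so both 32767 and 32768 can be tried.
    static int toInt16(double x, double scale) {
        return sat16(Math.round(clampUnit(x) * scale));
    }

    public static void main(String[] args) {
        System.out.println(toInt16(1.5, 32767.0));
        System.out.println(toInt16(1.5, 32768.0));
        System.out.println(toInt16(-1.0, 32768.0));
    }
}
```

With either scale, an out-of-range positive input saturates to the same
16-bit maximum, so the two candidate factors only differ away from the
clamp limits and at the negative end.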
2) In the "4.2.7.9.1. LTP Synthesis" section you have to loop through
all the sub-frames. j is the index of the first sample in the current
sub-frame. The first voiced res[i] calculation loops through the range
(j - pitch_lags[s] - 2) <= i < out_end
For the first sub-frame j = 0 (it's basically s*n and s = 0), so i
starts off negative. A couple of questions here.
What are the first and second res[i] calculations actually doing? How
do you interpret the negative index?
out_end is either (j - (s-2)*n) or (j - s*n). If the sub-frames are
interpolated, why do sub-frames 3 and 4 end at the same index as
1 and 2? For s = 1 and 3 (2nd and fourth sub-frame) isn't there a
danger of having a funny range if pitch_lags[s] - 2 < s*n?
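To show the index arithmetic behind question 2, here is a minimal Java
sketch that only computes the loop bounds quoted above. It assumes
j = s*n (index 0 is the first sample of the current frame), takes the
choice between the two out_end formulas as a flag, and uses made-up
values for n and the pitch lags. The names are hypothetical and it does
no actual filtering.

```java
final class LtpRangeSketch {
    /**
     * Returns {start, outEnd, j} for sub-frame s, where the first res[i]
     * range is start <= i < outEnd and the second is outEnd <= i < j.
     */
    static int[] ranges(int s, int n, int pitchLag, boolean useShortForm) {
        int j = s * n; // ASSUMPTION: j is counted from the start of the frame
        int outEnd = useShortForm
                ? j - (s - 2) * n  // first formula quoted from the spec
                : j - s * n;       // second formula quoted from the spec
        int start = j - pitchLag - 2;
        return new int[] { start, outEnd, j };
    }

    public static void main(String[] args) {
        int n = 80;                          // illustrative sub-frame length
        int[] pitchLags = { 100, 100, 100, 100 }; // made-up lags
        for (int s = 0; s < 4; s++) {
            int[] r = ranges(s, n, pitchLags[s], false);
            System.out.println("s=" + s + " start=" + r[0]
                    + " out_end=" + r[1] + " j=" + r[2]
                    + (r[0] >= r[1] ? " (first range empty)" : ""));
        }
    }
}
```

With these made-up numbers, s = 0 gives a negative start index, and the
later sub-frames give a start index that is already past out_end, which
are the two cases the questions above are about.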
I have found this section of the RFC confusing.
Thanks
Peter