Bram de Jong
2006-Dec-05 13:52 UTC
[Vorbis-dev] intro + specification remarks + some questions
hi everyone, since about 10 days I've been working on an implementation of ogg vorbis *decoder* from scratch, and I've got a few constructive remarks about the specs and some questions. -------------------------- First a question: > 9.2.4. low_neighbor > > "low_neighbor(v,x)" finds the position n in vector [v] of the greatest > value scalar element for which n is less than [x] and vector [v] > element n is less than vector [v] element [x]. I'm thoroughly confused by this definition... Does this mean: A) for (n=x-1; n>=0; n--) if (v[n]<v[x]) return n; B) low_neighbor(v, x) { max_pos = 0; for (n=0...x-1) if (v[n] < v[x]) if (v[n] >= v[max_pos]) max_pos = n; return max_pos; } In case of A), perhaps the specs should be rewritten for clarity, and some pseudocode could be added... In case of B), doesn't this mean that, depending on the loop and last if( ) one can get different results as in: http://bram.smartelectronix.com/tmp/low_neighbor.png ( line = vector values, red cross = x, green dots = possible values of n ) -------------------------- Secondly, some remarks on the clarity of the specifications. * some additional pseudocode overview of the whole process would be highly appreciated by newbies like me, even something as simple as: - read one packet and decode as header information - read one packet and decode as comments - read one packet and decode as setup: - decode codebooks - decode floors - decode residues - decode mappings - decode modes - read (packets) as audio packet: - header - 'packet decode' floor curves - 'packet decode' residues - synthesise I.e. a bit more highlevel structure to the docs would be cool and provide a lot of clarity. The lowlevel pseudocode is great and highly readable, but I did notice that somewhere halfway (after decoding the headers) the style changed quite a bit leading me to suspect two authors at work. * The second part (packet decode / synthesis) is the probably the hardest to read/understand, and some of the pseudocode is on the edge of difficult to grasp. * 7.2.2. header decode / 7.2.2.1. packet decode it is quite confusing for a newbie that 7.2.2.1 is a subsection of 7.2.2 as it is *not* the logical "next step" when decoding the header. Packet decode should really have a line saying something like: "when decoding an audio packet, the floor is decoded as follows". This goes for both floors and residues. * it would be interesting to note the size of vectors in the pseudocode. Some parts of the pseudocode become unusable without this information: a particular example from "8.6.2. packet decode": "allocate and zero all vectors that will be returned." This is a problem throughout the specifications. I hope you guys like constructive criticism! kindest regards and thanks for any help offered, - bram de jong -- http://www.splicemusic.com http://freesound.iua.upf.edu http://www.smartelectronix.com http://www.musicdsp.org
Bram de Jong
2006-Dec-05 13:53 UTC
[Vorbis-dev] intro + specification remarks + some questions
hi everyone, since about 10 days I've been working on an implementation of ogg vorbis *decoder* from scratch, and I've got a few constructive remarks about the specs and some questions. -------------------------- First a question: > 9.2.4. low_neighbor > > "low_neighbor(v,x)" finds the position n in vector [v] of the greatest > value scalar element for which n is less than [x] and vector [v] > element n is less than vector [v] element [x]. I'm thoroughly confused by this definition... Does this mean: A) for (n=x-1; n>=0; n--) if (v[n]<v[x]) return n; B) low_neighbor(v, x) { max_pos = 0; for (n=0...x-1) if (v[n] < v[x]) if (v[n] >= v[max_pos]) max_pos = n; return max_pos; } In case of A), perhaps the specs should be rewritten for clarity, and some pseudocode could be added... In case of B), doesn't this mean that, depending on the loop and last if( ) one can get different results as in: http://bram.smartelectronix.com/tmp/low_neighbor.png ( line = vector values, red cross = x, green dots = possible values of n ) -------------------------- Secondly, some remarks on the clarity of the specifications. * some additional pseudocode overview of the whole process would be highly appreciated by newbies like me, even something as simple as: - read one packet and decode as header information - read one packet and decode as comments - read one packet and decode as setup: - decode codebooks - decode floors - decode residues - decode mappings - decode modes - read (packets) as audio packet: - header - 'packet decode' floor curves - 'packet decode' residues - synthesise I.e. a bit more highlevel structure to the docs would be cool and provide a lot of clarity. The lowlevel pseudocode is great and highly readable, but I did notice that somewhere halfway (after decoding the headers) the style changed quite a bit leading me to suspect two authors at work. * The second part (packet decode / synthesis) is the probably the hardest to read/understand, and some of the pseudocode is on the edge of difficult to grasp. * 7.2.2. header decode / 7.2.2.1. packet decode it is quite confusing for a newbie that 7.2.2.1 is a subsection of 7.2.2 as it is *not* the logical "next step" when decoding the header. Packet decode should really have a line saying something like: "when decoding an audio packet, the floor is decoded as follows". This goes for both floors and residues. * it would be interesting to note the size of vectors in the pseudocode. Some parts of the pseudocode become unusable without this information: a particular example from "8.6.2. packet decode": "allocate and zero all vectors that will be returned." This is a problem throughout the specifications. I hope you guys like constructive criticism! kindest regards and thanks for any help offered, - bram de jong -- http://www.splicemusic.com http://freesound.iua.upf.edu http://www.smartelectronix.com http://www.musicdsp.org
xiphmont@xiph.org
2006-Dec-05 20:15 UTC
[Vorbis-dev] intro + specification remarks + some questions
On 12/5/06, Bram de Jong <bram@splicemusic.com> wrote:> > "low_neighbor(v,x)" finds the position n in vector [v] of the greatest > > value scalar element for which n is less than [x] and vector [v] > > element n is less than vector [v] element [x].Yes, this wording is... tough... but so is the encoding optimization it's trying to express. Vector [v] is essentially a map and the values in vector [v] are integer positions; they're offsets into a another vector. Vector [v]'s values are these offsets listed in encoded order. It's a progressive 'tree' encoding where positions are listed in multiple passes of increasing resolution. Given a value from the mapping, the low neighbor is the closest value preceeding that in the encoding up to that point (there may be a closer value if the vector is scanned in its entirety, but we're only interested in values that have already been decoded). So, low_neigbor finds the position in [v] of the mapping value immediately preceeding vector [v] element [x], but only up to the current point of decoding, that is [x]. This is why the return value must be A) less than [x] (only up to the current decode point), B) less than vector [v] element [x] (must be a lower value) and the value at the returned position must be the largest of the values that otherwise satisfy A) and B) (that is, it not only preceeds vector [v] element [x] in the mapping, but is in fact the mapping value immediately preceeding it, if vector [v] was sorted)> I'm thoroughly confused by this definition... Does this mean: > > A) > for (n=x-1; n>=0; n--) if (v[n]<v[x]) return n;no.> B) > low_neighbor(v, x) > { > max_pos = 0; > > for (n=0...x-1) > if (v[n] < v[x]) > if (v[n] >= v[max_pos]) > max_pos = n; > > return max_pos; > }yes.> In case of B), doesn't this mean that, > depending on the loop and last if( ) one can get different results as in: > > http://bram.smartelectronix.com/tmp/low_neighbor.png > > ( line = vector values, red cross = x, green dots = possible values of n )low_neighbor is not well defined if each value in vector[v] is not unique, however a [v] with non-unique values is not allowed.> I.e. a bit more highlevel structure to the docs would be cool and > provide a lot of clarity. The lowlevel pseudocode is great and highly > readable, but I did notice that somewhere halfway (after decoding the > headers) the style changed quite a bit leading me to suspect two authors > at work.The original spec was written entirely by me. I do recall needing to shift style slightly to capture a concept that I couldn't encode int he original style, but I thought I'd gone back and updated the previous code to all reflect the 'newer' style. [I've not commented on several other points that are good suggestions] Monty