>1) Before compressing with Theora, I must unpack my YUY2 data into
>three separate arrays (Y, U, and V respectively), and record in a
>"yuv_buffer" structure.
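The reference libtheora only supports 4:2:0 input, whereas YUY2 is typically
4:2:2, so the chroma must also be subsampled vertically. But yes, the input
must be planar, with a horizontal stride of 1 (adjacent samples of a plane
occupy adjacent bytes).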
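
As a rough illustration of the unpacking step, here is a minimal sketch that
converts packed YUY2 to three tightly packed 4:2:0 planes. It assumes the
common YUY2 byte order (Y0 U Y1 V) and even dimensions, and it averages the
chroma of each pair of rows, which is only one of several possible
downsampling choices. The function name yuy2_to_planar420 is made up for this
example; it is not part of libtheora.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert packed YUY2 (bytes Y0 U Y1 V, 4:2:2) to planar 4:2:0.
   Chroma from each pair of rows is averaged (with rounding) to halve the
   vertical chroma resolution. width and height must be even.
   src_stride is the number of bytes from one packed row to the next.
   Output planes are tightly packed: y_out holds width*height bytes,
   u_out and v_out hold (width/2)*(height/2) bytes each. */
void yuy2_to_planar420(const uint8_t *src, ptrdiff_t src_stride,
                       int width, int height,
                       uint8_t *y_out, uint8_t *u_out, uint8_t *v_out)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    int cw = width / 2; /* chroma plane width */
    for (int row = 0; row < height; row += 2) {
        const uint8_t *top = src + (ptrdiff_t)row * src_stride;
        const uint8_t *bot = top + src_stride;
        uint8_t *y_top = y_out + (size_t)row * width;
        uint8_t *y_bot = y_top + width;
        uint8_t *u_row = u_out + (size_t)(row / 2) * cw;
        uint8_t *v_row = v_out + (size_t)(row / 2) * cw;
        for (int cx = 0; cx < cw; cx++) {
            /* Each 4-byte group holds two luma samples and one U/V pair. */
            y_top[2 * cx]     = top[4 * cx];
            y_top[2 * cx + 1] = top[4 * cx + 2];
            y_bot[2 * cx]     = bot[4 * cx];
            y_bot[2 * cx + 1] = bot[4 * cx + 2];
            u_row[cx] = (uint8_t)((top[4 * cx + 1] + bot[4 * cx + 1] + 1) / 2);
            v_row[cx] = (uint8_t)((top[4 * cx + 3] + bot[4 * cx + 3] + 1) / 2);
        }
    }
}
```

With planes laid out like this, the yuv_buffer would presumably be filled in
with y_stride equal to the width and uv_stride equal to half the width.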
>2) "yuv_buffer.y_stride" is this the number of bytes in one row of Y
>data. In other words, y_stride*8/y_width = num bits per Y component
Only 8 bits per component is supported. y_stride and uv_stride allow buffers
to be arranged in memory such that the offset (in bytes) from the start of one
row to the start of the next differs from y_width and uv_width, respectively.
This is useful when, e.g., the rows are aligned in memory, the buffer refers
to a smaller portion of a larger buffer, etc. libtheora itself returns
pointers to buffers that are internally padded with 8 or 16 pixels on each
side, and IIRC the current version even returns negative stride values.
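
To make the stride idea concrete, here is a small sketch of addressing a plane
through its stride. The helper names plane_sample and copy_plane_packed are
invented for illustration and are not libtheora functions. The only assumption
is one byte per sample, as stated above. Using a signed offset type means the
same code works when the stride is negative, in which case the base pointer
refers to the row that is last in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pointer to the sample at column x, row y of a plane.
   stride is the byte offset between the starts of consecutive rows;
   it may be larger than the width (padding) or negative (bottom-up). */
const uint8_t *plane_sample(const uint8_t *base, ptrdiff_t stride,
                            int x, int y)
{
    return base + (ptrdiff_t)y * stride + x;
}

/* Copy a width x height plane, row by row, into a tightly packed
   destination of width*height bytes, skipping any per-row padding. */
void copy_plane_packed(const uint8_t *base, ptrdiff_t stride,
                       int width, int height, uint8_t *dst)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++)
        memcpy(dst + (size_t)y * width,
               base + (ptrdiff_t)y * stride, (size_t)width);
}
```

For a decoded frame, this would be called with the y pointer and y_stride of
the yuv_buffer for luma, and with u or v and uv_stride for chroma.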
>3) "theora_encode_header" must be called exactly once, and its output
>must be the first packet received by a Theora decoder
Correct. You should also have libogg flush the page after this packet, to
ensure that it is the only packet on the page.
>4) "theora_encode_comment" need never be called, but can be called at
>any time, and its packet can be inserted anywhere
Incorrect. It must be called exactly once, and it must be the second packet
received by the decoder. The comment header is not optional, there cannot be
more than one, and it cannot occur at any other position.
>5) "theora_encode_tables" must be called exactly once, and its output
>must be the first non-comment packet after the header
Correct. A page flush must also occur after this packet, so that the first
data packet begins on a new page.
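
If you want to sanity-check the header order in your own stream, something
like the following sketch could help. It relies on my understanding of the
Theora bitstream format: each header packet starts with a type byte (0x80 for
info, 0x81 for comment, 0x82 for setup/tables) followed by the six ASCII bytes
"theora". The function names are hypothetical, and the check looks only at
these leading bytes, not at the rest of the header contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the packet looks like a Theora header of the given type
   (0x80 = info, 0x81 = comment, 0x82 = setup), otherwise 0. */
int is_theora_header(const unsigned char *data, size_t len,
                     unsigned char type)
{
    assert(data != NULL || len == 0);
    return len >= 7 && data[0] == type &&
           memcmp(data + 1, "theora", 6) == 0;
}

/* Return 1 if the first three packets of a stream are the info, comment,
   and setup headers, in exactly that order. */
int headers_in_order(const unsigned char *packets[3], const size_t lens[3])
{
    static const unsigned char expected[3] = {0x80, 0x81, 0x82};
    for (int i = 0; i < 3; i++)
        if (!is_theora_header(packets[i], lens[i], expected[i]))
            return 0;
    return 1;
}
```

Running this over the first three packets you hand to the stream catches the
case where the comment packet was skipped or written in the wrong position.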
>6) "theora_encode_YUVin" must be called once per frame, and its output
>added to the stream in a sequential manner
Correct.
>7) Theora can tolerate any number of "dropped packets", so long as the
>packet is not "header", "tables", or (perhaps) the first "YUVin".
Yes and no. All three headers (info, comment, setup) must be delivered. Except
at key frames, the contents of the previous frame (and the previous key frame)
are used to predict the contents of the next frame, so any dropped packets
will lead to artifacts in the output. The severity of the artifacts depends on
the amount of motion, etc., in the dropped frame. That said, the decoder will
still produce some kind of output for the packets that are provided, even if
others are omitted. Whether the picture in it is recognizable depends on the
content.
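
If you need to decide which packets matter most, here is a rough sketch of
classifying a packet from its first byte. It assumes the Theora bitstream
layout as I understand it: the top bit (0x80) is set for header packets, and
for data packets the next bit (0x40) is clear for key frames and set for
predicted frames. The enum and function names are made up for this example.
libtheora also exports theora_packet_isheader and theora_packet_iskeyframe,
which should be preferred when you have an ogg_packet in hand.

```c
#include <assert.h>
#include <stddef.h>

enum packet_kind {
    PACKET_EMPTY,       /* zero-length packet: carries no frame data */
    PACKET_HEADER,      /* info, comment, or setup: must never be dropped */
    PACKET_KEY_FRAME,   /* decodable without reference to earlier frames */
    PACKET_INTER_FRAME  /* predicted from previously decoded frames */
};

/* Classify a Theora packet by inspecting only its first byte. */
enum packet_kind classify_theora_packet(const unsigned char *data,
                                        size_t len)
{
    assert(data != NULL || len == 0);
    if (len == 0)
        return PACKET_EMPTY;
    if (data[0] & 0x80)
        return PACKET_HEADER;
    return (data[0] & 0x40) ? PACKET_INTER_FRAME : PACKET_KEY_FRAME;
}
```

One possible use: after a packet has been lost, a receiver could discard
packets until the next PACKET_KEY_FRAME, trading a short freeze for the
artifacts described above.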