Gabriel TEIXEIRA
2010-Dec-10 15:51 UTC
[theora-dev] Bitstream encoded huffman tables always the same
Hello all,

I've been working a little inside the Theora decoder and found that many videos seem to have the very same Huffman tables encoded into their bitstreams (at least the ones I have taken the time to dissect). The tables are listed as TH_VP31_HUFF_CODES in the file huffenc.c. I tried to investigate who sets the bitstream to those tables, and found that it depends on whether th_encode_ctl is called with TH_ENCCTL_SET_HUFFMAN_CODES or TH_ENCCTL_SET_VP3_COMPATIBLE, but I couldn't figure out who calls it, or with what parameters. Is it true that libtheora will always set the Huffman codes to the same ones? Isn't this approach a little inefficient, since the distribution of symbol probabilities does not always match the 80 stock tables (the tables may be very good, but there's always some room for increased precision), and besides, we spend around 1-2 kB to store them in the stream, instead of having them hard-coded in the decoder (which would, of course, break compatibility)?

Thanks in advance,
Gabriel TEIXEIRA
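P.S. For reference, if I read theoraenc.h correctly, the override path looks roughly like this (a sketch only: `enc' and `my_codes' are placeholders for an allocated encoder and a full set of replacement tables):

  #include <theora/theoraenc.h>

  /* Replace the stock TH_VP31_HUFF_CODES with custom tables. This
   * presumably has to happen before the headers are flushed, since
   * the tables end up in the setup header. Returns 0 on success,
   * negative on error. */
  static int set_custom_huff(th_enc_ctx *enc,
   const th_huff_code my_codes[TH_NHUFFMAN_TABLES][TH_NDCT_TOKENS]){
    return th_encode_ctl(enc,TH_ENCCTL_SET_HUFFMAN_CODES,(void *)my_codes,
     sizeof(th_huff_code)*TH_NHUFFMAN_TABLES*TH_NDCT_TOKENS);
  }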
Timothy B. Terriberry
2010-Dec-10 16:05 UTC
[theora-dev] Bitstream encoded huffman tables always the same
> couldn't figure out who calls it, or with what parameters. Is it true
> that libtheora will always set the Huffman codes to the same ones?
> Isn't this

Unless the user manually overrides them, yes. There is a rehuff example in http://svn.xiph.org/trunk/theora-exp/examples/ which can be used to compute optimized tables for a specific video file after it has been encoded. It typically saves 1-2% in file size. We have long planned to update the default tables as part of the 1.2 release, but initial experiments (back in 2007, with the old 1.0 encoder) found that using insufficient training data (in that case, four 400-600 frame 1080p sequences) produces tables that are significantly _worse_, on average, when used on files that were not in the training set. Greg Maxwell thinks he can do better now, and we have more training data than we did then, but improvements have yet to be demonstrated.
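(If you want to try it: rehuff builds along with the other theora-exp examples, and from memory the invocation is of the form

  rehuff <infile.ogg> <outfile.ogg>

but the usage message printed by rehuff.c is authoritative, so check that rather than trusting my recollection of the arguments.)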
Gregory Maxwell
2010-Dec-10 16:05 UTC
[theora-dev] Bitstream encoded huffman tables always the same
On Fri, Dec 10, 2010 at 10:51 AM, Gabriel TEIXEIRA
<gabriel_teixeira at sdesigns.eu> wrote:
> Isn't this approach a little inefficient, since the distribution of
> symbol probabilities does not always match the 80 stock tables [...]?

When the encoder begins, it has no idea what the symbol statistics will be, so it must use stock tables of some kind. There is a tool in the theora-exp branch at http://svn.xiph.org/trunk/theora-exp/examples/rehuff.c which will losslessly optimize the Huffman tables. The tables it produces aren't optimal: the frame clustering for table assignment is non-trivial (the tool should also be updated; I found the specific approach it uses to be a bit pessimal), but it almost always makes files somewhat smaller.
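The core of the lossless pass is simple: decode the stream, count how often each DCT token actually occurs under each table, and rebuild the code lengths from those counts. A minimal sketch of the rebuild step (not the actual rehuff code; a plain O(n^2) Huffman build that leaves out the canonical codeword assignment you'd need to produce real th_huff_code entries):

  #define NTOKENS 32 /* TH_NDCT_TOKENS: DCT tokens per table */

  /* Build Huffman code lengths for one table from measured counts.
   * O(n^2) pairwise merging; fine for 32 symbols. */
  static void huff_lengths(const unsigned long count[NTOKENS],
   int len[NTOKENS]){
    unsigned long weight[2*NTOKENS];
    int parent[2*NTOKENS];
    int alive[2*NTOKENS];
    int i;
    int nodes;
    for(i=0;i<NTOKENS;i++){
      weight[i]=count[i]>0?count[i]:1; /* every token still needs a code */
      parent[i]=-1;
      alive[i]=1;
    }
    nodes=NTOKENS;
    while(nodes<2*NTOKENS-1){
      int a=-1;
      int b=-1;
      /* find the two lightest live nodes */
      for(i=0;i<nodes;i++){
        if(!alive[i])continue;
        if(a<0||weight[i]<weight[a]){b=a;a=i;}
        else if(b<0||weight[i]<weight[b])b=i;
      }
      /* merge them under a new internal node */
      alive[a]=0;
      alive[b]=0;
      weight[nodes]=weight[a]+weight[b];
      parent[a]=nodes;
      parent[b]=nodes;
      parent[nodes]=-1;
      alive[nodes]=1;
      nodes++;
    }
    /* a token's code length is its leaf depth in the merge tree */
    for(i=0;i<NTOKENS;i++){
      int d=0;
      int p=i;
      while(parent[p]>=0){p=parent[p];d++;}
      len[i]=d;
    }
  }

Turning those lengths into actual bit patterns, and clustering frames so that each of the 80 table slots gets coherent statistics, is where the hard part (and the remaining gains) lie.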