On Tue, Feb 17, 2009 at 10:51 PM, Jean-Marc Valin
<jean-marc.valin at usherbrooke.ca> wrote:
> Hi everyone,
>
> Version 0.5.2 has just been released with a few fixes over 0.5.1. On top
> of that, the pitch prediction was both improved and simplified. The
> other main change is a new bit allocation algorithm with better rounding
> and fine energy allocation.
Lacking JM's gift for brevity, I thought I'd follow up his
announcement with some additional information.
As a reminder: 0.5.2 changes the bit-stream, so 0.5.2 is not
compatible with previous versions. This is the norm for CELT right now
and you can expect bit-stream changes with almost every release until
we freeze it.
The most important improvement in 0.5.2 is the quality at small frame
sizes. In the extreme case, with 64-sample frames, you may be able to
reduce your bitrate by as much as 20 kbit/sec. Small frame sizes are
required to get the lowest possible delays, though most applications
will not use frames this small because of the considerable overhead.
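To give a feel for why very small frames are costly, here is a rough sketch of one kind of overhead: per-packet network headers. The assumptions are mine and are not stated above: a 48 kHz sample rate, one frame per packet, and 40 bytes of IPv4 + UDP + RTP headers per packet. The function name is made up for illustration, and the overhead meant above may well include other costs.

```python
# Assumed transport: IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes) per packet.
HEADER_BYTES = 20 + 8 + 12


def header_overhead_kbps(frame_size, sample_rate, header_bytes=HEADER_BYTES):
    """Header bitrate in kbit/sec when each frame travels in its own packet."""
    packets_per_second = sample_rate / frame_size
    return packets_per_second * header_bytes * 8 / 1000


if __name__ == "__main__":
    # 48 kHz is an assumed sample rate, chosen only for the example.
    for frame_size in (64, 128, 256, 512):
        kbps = header_overhead_kbps(frame_size, 48000)
        print(f"{frame_size:4d} samples/frame: {kbps:.1f} kbit/sec of headers")
```

Under these assumptions the header cost doubles every time the frame size is halved, independent of how well the codec itself performs.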
For a while I've been conducting an aggressive testing program against
every CELT release prior to public announcement. The primary purpose
is to find and eliminate crash-bugs and cases where the CELT allocator
attempts to use more bits than are available, but I also do automated
quality screening to watch out for quality regressions, particularly
at frame sizes and bitrates that we wouldn't otherwise be checking.
I've decided to post the results from these tests publicly, since I'm
already running them:
http://www.celt-codec.org/testing/test.0.5.2.shtml
The test results include an illustration which can help you better
understand the trade-off between frame-size (delay), quality, and
bitrate that CELT currently provides.
The test results pages will grow over time, as I automate more of the
tests that I run. If you have tests that are important to your
applications and can be automated, I'd be glad to include them in my
regular process.
Speaking of quality: I've noticed that a number of application
developers are making interesting frame size selections in order to
obtain 10ms frames, such as 320 and 480 samples. I'd like to
generally advise against that practice and encourage developers to
stick to "power-of-two" sizes such as 256 samples per frame.
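For reference, a small sketch of the arithmetic relating frame size to frame duration. The sample rates are assumptions: 320 and 480 samples come out to 10ms at 32 kHz and 48 kHz respectively, which I take to be the rates these developers have in mind. The helper names are made up for illustration.

```python
from fractions import Fraction


def frame_duration_ms(frame_size, sample_rate):
    """Duration of one frame in milliseconds, kept exact as a fraction."""
    return Fraction(frame_size * 1000, sample_rate)


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    # (frame size in samples, assumed sample rate in Hz)
    for frame_size, sample_rate in [(320, 32000), (480, 48000),
                                    (256, 32000), (256, 48000)]:
        ms = float(frame_duration_ms(frame_size, sample_rate))
        print(f"{frame_size} samples @ {sample_rate} Hz: {ms:.2f} ms, "
              f"power of two: {is_power_of_two(frame_size)}")
```

A 256-sample frame is a little shorter than 10ms at either assumed rate, so picking the power-of-two size does not cost you delay.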
There are three reasons for this recommendation:
First, the available audio hardware on PCs today does DMA in
power-of-two chunks. If you use a size which is not an integer
multiple (e.g. 1x, 2x) of the underlying hardware transfer size, you're
going to take a whole frame of additional unnecessary latency. CELT
works hard to keep delay down, but some care is required in
applications to minimize latency. This is one reason that
jack-audio-connection-kit only supports power-of-two processing
periods. Even if you don't care about getting the utmost minimal
latency in your own application, you may want to someday interoperate
with another application with different design goals which has chosen
to restrict itself to power-of-two sizes.
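A toy model of the mismatch, assuming capture hardware that hands over audio only in whole periods of `hw_period` samples. The function name and the 256-sample period are illustrative assumptions, not taken from any real driver. It computes how long the last sample of a frame can sit waiting for the hardware period that contains it to complete. It is a simplification and is not meant to reproduce the exact latency figure given above.

```python
from math import gcd


def worst_case_extra_wait(frame_size, hw_period):
    """Worst-case samples a finished frame waits for its hardware period.

    Frame k ends at sample k * frame_size, but in this model the data only
    becomes visible at the next multiple of hw_period. The pattern repeats
    every lcm(frame_size, hw_period) samples, so one cycle is enough.
    """
    lcm = frame_size * hw_period // gcd(frame_size, hw_period)
    worst = 0
    for k in range(1, lcm // frame_size + 1):
        end_of_frame = k * frame_size
        # Round up to the next hardware period boundary.
        delivered_at = -(-end_of_frame // hw_period) * hw_period
        worst = max(worst, delivered_at - end_of_frame)
    return worst


if __name__ == "__main__":
    hw_period = 256  # assumed hardware transfer size, in samples
    for frame_size in (256, 512, 320, 480):
        wait = worst_case_extra_wait(frame_size, hw_period)
        print(f"frame {frame_size}, period {hw_period}: "
              f"up to {wait} samples of extra waiting")
```

In this model, frame sizes that are an integer multiple of the hardware period never wait, while 320 and 480 against a 256-sample period wait for a large part of a period in the worst case.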
The second reason for the recommendation is that (primarily because of
the above) the power-of-two sizes receive the greatest attention from
CELT developers. CELT tuning is primarily performed against 256-sample
frames. Some of the testing cycles I conduct are only executed on a
small set of frame sizes.
Finally, the power-of-two sizes have the best performance. For various
and sundry reasons, some sizes (especially ones whose
prime factorizations include large values) run *much slower* than
power-of-two sizes. Some sizes force internal compromises inside CELT.
In general, if you cannot use power-of-two sizes you should try to
use sizes which are divisible by 8. (This is one of the reasons quality
varies from one frame size to the next in the testing graph.)
Fortunately, 320 and 480 are rich in 2s in their factorization
and work reasonably well, and were it not for the first reason I would
probably also recommend them.
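A quick way to look at a candidate frame size along these lines is to factor it. The sketch below is only an illustration of the rule of thumb above and says nothing about what CELT does internally. The sizes 322 and 326 are hypothetical examples I picked because they factor badly.

```python
def prime_factors(n):
    """Prime factors of n in ascending order, with repetition."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


if __name__ == "__main__":
    # 256, 320 and 480 are from the discussion above; 322 and 326 are
    # hypothetical sizes chosen to show large prime factors.
    for size in (256, 320, 480, 322, 326):
        factors = prime_factors(size)
        print(f"{size}: factors {factors}, largest prime {max(factors)}, "
              f"divisible by 8: {size % 8 == 0}")
```

Sizes whose largest prime factor is small and which contain several factors of 2 are the ones described above as working reasonably well.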
If you have a real requirement to use particular frame sizes, then by
all means use the size that works for you. CELT was designed to
include a high degree of frame size flexibility because some
applications must use particular sizes or suffer additional delay. If
we saw a way to offer odd-numbered sizes without excessive
compromise we would support them as well. But be aware that not all
sizes are created equal. Near-prime sizes have, and will likely continue
to have, lower quality and worse performance. If you are writing
software for a typical PC, the magical sizes that should work best
with your hardware are powers of two.
This information will be included in a future developer's manual, but
I thought it would be helpful for early adopters to get a message out
now.
Happy hacking,
Greg