thr3ads.net - opus - [CELT-dev] Opus for audiobooks etc [Nov 2011]


Daniel Jensen

2011-Nov-17 19:41 UTC

[CELT-dev] Opus for audiobooks etc

I know the focus for Opus is low delay, but I've been watching its 
development with interest because of the potential for audiobook/podcast 
use, where latency is practically irrelevant. I hear the upcoming USAC 
codec will give good results for this niche (though listening test 
results don't seem to be available to the public yet), but I also hear 
it'll be extremely patent encumbered. If Opus can do anywhere near as 
well, I think a lot of folks would be interested in using it for 
audiobooks and avoiding the patent jungle.

The only comment I've seen about use of Opus for audiobooks was jmvalin 
saying in response to someone on his blog that Opus's ability to do 
fullband would be a key advantage here. This seems kind of 
counterintuitive to me- can people even ABX human speech at a 32 or even 
24kHz sample rate from speech at 48kHz, much less hear a large quality 
difference? A number of audiobooks I've listened to have used 22kHz mp3s 
without being clearly objectionable, and in my personal use I've had 
decent results using the -voice LAME setting (downsamples to 32kHz and 
encodes as 56kbps abr).

The recent hydrogenaudio tests showed Opus CELT modes trumping the best 
of breed high-latency codecs at 64kbps despite having only 22.5 ms 
latency, and the SILK modes do a great job at the opposite of the 
bitrate spectrum and can make use of larger frame sizes for those of us 
who don't care about latency. In between the two, the hybrid mode appears 
to do better than other codecs with similar latency- but Christian 
Hoene's results showed it losing pretty convincingly to AMR-WB+ (which 
was able to use 4x larger frame sizes) at 32kbps. (How much of this was 
due to the test being stereo, I wonder? Some mono tests seem to have 
given 32kbps Opus rather high marks.)

For audiobook use, I don't know that the SILK modes or anything else 
with that low of a bitrate will be good enough, and when you're storing 
hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet 
spot for audiobooks would be between 20 and 32 kbps, and this seems to 
my unschooled understanding to be a region where Opus's low delay might 
put it at a serious disadvantage.

Other than just being curious in general about what folks have to say 
about audiobook use, I'm curious about one thing in particular-- how 
feasible would it be to use larger frame sizes (e.g. matching SILK 
mode's 60ms maximum) for Opus, especially for the hybrid mode, and what 
would the potential for improved quality be?

Benjamin M. Schwartz

2011-Nov-17 20:18 UTC

On 11/17/2011 02:41 PM, Daniel Jensen wrote:
> can people even ABX human speech at a 32 or even 
> 24kHz sample rate from speech at 48kHz, much less hear a large quality 
> difference?
Yes (but certainly not everyone can hear the difference).

Perhaps more importantly, with Opus you don't have to worry about audio
bandwidth (i.e. samplerate; 48k vs. 22050 vs ...).  Just throw in a
fullband input and set your bitrate.  If the best quality is achieved by
downsampling, the opus encoder will do that internally.
> The recent hydrogenaudio tests showed Opus CELT modes trumping the best 
> of breed high-latency codecs at 64kbps despite having only 22.5 ms 
> latency, and the SILK modes do a great job at the opposite of the 
> bitrate spectrum and can make use of larger frame sizes for those of us 
> who don't care about latency.
Yes, although the larger SILK frames are basically just 2 or 3 20ms frames
stuck together in a way that reduces packing overhead.

> In between the two, the hybrid mode appears
> to do better than other codecs with similar latency- but Christian 
> Hoene's results showed it losing pretty convincingly to AMR-WB+ (which 
> was able to use 4x larger frame sizes) at 32kbps. (How much of this was 
> due to the test being stereo, I wonder? Some mono tests seem to have 
> given 32kbps Opus rather high marks.)
That test was deliberately using very weird stereo, like two different
speakers saying different things in both ears.  There have also been some
improvements in the stereo encoding since then.  I wouldn't worry too much
about those results.
> For audiobook use, I don't know that the SILK modes or anything else 
> with that low of a bitrate will be good enough, and when you're storing
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet 
> spot for audiobooks would be between 20 and 32 kbps, and this seems to 
> my unschooled understanding to be a region where Opus's low delay might
> put it at a serious disadvantage.
Here's Opus at 20 kbps beating AMR-WB, and at 32 kbps getting close to
transparent (at 16 kHz samplerate, you may note):

http://www.octasic.com/en/tech/opus_audio_codec.php#Google
> Other than just being curious in general about what folks have to say 
> about audiobook use, I'm curious about one thing in particular-- how 
> feasible would it be to use larger frame sizes (e.g. matching SILK 
> mode's 60ms maximum) for Opus, especially for the hybrid mode, and what
> would the potential for improved quality be?
Opus will be fantastic for audiobooks.

Frame size is a bit tricky in Opus.  The short version is "don't worry
about it".  In Hybrid and CELT modes, the maximum frame size is 20ms.

A slightly longer version is that 20ms frames can be combined into
"packets" up to 120ms long.  This can save about 1 byte per frame, or
about 0.4 kbps, compared to the configuration we've been testing so far
(20ms frames in 20ms packets).

This is less than 2% bitrate savings in your "sweet spot", so we haven't
been worrying about it.  The real reason for this feature is that some
transports (like RTP) have large per-packet costs, so then reducing the
number of packets can be valuable.
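
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes exactly 1 byte saved per 20ms frame (the message only says "about 1 byte"), and the function names are made up for illustration. Under that assumption the saving is 0.4 kbps, which is roughly 2% of a 20 kbps stream and 1.25% of a 32 kbps stream.

```python
# Back-of-the-envelope check of the packing savings quoted in the message.
# Assumption: exactly 1 byte is saved per 20 ms frame ("about 1 byte" in the text).

FRAME_MS = 20               # frame duration discussed in the thread
BYTES_SAVED_PER_FRAME = 1   # assumed exact for this sketch


def packing_savings_kbps(frame_ms=FRAME_MS, bytes_saved=BYTES_SAVED_PER_FRAME):
    """Bitrate saved (kbps) when `bytes_saved` bytes are saved on every frame."""
    frames_per_second = 1000 / frame_ms
    return bytes_saved * 8 * frames_per_second / 1000


def savings_fraction(bitrate_kbps):
    """Saving as a fraction of the total stream bitrate."""
    return packing_savings_kbps() / bitrate_kbps


if __name__ == "__main__":
    print(f"saving: {packing_savings_kbps():.1f} kbps")
    for rate in (20, 32):
        print(f"at {rate} kbps: {savings_fraction(rate):.2%} of the stream")
```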

--Ben

Gregory Maxwell

2011-Nov-17 20:42 UTC

On Thu, Nov 17, 2011 at 2:41 PM, Daniel Jensen <jensend at iname.com> wrote:
> The only comment I've seen about use of Opus for audiobooks was jmvalin
> saying in response to someone on his blog that Opus's ability to do
> fullband would be a key advantage here. This seems kind of
> counterintuitive to me- can people even ABX human speech at a 32 or even
> 24kHz sample rate from speech at 48kHz, much less hear a large quality
> difference? A number of audiobooks I've listened to have used 22kHz mp3s
> without being clearly objectionable, and in my personal use I've had
> decent results using the -voice LAME setting (downsamples to 32kHz and
> encodes as 56kbps abr).
22kHz speech isn't "objectionable", but it's trivially ABXable, at least
if the speech was recorded with full bandpass.  32KHz vs 48KHz may
not be ABX-able for speech (or even for music for many adults!), but you
get the extra extension for free in opus.

The low bandpass can be objectionable for the music parts in mixed content.

Keep in mind that communication codecs usually do wideband at 16KHz,
which sounds clearly and obviously worse for speech (although not
objectionable).
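
To make the sample-rate comparison concrete: the highest frequency a sampled signal can carry is half its sample rate (the Nyquist limit). The sketch below simply tabulates that limit for the rates mentioned in this thread. The function name is illustrative, and real encoders and resamplers usually cut off somewhat below the theoretical limit.

```python
# Nyquist limit for the sample rates mentioned in the thread.
# This is the theoretical ceiling only; practical low-pass filters roll off lower.

def nyquist_khz(sample_rate_khz):
    """Highest representable audio frequency (kHz) for a given sample rate."""
    return sample_rate_khz / 2


if __name__ == "__main__":
    for rate in (16, 22.05, 24, 32, 44.1, 48):
        print(f"{rate:>6} kHz sampling -> up to {nyquist_khz(rate):.3f} kHz of audio")
```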

Compared with MP3, Opus is just a lot more efficient.

[snip]
> Hoene's results showed it losing pretty convincingly to AMR-WB+ (which
> was able to use 4x larger frame sizes) at 32kbps. (How much of this was
> due to the test being stereo, I wonder? Some mono tests seem to have
> given 32kbps Opus rather high marks.)
IIRC he was testing some rather torturous samples with different speakers
running concurrently in different ears? and at rates lower than we'd
recommend for general stereo. I believe the goal was mostly to make sure
the codec didn't blow up or perform too terribly.

(You can do things like pan-potted mono down to lower rates in opus, but
full stereo needs some more bitrate).

The encoder is now more aggressive at flattening the audio to mono
at very low rates.
> For audiobook use, I don't know that the SILK modes or anything else
> with that low of a bitrate will be good enough, and when you're storing
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet
> spot for audiobooks would be between 20 and 32 kbps, and this seems to
> my unschooled understanding to be a region where Opus's low delay might
> put it at a serious disadvantage.
Well... Disadvantage compared to what?

If you're able to get licenses for USAC under $2 per decoder I'll be
surprised. At those rates it may well turn out to work better.
Hundreds of milliseconds of delay can be helpful. :) Considering the
licensing and the wider use cases (VoIP as well as high delay stuff) I
hope and expect Opus to be much more widely deployed.

If your comparison points are Vorbis, MP3, Speex (or other
pure-communication codec), or AAC it should be no contest.
> Other than just being curious in general about what folks have to say
> about audiobook use, I'm curious about one thing in particular-- how
> feasible would it be to use larger frame sizes (e.g. matching SILK
> mode's 60ms maximum) for Opus, especially for the hybrid mode, and what
> would the potential for improved quality be?
Audiobook use was a consideration for us and it was one of the drivers behind
the codec's ability to do seamless mode switching.

Our higher latency modes (>>20ms) are mostly about reducing IP/UDP/RTP
overhead, an issue you won't have for Opus in Ogg.
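
For a sense of scale, here is a hedged Python sketch of the per-packet header cost being referred to. It assumes the minimal header sizes (IPv4 without options, 20 bytes; UDP, 8 bytes; RTP fixed header without CSRC entries or extensions, 12 bytes) and ignores link-layer framing; the function name is illustrative. With those assumptions, 20ms packets carry 16 kbps of headers, while 60ms packets carry about 5.3 kbps.

```python
# Header overhead of IP/UDP/RTP as a function of packet duration.
# Assumed minimal header sizes; link-layer framing is ignored.

IPV4_HEADER_BYTES = 20  # no IP options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12   # fixed header only, no CSRC list or extensions


def header_overhead_kbps(packet_ms):
    """Bitrate (kbps) spent on IP/UDP/RTP headers for packets of `packet_ms`."""
    header_bytes = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    packets_per_second = 1000 / packet_ms
    return header_bytes * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for ms in (20, 40, 60, 120):
        print(f"{ms:>3} ms packets: {header_overhead_kbps(ms):.2f} kbps of headers")
```

At the 20 to 32 kbps payload rates discussed in this thread, the header cost of 20ms packets is comparable to the audio itself, which is why longer packets matter for RTP but not for file storage.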

For your application the improvements for encoder VBR and automatic speech
detection that we're currently working on will probably be relevant. This
use case would probably also benefit from additional look-ahead in the encoder,
(and potentially two-pass rate control)

Geoff Shang

2011-Nov-17 22:49 UTC

Hi,

As a blind person, this subject interests me greatly.

On Thu, 17 Nov 2011, Daniel Jensen wrote:
> The only comment I've seen about use of Opus for audiobooks was jmvalin
> saying in response to someone on his blog that Opus's ability to do
> fullband would be a key advantage here. This seems kind of
> counterintuitive to me- can people even ABX human speech at a 32 or even
> 24kHz sample rate from speech at 48kHz, much less hear a large quality
> difference?
32 kHz may be hard to notice.  24kHz I'd think would not be, and 22kHz is 
definitely noticeable.   It's not to say that this isn't acceptable to 
many, but it's easy enough to hear and I personally would prefer something 
higher, particularly if I'm going to pay for it.
> A number of audiobooks I've listened to have used 22kHz mp3s
> without being clearly objectionable, and in my personal use I've had
> decent results using the -voice LAME setting (downsamples to 32kHz and
> encodes as 56kbps abr).
Hmm.  I just tried re-encoding some speech material I have here as 64kbps 
MP3 (44.1kHz mono), which I know is not an ideal source, to 56kbps CBR 
32kHz mono using lame -q 1.  I'm fairly hard-pressed to tell the difference, though I 
might with better source material.  Still, it'd be hard to tell in 
isolation.
> For audiobook use, I don't know that the SILK modes or anything else
> with that low of a bitrate will be good enough, and when you're storing
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet
> spot for audiobooks would be between 20 and 32 kbps, and this seems to
> my unschooled understanding to be a region where Opus's low delay might
> put it at a serious disadvantage.
Interesting you say this.  Audible (which uses its proprietary AA/AAX 
formats) offers books in a few quality encodes.  The best one, which IMHO 
is noticeably better than the others, is their AAX format (also known as 
Enhanced Format).  This format is at 64 kbps (not sure which codec it 
uses).  This means that a longer book, such as a book from Diana 
Gabaldon's Outlander series, comes in at over a gig.  And yet they happily 
provide this format.  With storage what it is today, I don't think that 64 
kbps is the size problem it once  was, and the improved sound is quite 
noticeable.
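
The size figure is easy to sanity-check. The Python sketch below converts a constant bitrate and a running time into file size, ignoring container overhead. The 40-hour running time is a hypothetical value chosen for illustration, not a figure from this thread, and the function name is made up.

```python
# File size of constant-bitrate audio, ignoring container overhead.
# The 40-hour running time below is a hypothetical example, not from the thread.

def audio_size_mb(hours, bitrate_kbps):
    """Size in megabytes (10**6 bytes) of `hours` of audio at `bitrate_kbps`."""
    total_bits = bitrate_kbps * 1000 * hours * 3600
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    hypothetical_hours = 40
    for rate in (20, 32, 64):
        size = audio_size_mb(hypothetical_hours, rate)
        print(f"{hypothetical_hours} h at {rate} kbps: {size:.0f} MB")
```

One hour at 64 kbps works out to 28.8 MB, so any book longer than about 35 hours passes the one-gigabyte mark at that rate, while the same book at 32 kbps takes half the space.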

As for the sampling rate of the Audible files, 
https://discussions.apple.com/thread/2422463?start=0&tstart=0 gives the 
rate for AAX-format books as being 22.05 kHz, but I doubt this.  To me 
they sound better than this, maybe 32kHz.  It sounds like it's not quite 
44.1kHz but the difference isn't really noticeable unless you are in a 
position to compare the two.  Certainly the listening experience is very 
comfortable, even on hi-fi setups.

And I'm glad the Opus developers considered audiobooks as a possible 
application.  I have to admit that it never occurred to me, but the 
benefits are obvious.

Geoff.
