I know the focus for Opus is low delay, but I've been watching its development with interest because of the potential for audiobook/podcast use, where latency is practically irrelevant. I hear the upcoming USAC codec will give good results for this niche (though listening test results don't seem to be available to the public yet), but I also hear it'll be extremely patent encumbered. If Opus can do anywhere near as well, I think a lot of folks would be interested in using it for audiobooks and avoiding the patent jungle.

The only comment I've seen about use of Opus for audiobooks was jmvalin saying in response to someone on his blog that Opus's ability to do fullband would be a key advantage here. This seems kind of counterintuitive to me: can people even ABX human speech at a 32 or even 24kHz sample rate from speech at 48kHz, much less hear a large quality difference? A number of audiobooks I've listened to have used 22kHz mp3s without being clearly objectionable, and in my personal use I've had decent results using the -voice LAME setting (downsamples to 32kHz and encodes as 56kbps abr).

The recent hydrogenaudio tests showed Opus CELT modes trumping the best of breed high-latency codecs at 64kbps despite having only 22.5 ms latency, and the SILK modes do a great job at the opposite end of the bitrate spectrum and can make use of larger frame sizes for those of us who don't care about latency. In between the two, the hybrid mode appears to do better than other codecs with similar latency, but Christian Hoene's results showed it losing pretty convincingly to AMR-WB+ (which was able to use 4x larger frame sizes) at 32kbps. (How much of this was due to the test being stereo, I wonder? Some mono tests seem to have given 32kbps Opus rather high marks.)

For audiobook use, I don't know that the SILK modes or anything else with that low of a bitrate will be good enough, and when you're storing hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet spot for audiobooks would be between 20 and 32 kbps, and this seems to my unschooled understanding to be a region where Opus's low delay might put it at a serious disadvantage.

Other than just being curious in general about what folks have to say about audiobook use, I'm curious about one thing in particular: how feasible would it be to use larger frame sizes (e.g. matching SILK mode's 60ms maximum) for Opus, especially for the hybrid mode, and what would the potential for improved quality be?
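To put rough numbers on "64kbps adds up fast", here is a minimal sketch of the storage arithmetic for the bitrates mentioned above. It assumes a constant bitrate and ignores container overhead; the function name `storage_megabytes` and the 100-hour library size are only illustrative.

```python
def storage_megabytes(bitrate_kbps: float, hours: float) -> float:
    """Size in MB (10^6 bytes) of a constant-bitrate stream.

    Assumption: container and metadata overhead are ignored.
    """
    bits = bitrate_kbps * 1000 * hours * 3600
    return bits / 8 / 1e6


if __name__ == "__main__":
    # Hypothetical 100-hour library at the bitrates discussed in the message.
    for kbps in (20, 32, 64):
        print(f"{kbps} kbps x 100 h = {storage_megabytes(kbps, 100):.0f} MB")
```

Under these assumptions, dropping from 64 kbps to somewhere in the 20 to 32 kbps range cuts the storage needed to between a third and a half.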
On 11/17/2011 02:41 PM, Daniel Jensen wrote:
> can people even ABX human speech at a 32 or even
> 24kHz sample rate from speech at 48kHz, much less hear a large quality
> difference?

Yes (but certainly not everyone can hear the difference). Perhaps more importantly, with Opus you don't have to worry about audio bandwidth (i.e. sample rate: 48k vs. 22050 vs. ...). Just throw in a fullband input and set your bitrate. If the best quality is achieved by downsampling, the Opus encoder will do that internally.

> The recent hydrogenaudio tests showed Opus CELT modes trumping the best
> of breed high-latency codecs at 64kbps despite having only 22.5 ms
> latency, and the SILK modes do a great job at the opposite end of the
> bitrate spectrum and can make use of larger frame sizes for those of us
> who don't care about latency.

Yes, although the larger SILK frames are basically just 2 or 3 20ms frames stuck together in a way that reduces packing overhead.

> In between the two, the hybrid mode appears
> to do better than other codecs with similar latency, but Christian
> Hoene's results showed it losing pretty convincingly to AMR-WB+ (which
> was able to use 4x larger frame sizes) at 32kbps. (How much of this was
> due to the test being stereo, I wonder? Some mono tests seem to have
> given 32kbps Opus rather high marks.)

That test was deliberately using very weird stereo, like two different speakers saying different things in each ear. There have also been some improvements in the stereo encoding since then. I wouldn't worry too much about those results.

> For audiobook use, I don't know that the SILK modes or anything else
> with that low of a bitrate will be good enough, and when you're storing
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet
> spot for audiobooks would be between 20 and 32 kbps, and this seems to
> my unschooled understanding to be a region where Opus's low delay might
> put it at a serious disadvantage.

Here's Opus at 20 kbps beating AMR-WB, and at 32 kbps getting close to transparent (at a 16 kHz sample rate, you may note):
http://www.octasic.com/en/tech/opus_audio_codec.php#Google

> Other than just being curious in general about what folks have to say
> about audiobook use, I'm curious about one thing in particular: how
> feasible would it be to use larger frame sizes (e.g. matching SILK
> mode's 60ms maximum) for Opus, especially for the hybrid mode, and what
> would the potential for improved quality be?

Opus will be fantastic for audiobooks.

Frame size is a bit tricky in Opus. The short version is "don't worry about it". In Hybrid and CELT modes, the maximum frame size is 20ms.

A slightly longer version is that 20ms frames can be combined into "packets" up to 120ms long. This can save about 1 byte per frame, or about 0.4 kbps, compared to the configuration we've been testing so far (20ms frames in 20ms packets). That is a bitrate saving of 2% or less in your "sweet spot", so we haven't been worrying about it. The real reason for this feature is that some transports (like RTP) have large per-packet costs, so reducing the number of packets can be valuable.

--Ben
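The packing figures quoted in the reply above can be checked with a few lines of arithmetic. This is only a sketch: the "about 1 byte per frame" saving is taken from the message as given, and the helper name `packing_savings_kbps` is made up for illustration.

```python
def packing_savings_kbps(frame_ms: float = 20.0,
                         bytes_saved_per_frame: float = 1.0) -> float:
    """Bitrate saved by packing frames together, in kbps.

    Assumption: roughly one byte is saved per frame, as stated in the message.
    """
    frames_per_second = 1000.0 / frame_ms
    return frames_per_second * bytes_saved_per_frame * 8 / 1000.0


if __name__ == "__main__":
    saved = packing_savings_kbps()
    print(f"saving: {saved:.1f} kbps")
    # Relative saving at the two ends of the 20 to 32 kbps "sweet spot".
    for target_kbps in (20, 32):
        print(f"at {target_kbps} kbps: {100 * saved / target_kbps:.2f}% of the bitrate")
```

With 20ms frames there are 50 frames per second, so one byte per frame is 400 bits per second, which matches the 0.4 kbps quoted.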
On Thu, Nov 17, 2011 at 2:41 PM, Daniel Jensen <jensend at iname.com> wrote:
> The only comment I've seen about use of Opus for audiobooks was jmvalin
> saying in response to someone on his blog that Opus's ability to do
> fullband would be a key advantage here. This seems kind of
> counterintuitive to me: can people even ABX human speech at a 32 or even
> 24kHz sample rate from speech at 48kHz, much less hear a large quality
> difference? A number of audiobooks I've listened to have used 22kHz mp3s
> without being clearly objectionable, and in my personal use I've had
> decent results using the -voice LAME setting (downsamples to 32kHz and
> encodes as 56kbps abr).

22kHz speech isn't "objectionable", but it's trivially ABXable, at least if the speech was recorded with full bandpass. 32kHz vs 48kHz may not be ABX-able for speech (or even for music for many adults!), but you get the extra extension for free in Opus. The low bandpass can be objectionable for the music parts in mixed content.

Keep in mind that communication codecs usually do wideband at 16kHz, which sounds clearly and obviously worse for speech (although not objectionable).

Versus MP3, Opus is just a lot more efficient.

[snip]
> Hoene's results showed it losing pretty convincingly to AMR-WB+ (which
> was able to use 4x larger frame sizes) at 32kbps. (How much of this was
> due to the test being stereo, I wonder? Some mono tests seem to have
> given 32kbps Opus rather high marks.)

IIRC he was testing some rather torturous samples with different speakers running concurrently in different ears, and at rates lower than we'd recommend for general stereo. I believe the goal was mostly to make sure the codec didn't blow up or perform too terribly. (You can do things like pan-potted mono down to lower rates in Opus, but full stereo needs some more bitrate.)

The encoder is now more aggressive at flattening the audio to mono at very low rates.

> For audiobook use, I don't know that the SILK modes or anything else
> with that low of a bitrate will be good enough, and when you're storing
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet
> spot for audiobooks would be between 20 and 32 kbps, and this seems to
> my unschooled understanding to be a region where Opus's low delay might
> put it at a serious disadvantage.

Well... Disadvantage compared to what?

If you're able to get licenses for USAC under $2 per decoder I'll be surprised. At those rates it may well turn out to work better. Hundreds of milliseconds of delay can be helpful. :) Considering the licensing and the wider use cases (VoIP as well as high delay stuff) I hope and expect Opus to be much more widely deployed.

If your comparison points are Vorbis, MP3, Speex (or another pure-communication codec), or AAC, it should be no contest.

> Other than just being curious in general about what folks have to say
> about audiobook use, I'm curious about one thing in particular: how
> feasible would it be to use larger frame sizes (e.g. matching SILK
> mode's 60ms maximum) for Opus, especially for the hybrid mode, and what
> would the potential for improved quality be?

Audiobook use was a consideration for us and it was one of the drivers behind the codec's ability to do seamless mode switching.

Our higher latency modes (>>20ms) are mostly about reducing IP/UDP/RTP overhead, an issue you won't have for Opus in Ogg.

For your application the improvements to encoder VBR and automatic speech detection that we're currently working on will probably be relevant. This use case would probably also benefit from additional look-ahead in the encoder (and potentially two-pass rate control).
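For a sense of scale on the IP/UDP/RTP overhead mentioned above, here is a rough sketch. The header sizes are assumptions not stated in the thread: a 20-byte IPv4 header without options, an 8-byte UDP header, and a 12-byte RTP header without CSRC entries or extensions. The name `header_overhead_kbps` is illustrative.

```python
# Assumed per-packet header sizes in bytes (IPv4 without options,
# UDP, and a minimal RTP header). These are not taken from the thread.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def header_overhead_kbps(packet_ms: float) -> float:
    """Bitrate spent on IP/UDP/RTP headers for a given packet duration."""
    packets_per_second = 1000.0 / packet_ms
    header_bytes = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return packets_per_second * header_bytes * 8 / 1000.0


if __name__ == "__main__":
    for packet_ms in (20, 60, 120):
        print(f"{packet_ms} ms packets: {header_overhead_kbps(packet_ms):.2f} kbps of headers")
```

Under these assumptions, 20ms packets carry 16 kbps of headers, which is comparable to the speech bitrates under discussion, so longer packets matter a great deal on RTP. A stored file has no such per-packet network cost, which is consistent with the remark that this is not an issue for Opus in Ogg.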
Hi,

As a blind person, this subject interests me greatly.

On Thu, 17 Nov 2011, Daniel Jensen wrote:
> The only comment I've seen about use of Opus for audiobooks was jmvalin
> saying in response to someone on his blog that Opus's ability to do
> fullband would be a key advantage here. This seems kind of
> counterintuitive to me: can people even ABX human speech at a 32 or even
> 24kHz sample rate from speech at 48kHz, much less hear a large quality
> difference?

32kHz may be hard to notice. 24kHz I'd think would not be, and 22kHz is definitely noticeable. That's not to say that this isn't acceptable to many, but it's easy enough to hear and I personally would prefer something higher, particularly if I'm going to pay for it.

> A number of audiobooks I've listened to have used 22kHz mp3s
> without being clearly objectionable, and in my personal use I've had
> decent results using the -voice LAME setting (downsamples to 32kHz and
> encodes as 56kbps abr).

Hmm. I just tried re-encoding some speech material I have here, a 64kbps MP3 (44.1kHz mono), which is not ideal I know, to 56kbps CBR 32kHz mono using lame -q 1. I'm fairly hard-pressed to tell the difference, though I might with better source material. Still, it'd be hard to tell in isolation.

> For audiobook use, I don't know that the SILK modes or anything else
> with that low of a bitrate will be good enough, and when you're storing
> hundreds of hours of speech 64kbps adds up fast. I'd guess the sweet
> spot for audiobooks would be between 20 and 32 kbps, and this seems to
> my unschooled understanding to be a region where Opus's low delay might
> put it at a serious disadvantage.

Interesting you say this. Audible (which uses its proprietary AA/AAX formats) offers books in a few quality levels. The best one, which IMHO is noticeably better than the others, is their AAX format (also known as Enhanced Format). This format is at 64 kbps (not sure which codec it uses). This means that a longer book, such as a book from Diana Gabaldon's Outlander series, comes in at over a gig. And yet they happily provide this format. With storage what it is today, I don't think that 64 kbps is the size problem it once was, and the improved sound is quite noticeable.

As for the sampling rate of the Audible files, https://discussions.apple.com/thread/2422463?start=0&tstart=0 gives the rate for AAX-format books as being 22.05 kHz, but I doubt this. To me they sound better than this, maybe 32kHz. It sounds like it's not quite 44.1kHz, but the difference isn't really noticeable unless you are in a position to compare the two. Certainly the listening experience is very comfortable, even on hi-fi setups.

And I'm glad the Opus developers considered audiobooks as a possible application. I have to admit that it never occurred to me, but the benefits are obvious.

Geoff.