thr3ads.net - opus - [opus] Opus application_mode==AUDIO, 20ms framing issue? [Jun 2016]


Kevin Connor

2016-Jun-13 17:10 UTC

[opus] Opus application_mode==AUDIO, 20ms framing issue?

Hi Jean-Marc, 

Sorry for the late reply, and thanks for your interest. Quality is good for
10ms/audio and poorer for 20ms/audio. Quality is equivalent between 10ms and
20ms for mode=voip. PESQ was the tool that alerted me to something of interest,
but I don't trust PESQ to almost any degree! It's good for spotting relative
differences, of course, but not absolutes. Bitrate here was 28kbps, but I
hear the same thing at 32kbps.

Please find attached a zip file with the audio files, converted to .wavs for
simpler listening.

  https://www.dropbox.com/s/bzu4i3dmg5f91tv/20msAudioModeQuestion.zip?dl=0

If there is one single thing to listen to, it would be

    ar3_20_audio.wav: loop the section "china hit" starting at t=0.6s
    and listen for artifacts in the unvoiced speech. The reference is ar3.wav.

and by comparison

    ar2_10_audio.wav (same segment, sounds more like the reference ar3.wav)


Here is a cat of the README.txt.   Thanks very much!


16bit, 16kHz input wav files (ar1, ar2, ar3), content from ~50Hz to near 8kHz.
All .pcm files are 16kHz, 16bit, signed ints, little (intel) endian.

./opus_demo -e voip 16000 1 28000 -framesize 20 ~/ar1.wav ar1_20_voip.bit
./opus_demo -d 16000 1 ar1_20_voip.bit ar1_20_voip.pcm
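
For anyone wanting to reproduce the full grid rather than the single voip/20ms
case shown above, here is a minimal Python sketch that only builds and prints
the command lines (it does not run them). It assumes the three clips sit in the
current directory as ar1.wav, ar2.wav and ar3.wav, and it uses only the
opus_demo arguments already seen in the example. The decode line passes the
channel count after the sampling rate, as opus_demo's usage text expects. The
helper name build_commands is made up for illustration.

```python
import shlex
from itertools import product

MODES = ["voip", "audio"]
FRAME_SIZES_MS = [5, 10, 20, 40]
CLIPS = ["ar1", "ar2", "ar3"]


def build_commands(clip, mode, frame_ms, rate=16000, channels=1, bitrate=28000):
    """Return (encode, decode) argument lists for one clip/mode/framesize."""
    stem = f"{clip}_{frame_ms}_{mode}"
    encode = [
        "./opus_demo", "-e", mode, str(rate), str(channels), str(bitrate),
        "-framesize", str(frame_ms),
        f"{clip}.wav",   # assumed input location (current directory)
        f"{stem}.bit",
    ]
    decode = [
        "./opus_demo", "-d", str(rate), str(channels),
        f"{stem}.bit", f"{stem}.pcm",
    ]
    return encode, decode


# One (encode, decode) pair per cell of the clip x mode x framesize grid.
all_commands = [
    build_commands(clip, mode, frame_ms)
    for clip, mode, frame_ms in product(CLIPS, MODES, FRAME_SIZES_MS)
]

for encode, decode in all_commands:
    print(shlex.join(encode))
    print(shlex.join(decode))
```

The printed lines can be pasted into a shell or handed to subprocess.run once
the paths match the local setup.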

opus_demo reports version:    libopus 1.1-alpha

Using recent pesq code compiled from src, +16000 option.
( same phenomenon seen with +16000 +wb option)  


                   5ms      10ms     20ms      40ms

ar1_NN_voip       4.314    4.493    4.488     4.488
ar2_NN_voip       4.346    4.442    4.436     4.474
ar3_NN_voip       3.993    4.375    4.414     4.390

ar1_NN_audio      4.292    4.485 -> 4.313     4.313
ar2_NN_audio      4.364    4.460 -> 4.350     4.350
ar3_NN_audio      3.924    4.327 -> 4.218     4.218


Note that this size/type of pesq test is insufficient to draw ANY conclusions.
However, it is useful for drawing attention to relative differences, that
might be interesting for HUMAN LISTENING.

So the question here was: is this pesq drop from 10ms to 20ms framesize, seen
in the case of mode=AUDIO (but not VOIP), something REAL? It warranted
listening.
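
To make the size of that drop explicit, here is a small Python sketch that
takes the PESQ numbers from the table above and computes the change from 10ms
to 20ms for each clip and mode. The dictionary layout and the function name
delta_10_to_20 are just for illustration; the scores are copied from the
table, nothing is re-measured.

```python
FRAME_SIZES_MS = (5, 10, 20, 40)

# PESQ scores copied from the table, one tuple per (clip, mode),
# ordered to match FRAME_SIZES_MS.
PESQ = {
    ("ar1", "voip"):  (4.314, 4.493, 4.488, 4.488),
    ("ar2", "voip"):  (4.346, 4.442, 4.436, 4.474),
    ("ar3", "voip"):  (3.993, 4.375, 4.414, 4.390),
    ("ar1", "audio"): (4.292, 4.485, 4.313, 4.313),
    ("ar2", "audio"): (4.364, 4.460, 4.350, 4.350),
    ("ar3", "audio"): (3.924, 4.327, 4.218, 4.218),
}


def delta_10_to_20(scores):
    """PESQ at 20ms minus PESQ at 10ms, rounded to the table's precision."""
    i10 = FRAME_SIZES_MS.index(10)
    i20 = FRAME_SIZES_MS.index(20)
    return round(scores[i20] - scores[i10], 3)


deltas = {key: delta_10_to_20(scores) for key, scores in PESQ.items()}

for (clip, mode), delta in sorted(deltas.items()):
    print(f"{clip} {mode:5s} {delta:+.3f}")
```

With these inputs the voip deltas stay within a few hundredths, while all three
audio deltas are drops of more than 0.1. As noted above, that is only a flag
for listening, not a conclusion.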



( same results, interleaved mode=VOIP,AUDIO numbers ) 

                   5ms      10ms     20ms      40ms

ar1_NN_voip       4.314    4.493    4.488*     4.488
ar1_NN_audio      4.292    4.485    4.313*     4.313

ar2_NN_voip       4.346    4.442    4.436*     4.474
ar2_NN_audio      4.364    4.460    4.350*     4.350

ar3_NN_voip       3.993    4.375    4.414*     4.390
ar3_NN_audio      3.924    4.327    4.218*     4.218


Same data, interleaved to highlight the fact that the drop is seen for the
same sentences, from mode=VOIP to mode=AUDIO, for 20ms framesize. (40ms is the
same processing as 20ms, I believe.)


So the question that is implied:
- is there a phenomenon for mode=AUDIO that results in lower scores for 20ms in
particular, but not 10ms?

Listening to the processed files (sighted), I have the following subjective
opinion:

- Given: sampling rate = 16000,  bitrate = 28000.  (also replicated at 32 kbps)
- the 10ms versions (voip, audio) and the 20ms (voip) version sound
"focused" and have high fidelity to the ref.
- the 20ms mode=AUDIO versions sound "hollow", "smeared",
"unfocused", especially during unvoiced segments.
- example: "china hit" in file ar3.pcm, t=0.6s. Very clear diff between
10ms and 20ms framesize in mode=audio.
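
For reference, the given sampling rate and bitrate translate into the following
per-frame figures. This is plain arithmetic in a short Python sketch, treating
28000 bps as the average rate. It says nothing about why the 20ms audio case
sounds different, only what each frame size means in samples and bytes. The
function name frame_budget is made up for illustration.

```python
SAMPLE_RATE_HZ = 16000
BITRATE_BPS = 28000


def frame_budget(frame_ms, rate=SAMPLE_RATE_HZ, bitrate=BITRATE_BPS):
    """Return (samples per frame, average bytes per frame) for one frame size."""
    samples = rate * frame_ms // 1000
    avg_bytes = bitrate * frame_ms / 1000 / 8
    return samples, avg_bytes


for frame_ms in (5, 10, 20, 40):
    samples, avg_bytes = frame_budget(frame_ms)
    print(f"{frame_ms:2d} ms: {samples} samples, {avg_bytes} bytes on average")
```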


This isn't about pesq scores -- pesq was just the "difference
noticed" flag that got me to listen to some files.
I notice this same kind of de-focused sound in the same samples processed using
recent opus lib in linux.
I'm not surprised at a delta between mode=voip and mode=audio for a constant
framesize.  That's entirely expected.
What I'm curious about is the delta between 10ms and 20ms for mode=audio.

