thr3ads.net - Speex dev - SV: [speex-dev] Speex modes [Aug 2004]


Steve Underwood

2004-Aug-06 15:01 UTC

SV: [speex-dev] Speex modes

Pontus Carlsson wrote:
>Thanks!
>
>Btw, have you tried using SBR-technology or similar with speech codecs? That
>might be a good idea I thought.. But I don't know if it produces as good
>quality with speech codecs as it does for music codecs. Do you know if there
>is any open source variant of SBR?
>
SBR exploits a limitation of your ears. At high frequencies (like over 
10kHz) you cannot determine pitch with any accuracy. You hear up to 
15kHz to 20kHz (depending on age and other factors), but you really 
cannot identify pitch at these frequencies. You cannot even determine if 
content above about 10kHz is properly harmonically related to the lower 
pitched fundamentals which usually give rise to them.

I don't know of any voice specific coder that even attempts to capture 
energy above 10kHz. SBR just isn't relevant. Most wideband speech coding 
captures only 7kHz to 8kHz bandwidth. The key improvement that gives 
over the 3kHz to 4kHz most mainstream voice coders capture is to clean 
up unvoiced sounds. fffff, sssss, and other unvoiced sounds appear 
almost the same at telephone bandwidth. At 7kHz bandwidth they have 
enough character to make them more distinguishable. The basic 
intelligibility improvement you get is usually small. However, the voice 
is rather more pleasant and less tiring to listen to. That brings 
considerable intelligibility improvements in a long discussion. Adding 
energy up to the limit of hearing adds more to the pleasantness of the 
voice, but it isn't usually considered enough to get people excited 
about committing extra bits per second to it.

Regards,
Steve

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'speex-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Pontus Carlsson

2004-Aug-06 15:01 UTC

SV: [speex-dev] Speex modes

Thanks!

Btw, have you tried using SBR-technology or similar with speech codecs? That
might be a good idea I thought.. But I don't know if it produces as good
quality with speech codecs as it does for music codecs. Do you know if there
is any open source variant of SBR?

/Pontus

-----Original message-----
From: owner-speex-dev@xiph.org [mailto:owner-speex-dev@xiph.org] On behalf of
Jean-Marc Valin
Sent: 13 October 2002 05:57
To: speex
Subject: Re: [speex-dev] Speex modes

> I'm about finished developing a QuickTime component that supports Speex
> (on MacOS X and Windows).. As it is now the user can set complexity
> (SPEEX_SET_COMPLEXITY) and quality (SPEEX_SET_QUALITY /
> SPEEX_SET_VBR_QUALITY) and whether to use VBR or not. Will these options
> make it possible to produce all combinations of bitrates/qualities? Or
> should I also use SPEEX_SET_MODE/SPEEX_SET_LOW_MODE/SPEEX_SET_HIGH_MODE
> to accomplish this?

The first thing to know is that setting quality from 0-10 is in fact a more
user-friendly way of setting the mode. That being said, for narrowband
encoding, all modes are available with at least one quality setting
(sometimes two quality settings point to the same mode because there are
fewer than 10 modes). For wideband encoding, not all possible mode
combinations (one mode for the low-band, one for the high-band) are
available with the 10 quality settings, but those that aren't available
are mostly useless anyway (e.g. a combination that gives you very good
quality above 4 kHz, but very poor below that is useless).

In most cases I would suggest not making the modes directly available,
unless maybe for "expert users". The only other place where it can be
useful is that modes have a specific bit-rate/quality associated with
them, while the mapping between the "quality settings" and the modes
is not guaranteed to remain the same in the future. That being said,
you're probably better off keeping what you have now. Hope this helps.
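
[Editor's sketch] The many-to-one relationship between quality settings and modes described above can be illustrated with a small lookup table. This is only a sketch: the table values are an assumption, not something stated in this thread, and may not match any particular Speex release (the message itself warns that the mapping can change). The names `QUALITY_TO_MODE`, `mode_for_quality` and `qualities_by_mode` are hypothetical and are not part of the Speex API.

```python
# ASSUMPTION: illustrative quality -> narrowband mode table (index = quality 0..10).
# The values are not taken from this thread and may differ between Speex versions.
QUALITY_TO_MODE = [1, 8, 2, 3, 3, 4, 4, 5, 5, 6, 7]


def mode_for_quality(quality):
    """Clamp quality to the 0..10 range and look up the mode it selects."""
    q = max(0, min(len(QUALITY_TO_MODE) - 1, int(quality)))
    return QUALITY_TO_MODE[q]


def qualities_by_mode():
    """Group quality settings by the mode they select."""
    groups = {}
    for q, mode in enumerate(QUALITY_TO_MODE):
        groups.setdefault(mode, []).append(q)
    return groups


# Modes reached by more than one quality setting: with 11 settings and
# fewer modes, some settings necessarily share a mode.
shared = {m: qs for m, qs in qualities_by_mode().items() if len(qs) > 1}
```

With a table like this, every mode is reachable through the quality knob, which is why exposing the modes directly adds little for narrowband. An application that needs a fixed bit-rate would still want to select the mode itself, since the table is the part that may change.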

        Jean-Marc


--
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada


Jean-Marc Valin

2004-Aug-06 15:01 UTC

SV: [speex-dev] Speex modes

Well, I don't know what SBR is, but there's something in the wideband
mode that may be similar: It's possible to encode the whole 4-8 kHz band
with just ~1-2 kbps by only encoding the (LPC) shape of the spectrum and
then just filling that band with "something that makes sense". Quality
is quite reasonable...
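
[Editor's sketch] The idea of sending only a spectral envelope and letting the decoder fill the band with plausible content can be sketched as noise passed through an all-pole (LPC) synthesis filter. This is a minimal illustration under stated assumptions, not Speex's actual high-band decoder: the function name `synthesize_high_band`, the uniform-noise excitation, the second-order coefficients and the 160-sample frame length are all hypothetical choices made for the example.

```python
import random


def synthesize_high_band(lpc, gain, num_samples, seed=0):
    """Fill a band with noise shaped by an all-pole (LPC) filter.

    lpc  : coefficients a_1..a_p of A(z) = 1 + sum_k a_k z^-k
    gain : scale applied to the noise excitation
    Only `lpc` and `gain` would need to be transmitted; the excitation
    itself is invented at the decoder.
    """
    rng = random.Random(seed)
    out = []
    for n in range(num_samples):
        excitation = gain * rng.uniform(-1.0, 1.0)
        # All-pole synthesis: y[n] = e[n] - sum_k a_k * y[n-k]
        acc = excitation
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


# ASSUMPTION: a hypothetical 2nd-order envelope with a stable resonance
# (complex pole pair at radius 0.9); real codecs use higher orders.
lpc = [-0.9, 0.81]
frame = synthesize_high_band(lpc, gain=0.1, num_samples=160)
```

Because the receiver generates the excitation locally, the bits spent on the band are limited to the filter coefficients and a gain, which is consistent with the very low bit-rate quoted above.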

        Jean-Marc

On Sun 13/10/2002 at 06:18, Steve Underwood wrote:
> Pontus Carlsson wrote:
> 
> >Thanks!
> >
> >Btw, have you tried using SBR-technology or similar with speech codecs?
> >That might be a good idea I thought.. But I don't know if it produces as
> >good quality with speech codecs as it does for music codecs. Do you know
> >if there is any open source variant of SBR?
> >
> SBR exploits a limitation of your ears. At high frequencies (like over 
> 10kHz) you cannot determine pitch with any accuracy. You hear up to 
> 15kHz to 20kHz (depending on age and other factors), but you really 
> cannot identify pitch at these frequencies. You cannot even determine if 
> content above about 10kHz is properly harmonically related to the lower 
> pitched fundamentals which usually give rise to them.
> 
> I don't know of any voice specific coder that even attempts to capture 
> energy above 10kHz. SBR just isn't relevant. Most wideband speech coding
> captures only 7kHz to 8kHz bandwidth. The key improvement that gives 
> over the 3kHz to 4kHz most mainstream voice coders capture is to clean 
> up unvoiced sounds. fffff, sssss, and other unvoiced sounds appear 
> almost the same at telephone bandwidth. At 7kHz bandwidth they have 
> enough character to make them more distinguishable. The basic 
> intelligibility improvement you get is usually small. However, the voice 
> is rather more pleasant and less tiring to listen to. That brings 
> considerable intelligibility improvements in a long discussion. Adding 
> energy up to the limit of hearing adds more to the pleasantness of the 
> voice, but it isn't usually considered enough to get people excited 
> about committing extra bits per second to it.
> 
> Regards,
> Steve
> 
-- 
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada


