thr3ads.net - Vorbis dev - [vorbis-dev] Speex: Open-source, patent-free speech coding [Mar 2002]

Jean-Marc Valin

2002-Mar-27 21:23 UTC

[vorbis-dev] Speex: Open-source, patent-free speech coding

Hi,

We would like to announce the first release of the Speex project. Speex
(http://speex.sourceforge.net) is an open-source (LGPL), patent-free
compression format allowing an alternative to expensive proprietary
codecs. Unlike Ogg Vorbis which compresses general audio, Speex is
designed especially for speech. For that reason, Speex is meant to be a
complement to Vorbis. Since it is specialized for voice communications,
it is possible to attain lower (compared to Ogg Vorbis/MP3) bit-rates in
the 8-32 kbps/channel range. Possible applications include Voice over IP
(VoIP) applications, Internet audio streaming at low bit-rate and
archiving of speech data (e.g. voice mail).

This first version of Speex supports fixed bit-rate CELP coding at 14.5
kbps for speech sampled at 8 kHz (narrowband) and at 28.5 kbps for 16
kHz (wideband) speech. Future releases will likely provide a wider
choice of bit-rates, better quality, as well as variable bit-rate (VBR)
and discontinuous transmission (DTX).
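
As a rough illustration of what these bit-rates mean for the voice-mail use case, the sketch below converts the two quoted rates into bytes per minute and compares them with uncompressed mono PCM. The function names and the 16-bit PCM baseline are assumptions made for the example; they are not part of Speex.

```python
# Bit-rates quoted in the announcement for this first release.
NARROWBAND_KBPS = 14.5   # speech sampled at 8 kHz
WIDEBAND_KBPS = 28.5     # speech sampled at 16 kHz


def stored_bytes(bitrate_kbps, seconds):
    """Payload size in bytes for a constant-bit-rate stream (1 kbps = 1000 bit/s)."""
    return bitrate_kbps * 1000.0 * seconds / 8.0


def pcm_bytes(sample_rate_hz, seconds, bytes_per_sample=2):
    """Size of the same audio as uncompressed mono PCM (16-bit samples assumed)."""
    return sample_rate_hz * bytes_per_sample * seconds


if __name__ == "__main__":
    minute = 60.0
    for name, kbps, rate in (("narrowband", NARROWBAND_KBPS, 8000),
                             ("wideband", WIDEBAND_KBPS, 16000)):
        coded = stored_bytes(kbps, minute)
        raw = pcm_bytes(rate, minute)
        print(f"{name}: {coded:.0f} bytes/min coded, {raw:.0f} bytes/min PCM, "
              f"ratio {raw / coded:.1f}:1")
```

The figures ignore any container or header overhead, so real files would be slightly larger.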

Disclaimer:

Note that this is a preliminary release and that bit-rates, quality and
bitstream definition will change in the future. This means that the
format used in this release will not be compatible with future releases.
Also note that this software has so far received only a minimum amount
of testing, so it may break or do unexpected things.


-- 
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Linus Walleij

2002-Mar-28 03:08 UTC

[vorbis-dev] Speex: Open-source, patent-free speech coding

On 28 Mar 2002, Jean-Marc Valin wrote:
> Speex is meant to be a
> complement to Vorbis. Since it is specialized for voice communications,
Great! Have you adopted the Ogg bitstream format for standalone file
storage of Speex files too? (I.e., do you encapsulate your encoded data in
Ogg bitstreams, or do you use your own mock-up file format?)

Linus

Jean-Marc Valin

2002-Mar-28 06:01 UTC

[vorbis-dev] Speex: Open-source, patent-free speech coding

> Great! Have you adopted the Ogg bitstream format for standalone file
> storage of Speex files too? (IE do you encapsulate your encoded data in
> Ogg bitstreams or do you use your own mock-up file format.)
Not yet (right now, we use a 5-byte header and then write raw encoded
frames), but that's something we were thinking about. Could someone help
us do that?

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

Kendal

2002-Mar-28 10:16 UTC

[vorbis-dev] Flushing the encoder's buffers?

Hi,

What is the recommended way of flushing the encoder's buffers?
My audio recordings seem to be getting truncated.

Thanks!
Rolande Kendal

Jean-Marc Valin

2002-Mar-28 10:26 UTC

[vorbis-dev] Speex: Open-source, patent-free speech coding

On Thu 28/03/2002 at 13:16, Kendal wrote:
> To the best of my knowledge, CELP is a patented method that requires
> licensing.
> See: http://www.voiceage.com
> 
> How can you produce a CELP codec that circumvents existing patents?
They own a patent on ACELP (Algebraic CELP), invented at the University
of Sherbrooke (I did my master's with them and VoiceAge). CELP was
invented by Atal in the early 80's and is not patented AFAIK. ACELP is
now the most widely used technique, but unfortunately, we cannot use it
because of patents. We're using a "multi-pulse-like" method, but without
the search efficiency of ACELP. If you know about patents on multi-pulse
or other topics in speech coding, please let me know.

        Jean-Marc


-- 
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

DSPguru

2002-Mar-28 11:27 UTC

[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

Greetings developers, Jean-Marc,

Great to hear about your work. I find speech encoding pretty useful.

these days we are thinking about multilingual Vorbis definitions, and the
best compression we could achieve would be to save the common track on
two channels, and then save the delta between the common track and the
specific language (English speech/Italian speech/etc.) on a low-bandwidth
track. a vocoder track for this task could be very
interesting, don't you think?

another point:
I wanted to ask why you decided to develop CELP, which is based on
linear prediction (LPC), while there are multi-band excitation (MBE)
speech coders (IMBE/AMBE) that have better compression gain (down to 2 kbps)
and consume fewer MIPS. I heard some samples of DVSI's vocoders, and I
must admit that their quality is very, very good.
LPC is a bit old, isn't it ?-)

Good luck with your work!
Dg. http://DSPguru.doom9.net

DSPguru

2002-Mar-28 13:28 UTC

[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

Hi Jean-Marc,
> > these days we are thinking about multilingual Vorbis definitions, and the
> > best compression we could achieve would be to save the common track on
> > two channels, and then save the delta between the common track and the
> > specific language (English speech/Italian speech/etc.) on a low-bandwidth
> > track. a vocoder track for this task could be very
> > interesting, don't you think?
> 
> Probably, what kind of bandwidth (sampling rate) and bit-rate are you
> thinking about?
sampling-rate should be high (48 kHz), but bandwidth should be less than
16 kHz (after "extracting" speech-only from the lingual track).

about bitrate, let me describe something:
until Vorbis came along, people used to encode their movie soundtracks
at 128 kbps to 192 kbps MP3. now, with Ogg, we can encode the "common"
track at around 100 kbps Vorbis, and encode each speech track at less
than 30 kbps with Speex. this gives us about 180 kbps for a movie with
three soundtracks (English/Italian/French, for instance).
that could make a small revolution :).
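
The budget above is simple arithmetic; a small sketch, with hypothetical function names, makes the comparison with three independent full soundtracks explicit. With exactly 30 kbps per speech track the total comes to 190 kbps, so the "about 180 kbps" figure corresponds to speech tracks somewhat below 30 kbps each.

```python
def shared_track_total_kbps(common_kbps, speech_kbps, languages):
    """One shared 'common' track plus one speech track per language."""
    return common_kbps + speech_kbps * languages


def separate_tracks_total_kbps(track_kbps, languages):
    """One complete soundtrack per language, e.g. independent MP3 tracks."""
    return track_kbps * languages


if __name__ == "__main__":
    # Upper bound for the scheme described in the post: 100 kbps + 3 x 30 kbps.
    print(shared_track_total_kbps(100, 30, 3))
    # Three independent soundtracks at the low end of the quoted MP3 range.
    print(separate_tracks_total_kbps(128, 3))
```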

> Well CELP (mostly CELP variants) is still the most widely used technique
> for speech coding. If you look at the latest standards (like G.729 and
> AMR wideband), most use ACELP (which we cannot use because of patents,
> but that's another thing). Note that CELP has nothing to do with LPC
> vocoders (aside from the fact that it uses LPC analysis). There are
> other techniques, like WI (waveform interpolation) and sinusoidal coding
> (not sure about MBE), but most of them are for low bit-rate coding (in
> the 4 kbps range). Our current goals focus more on high quality than low
> bit-rate.
you're right, VoIP applications use variations of CELP. AMBE (& MELP)
vocoders are mostly used in military applications (digital voice over HF). but
in some way, MBE complements Speex, the same way Speex
complements Vorbis :).
anyway, it was only a thought..

just to mention, it has been some years since I studied vocoder
principles, but I still remember (correct me if I'm wrong here :>)
that a CELP bitstream mostly (I believe that in LD-CELP it's even ONLY)
contains indices into the excitation codebook, while both the encoder and
the decoder perform LPC analysis.

you can find some info about MBE over at :
http://www.dvsinc.com/papers/mbe.htm

these days, some people are testing the differences between tracks within
multilingual movies.
results will be published here:
http://forum.doom9.org/forumdisplay.php?s=&forumid=11
> 
> That being said, the Speex is an open-source project and the code is
> designed so that it's really easy to change any part of the codec (LSP
> quantization, pitch, fixed codebooks, ...). So you can play with it if
> you like and if you find something that works well, let us know.
> 
>         Jean-Marc
10x,
Dg. http://DSPguru.doom9.net

DSPguru

2002-Mar-28 15:40 UTC

[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

Hi Jean-Marc,
> > sampling-rate should be high (48khz), but bandwidth should be less than
> > 16khz (after "extracting" speech-only from the lingual track).
> 
> Currently, Speex only supports sampling at 8 kHz and 16 kHz, so it would
> need to be adapted to work at 32 kHz (and then up-sample to 48 kHz). I'd
> say it's quite feasible.
Speex shouldn't bother dealing with non-standard speech sampling-rates.
Encoding tools like mine would downsample the signal before delivering
it to Speex, and decoding tools like Tobias' DirectShowFilter should
take care of upsampling and summing the different tracks (common + speech).
There's a brilliant, open-source, HQ sample-rate converter called
SSRC. it's under the LGPL, and I even made a DLL release of this fine tool.
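
A minimal sketch of the downsampling step, assuming an integer ratio (48 kHz to 16 kHz is a factor of 3). It is not SSRC and not part of Speex: it uses a plain Hamming-windowed sinc low-pass filter in pure Python, with an illustrative tap count, only to show why filtering has to come before dropping samples.

```python
import math
from typing import List


def lowpass_taps(num_taps: int, cutoff: float) -> List[float]:
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the input sample rate."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC


def decimate(samples: List[float], factor: int, num_taps: int = 63) -> List[float]:
    """Low-pass filter, then keep every `factor`-th sample (factor=3: 48 kHz -> 16 kHz)."""
    # Cutoff placed slightly below the new Nyquist frequency to limit aliasing.
    taps = lowpass_taps(num_taps, 0.9 * 0.5 / factor)
    half = num_taps // 2
    out = []
    for centre in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            i = centre + k - half
            if 0 <= i < len(samples):  # samples outside the input count as zero
                acc += tap * samples[i]
        out.append(acc)
    return out


if __name__ == "__main__":
    # A 1 kHz tone sampled at 48 kHz should survive the conversion to 16 kHz.
    tone = [math.sin(2.0 * math.pi * 1000 * n / 48000) for n in range(4800)]
    print(len(decimate(tone, 3)))
```

A dedicated converter will do considerably better on quality and speed; the point here is only the order of operations.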
> 
> > about bitrate, let me describe something :
> > up until vorbis came, people used to encode their soundtrack of movies
> > at 128kbps to 192kbps MP3. now, with Ogg, we can encode the "common"
> > track at around 100kbps vorbis, and encode each speech track at less
> > than 30kbps with speex. this gives us about 180kbps for a movie with
> > three soundtracks (English/Italian/French, for instance).
> > that could make a small revolution :).
> 
> I think 30 kbps is realistic. When we add VBR, the average could easily
> drop to ~16 kbps/track.
sure thing! the speech track is supposed to have lots of silent moments, so
DTX (AD/CFI) would help to drop the bitrate.
the problem is: can CELP handle multiple speakers (i.e., when two people are
talking at the same time), and will sound quality differ when compressing
an English track compared to encoding a Russian track?
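
To see how much the silent moments could matter, here is a back-of-the-envelope sketch of the average bit-rate when silent frames cost less than active ones. The speech-activity fraction and the silence rate are hypothetical inputs chosen for illustration, not measurements of Speex.

```python
def average_bitrate_kbps(active_kbps, silence_kbps, speech_activity):
    """Time-weighted average of the rates used during speech and during silence.

    speech_activity is the fraction of time (0.0 to 1.0) in which someone talks.
    """
    if not 0.0 <= speech_activity <= 1.0:
        raise ValueError("speech_activity must be between 0.0 and 1.0")
    return speech_activity * active_kbps + (1.0 - speech_activity) * silence_kbps


if __name__ == "__main__":
    # Hypothetical case: wideband rate from the announcement, dialogue present
    # half of the time, silence costing nothing at all.
    print(average_bitrate_kbps(28.5, 0.0, 0.5))
```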
> 
> > you can find some info about MBE over at :
> > http://www.dvsinc.com/papers/mbe.htm
> 
> This info seems very biased to me...
most probably.
still, I know of some satellite applications where AMBE is successfully
used. I also know of a few projects where MELP is used over HF.
> 
> So I'd say the first step would be to build a prototype that downsamples
> the 48 kHz stream to 16 kHz and encodes it with the current Speex
> version. Once that works, we can try making Speex work at 32/48 kHz.
> Actually, that *might* not even be necessary, as most of the energy in
> speech is in the 0-8 kHz band - and even the 4-8 kHz band can in some
> cases (speech only) be severely distorted before the ear can tell the
> difference.
the first step is :
- decide how we extract the 'common' track
- define Speex integration in ogg
- start testing - taking a multilingual title, creating the 'common'
track, downsampling the 'lingual' tracks using ssrc.dll and muxing
everything to ogg stream. then doing the reversed process.., and
comparing quality.

Jean-Marc,
you have a lot of knowledge regarding speech models. can you point out
some useful sites/tools which I should check in order to implement the
first stage of 'extracting the common track'?

keep in mind that I should take advantage of the fact that I have
multiple soundtracks that mostly (only?) differ in the speech content.

Best Regards,
Dg. http://DSPguru.doom9.net

DSPguru

2002-Mar-30 08:58 UTC

[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

Hi Jean-Marc,
> The only potential problem is the one where two people are talking at
> the same time. In this case, the solution could be to just boost the
> bit-rate for a couple frames.
are you sure this will give us good results for both voiced and unvoiced
sounds?
I'm asking because this could suggest an alternative solution.
assuming we have two speech signals, s1 and s2: if Speex can encode
s1+s2, it should be able to encode s1-s2 as well, right?
in this case, our "common" track could be the original English (s1)
soundtrack encoded at ~100 kbps Vorbis, and the German track (s2) would
be encoded as Speex of s1-s2.
that way we have better compression gain, and better quality for the
default soundtrack.
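
A quick numerical sketch of the s1-s2 idea, using independent white-noise sequences as crude stand-ins for two talkers (an assumption; real speech is not white noise). With a lossless difference the second track is recovered exactly, but the difference signal carries the energy of both talkers at once, which is worth keeping in mind before handing it to a codec designed for a single voice.

```python
import random


def energy(signal):
    """Mean squared value of a signal."""
    return sum(x * x for x in signal) / len(signal)


def difference_track(s1, s2):
    """The signal that would be stored in place of s2."""
    return [a - b for a, b in zip(s1, s2)]


def recover_second(s1, diff):
    """Decoder side: s2 = s1 - (s1 - s2). Exact only if nothing was lost in coding."""
    return [a - d for a, d in zip(s1, diff)]


if __name__ == "__main__":
    rng = random.Random(0)
    s1 = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    s2 = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    diff = difference_track(s1, s2)
    # For uncorrelated inputs the difference has roughly the summed energy.
    print(energy(s1), energy(s2), energy(diff))
```

In practice both s1 and the difference would pass through lossy codecs, so the coding errors of the two add up in the recovered track.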
Beware, developers: we would need to define some comment strategy for
Ogg, to be able to hold info about languages,
so that in the future each user could choose their preferred language, and the
player would be able to supply it by default (if available).
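
One possible shape for such a comment strategy, sketched under assumptions: Vorbis comments are stored as NAME=value strings with case-insensitive field names, but the LANGUAGE field name and the selection logic below are hypothetical, not an agreed convention.

```python
def parse_comments(comments):
    """Turn 'NAME=value' strings into a dict keyed by upper-cased field name."""
    fields = {}
    for entry in comments:
        name, sep, value = entry.partition("=")
        if sep:  # ignore entries without an '=' separator
            fields[name.strip().upper()] = value.strip()
    return fields


def pick_track(tracks, preferred_language, default_index=0):
    """Return the index of the first track tagged with the preferred language.

    `tracks` is a list of comment lists, one per speech track. Falls back to
    `default_index` when no track carries the requested (hypothetical) tag.
    """
    wanted = preferred_language.lower()
    for index, comments in enumerate(tracks):
        if parse_comments(comments).get("LANGUAGE", "").lower() == wanted:
            return index
    return default_index


if __name__ == "__main__":
    tracks = [["LANGUAGE=en"], ["language=it"], ["LANGUAGE=fr"]]
    print(pick_track(tracks, "it"))
```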

> > the first step is :
> > - decide how we extract the 'common' track
> 
> I'll leave that one to you. I have no idea about the properties of the
> different tracks.
ok. maybe someone else on the list has ideas?

Best Regards,
Dg. http://DSPguru.doom9.net

DSPguru

2002-Mar-30 15:55 UTC

[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

> I believe unvoiced sounds won't be a problem, but voiced sounds could
> (especially two simultaneous vowels at different pitch and very
> different LPC), but increasing bit-rate could still work.
that's exactly what i was afraid of. two voiced sounds with different
pitches.. :(
we need to test Speex behavior under this.
> I don't think this would work. Encoding one track with occasional
> double-talk would make sense (even if we need to triple bit-rate for the
> few double-talk instances), but continuous double-talk would cause too
> many problems.
okay, back to original plan.
btw, when you say "triple bit-rate", how much would that be.. ?

> Just a thought: ICA (Independent Component Analysis) might be able to do
> it. Not sure whether it's good enough though. It has to do a perfect job
> if you don't want to end up with the problem I described above. So I'm
> not sure whether the project is feasible at all without access to the
> "original" common track.
there's a lot we can do.
we should take advantage of the following facts:
- we have several soundtracks that mostly differ in the speech, or in
other words (generally speaking): their spectra mostly differ in 3 to
6 formants.
- BEFORE we downmix the 6ch (5.1) soundtrack to 2ch, we can focus our
analysis on the _center_ channel of each soundtrack, which by nature
is mostly speech content.

on top of all that, we should try running ICA, karaoke, echo cancelling, etc.

Best Regards,
Dg. http://DSPguru.doom9.net

Kevin Marks

2002-Apr-01 13:17 UTC

[vorbis-dev] Re: Speex: Open-source, patent-free speech coding

On Saturday, March 30, 2002, at 08:58 AM, DSPguru wrote:
>>> the first step is :
>>> - decide how we extract the 'common' track
>>
>> I'll leave that one to you. I have no idea about the properties of the
>> different tracks.
>
> ok. maybe anyone else in the list have ideas ?
When a TV program is edited, the Music & Effects is kept as a separate track
from the speech precisely so that different languages can be added
afterwards. They are only mixed at the mastering stage.

You should start from the unmixed tracks and compress them separately.
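
With separate stems, playback reduces to summing the shared Music & Effects track with the dialogue track of the chosen language. A minimal sketch, assuming both tracks are already decoded to floating-point mono samples at the same sampling rate (the function name and the hard clipping are illustrative choices):

```python
def mix_tracks(music_and_effects, dialogue, dialogue_gain=1.0):
    """Sum two mono tracks sample by sample and clip the result to [-1.0, 1.0].

    The shorter track is treated as silent once it runs out of samples.
    """
    length = max(len(music_and_effects), len(dialogue))
    mixed = []
    for i in range(length):
        m = music_and_effects[i] if i < len(music_and_effects) else 0.0
        d = dialogue[i] if i < len(dialogue) else 0.0
        mixed.append(max(-1.0, min(1.0, m + dialogue_gain * d)))
    return mixed


if __name__ == "__main__":
    m_and_e = [0.5, 0.5, -0.25]
    english = [0.25, 0.75, 0.0]
    print(mix_tracks(m_and_e, english))
```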
