Jean-Marc Valin
2002-Mar-27 21:23 UTC
[vorbis-dev] Speex: Open-source, patent-free speech coding
Hi,

We would like to announce the first release of the Speex project. Speex (http://speex.sourceforge.net) is an open-source (LGPL), patent-free compression format allowing an alternative to expensive proprietary codecs. Unlike Ogg Vorbis which compresses general audio, Speex is designed especially for speech. For that reason, Speex is meant to be a complement to Vorbis. Since it is specialized for voice communications, it is possible to attain lower (compared to Ogg Vorbis/MP3) bit-rates in the 8-32 kbps/channel range. Possible applications include Voice over IP (VoIP) applications, Internet audio streaming at low bit-rate and archiving of speech data (e.g. voice mail).

This first version of Speex supports fixed bit-rate CELP coding at 14.5 kbps for speech sampled at 8 kHz (narrowband) and at 28.5 kbps for 16 kHz (wideband) speech. Future releases will likely provide a wider choice of bit-rates, better quality, as well as variable bit-rate (VBR) and discontinuous transmission (DTX).

Disclaimer:

Note that this is a preliminary release and that bit-rates, quality and bitstream definition will change in the future. This means that the format used in this release will not be compatible with future releases. Also note that this software has so far received only a minimum amount of testing, so it may break or do unexpected things.

--
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
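To put the two announced bit-rates in perspective, here is a rough arithmetic sketch. The 14.5 kbps and 28.5 kbps figures and the sampling rates come from the announcement; the comparison against 16-bit linear PCM is an assumption made only for illustration, and the function names are made up for this example.

```python
# Bit-rates (bits per second) and sampling rates (Hz) quoted in the announcement.
MODES = {
    "narrowband": {"bitrate_bps": 14_500, "sample_rate_hz": 8_000},
    "wideband": {"bitrate_bps": 28_500, "sample_rate_hz": 16_000},
}

PCM_BITS_PER_SAMPLE = 16  # assumption: uncompressed input is 16-bit linear PCM


def bits_per_sample(mode):
    """Average number of coded bits spent on each input sample."""
    m = MODES[mode]
    return m["bitrate_bps"] / m["sample_rate_hz"]


def compression_ratio(mode):
    """Size of the assumed 16-bit PCM input relative to the coded stream."""
    return PCM_BITS_PER_SAMPLE / bits_per_sample(mode)


def bytes_for(mode, seconds):
    """Coded payload size in bytes for a recording of the given length."""
    return MODES[mode]["bitrate_bps"] * seconds / 8


for mode in MODES:
    print(mode, bits_per_sample(mode), round(compression_ratio(mode), 2),
          bytes_for(mode, 60))
```

Under these assumptions a one-minute voice mail costs roughly 109 kB in narrowband and 214 kB in wideband, not counting any container overhead.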
Linus Walleij
2002-Mar-28 03:08 UTC
[vorbis-dev] Speex: Open-source, patent-free speech coding
On 28 Mar 2002, Jean-Marc Valin wrote:

> Speex is meant to be a complement to Vorbis. Since it is specialized for voice communications,

Great! Have you adopted the Ogg bitstream format for standalone file storage of Speex files too? (I.e., do you encapsulate your encoded data in Ogg bitstreams, or do you use your own mock-up file format?)

Linus
Jean-Marc Valin
2002-Mar-28 06:01 UTC
[vorbis-dev] Speex: Open-source, patent-free speech coding
> Great! Have you adopted the Ogg bitstream format for standalone file storage of Speex files too? (IE do you encapsulate your encoded data in Ogg bitstreams or do you use your own mock-up file format.)

Not yet (right now, we use a 5-byte header and then write raw encoded frames), but that's something we were thinking about. Could someone help us do that?

Jean-Marc

--
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
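For readers wondering how a tool could tell the two storage options apart: every Ogg page starts with the four-byte capture pattern "OggS", so a quick check on the first bytes is enough to distinguish an Ogg-encapsulated stream from some other layout. The sketch below only does that check. The layout of the 5-byte header mentioned above is not described in this thread, so nothing here tries to parse it, and the function names are hypothetical.

```python
OGG_CAPTURE_PATTERN = b"OggS"  # first four bytes of every Ogg page


def looks_like_ogg(data: bytes) -> bool:
    """Return True if the buffer starts with an Ogg page capture pattern."""
    return data[:4] == OGG_CAPTURE_PATTERN


def describe_container(data: bytes) -> str:
    """Classify a buffer as Ogg or 'something else' (e.g. a raw frame dump)."""
    return "ogg" if looks_like_ogg(data) else "raw-or-unknown"


# Example: a fake Ogg page start versus arbitrary non-Ogg bytes.
print(describe_container(b"OggS\x00\x02" + bytes(21)))
print(describe_container(b"\x01\x02\x03\x04\x05" + bytes(20)))
```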
Hi,

What is the recommended way of flushing the encoder's buffers? My audio recording seems to be getting truncated.

Thanks!
Rolande Kendal
Jean-Marc Valin
2002-Mar-28 10:26 UTC
[vorbis-dev] Speex: Open-source, patent-free speech coding
On Thu, 28/03/2002 at 13:16, Kendal wrote:

> To the best of my knowledge, CELP is a patented method that requires licensing.
> See: http://www.voiceage.com
>
> How can you produce a CELP codec that circumvents existing patents?

They own a patent on ACELP (Algebraic CELP), invented at the University of Sherbrooke (I did my master's with them and VoiceAge). CELP was invented by Atal in the early 80's and is not patented AFAIK. ACELP is now the most widely used technique, but unfortunately, we cannot use it because of patents. We're using a "multi-pulse-like" method, but without the search efficiency of ACELP.

If you know about patents on multi-pulse or other topics in speech coding, please let me know.

Jean-Marc

--
Jean-Marc Valin, M.Sc.A.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
DSPguru
2002-Mar-28 11:27 UTC
[vorbis-dev] Re: Speex: Open-source, patent-free speech coding
Greetings developers, Jean-Marc,

Great to hear about your work. I find speech encoding pretty useful. These days we are thinking about multilingual Vorbis definitions, and the best compression we could achieve would be to save the common track on two channels, and then save the delta between the common track and the specific language (English speech, Italian speech, etc.) on a low-bandwidth track. A vocoder track for this task could be very interesting, don't you think?

Another point: I wanted to ask why you decided to develop CELP, which is based on linear prediction (LPC), while there are multi-band excitation (MBE) speech coders (IMBE/AMBE) that have better compression gain (down to 2 kbps) and consume fewer MIPS. I heard some samples of DVSI's vocoders, and I must admit that their quality is very, very good. LPC is a bit old, isn't it? ;-)

Good luck with your work!

Dg.
http://DSPguru.doom9.net
DSPguru
2002-Mar-28 13:28 UTC
[vorbis-dev] Re: Speex: Open-source, patent-free speech coding
Hi Jean-Marc,

> > these days we think about multilingual Vorbis definitions, and the best
> > compression we could achieve were if we could save the common track on
> > two channels, and then save the delta between the common track to the
> > specific language (english speech/italian speech/etc.) on a low
> > bandwidth track. a vocoder track for this task could be very
> > interesting. don't you think ?
>
> Probably, what kind of bandwidth (sampling rate) and bit-rate are you
> thinking about?

Sampling rate should be high (48 kHz), but bandwidth should be less than 16 kHz (after "extracting" speech-only from the lingual track).

About bitrate, let me describe something: until Vorbis came along, people used to encode the soundtracks of their movies at 128 kbps to 192 kbps MP3. Now, with Ogg, we can encode the "common" track at around 100 kbps Vorbis, and encode each speech track at less than 30 kbps with Speex. This gives us about 180 kbps for a movie with three soundtracks (English/Italian/French, for instance). That could make a small revolution :).

> Well CELP (mostly CELP variants) is still the most widely used technique
> for speech coding. If you look at the latest standards (like G.729 and
> AMR wideband), most use ACELP (which we cannot use because of patents,
> but that's another thing). Note that CELP has nothing to do with LPC
> vocoders (aside from the fact that it uses LPC analysis). There are
> other techniques, like WI (waveform interpolation) and sinusoidal coding
> (not sure about MBE), but most of them are for low bit-rate coding (in
> the 4 kbps range). Our current goals focus more on high quality than low
> bit-rate.

You're right, VoIP applications use variations of CELP. AMBE (& MELP) vocoders are mostly used in military applications (digital voice over HF). But in some way, MBE complements Speex, the same way Speex complements Vorbis :). Anyway, it was only a thought.

Just to mention that it has been some years since I studied vocoder principles, but I still remember (correct me if I'm wrong here :>) that a CELP bitstream mostly (I believe that in LD-CELP it's even ONLY) includes codes into the excitation codebook, but both the encoder and decoder include LPC analysis.

You can find some info about MBE at: http://www.dvsinc.com/papers/mbe.htm

These days, some people are testing the difference between tracks within multilingual movies. Results will be published here: http://forum.doom9.org/forumdisplay.php?s=&forumid=11

> That being said, the Speex is an open-source project and the code is
> designed so that it's really easy to change any part of the codec (LSP
> quantization, pitch, fixed codebooks, ...). So you can play with it if
> you like and if you find something that works well, let us know.
>
> Jean-Marc

10x,
Dg.
http://DSPguru.doom9.net
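The bit-rate budget described above can be checked with a few lines of arithmetic. This is only a sketch: the 100 kbps, 30 kbps, 128 kbps and 192 kbps figures come from the message, the 28.5 kbps figure is the wideband rate from the original announcement, and comparing against three complete MP3 soundtracks stored side by side is an assumption about what the alternative would look like.

```python
def shared_track_budget_kbps(common_kbps, speech_kbps, languages):
    """One shared 'common' track plus one speech track per language."""
    return common_kbps + speech_kbps * languages


def separate_tracks_kbps(track_kbps, languages):
    """Assumed alternative: one complete soundtrack per language."""
    return track_kbps * languages


LANGUAGES = 3

# Upper bound from the message: speech tracks at "less than 30 kbps" each.
print(shared_track_budget_kbps(100, 30, LANGUAGES))    # 190
# Same layout using the fixed 28.5 kbps wideband rate from the announcement.
print(shared_track_budget_kbps(100, 28.5, LANGUAGES))  # 185.5
# Three complete MP3 soundtracks at the quoted 128 and 192 kbps.
print(separate_tracks_kbps(128, LANGUAGES))            # 384
print(separate_tracks_kbps(192, LANGUAGES))            # 576
```

So the "about 180 kbps" estimate is consistent with speech tracks slightly under 30 kbps each, and it is roughly half of what three separate 128 kbps tracks would need.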
DSPguru
2002-Mar-28 15:40 UTC
[vorbis-dev] Re: Speex: Open-source, patent-free speech coding
Hi Jean-Marc,

> > sampling-rate should be high (48khz), but bandwidth should be less than
> > 16khz (after "extracting" speech-only from the lingual track).
>
> Currently, Speex only supports sampling at 8 kHz and 16 kHz, so it would
> need to be adapted to work at 32 kHz (and then up-sample to 48 kHz). I'd
> say it's quite feasible.

Speex shouldn't bother dealing with non-standard speech sampling rates. Encoding tools like mine would downsample the signal before delivering it to Speex, and decoding tools like Tobias' DirectShowFilter should take care of the upsampling and of summing the different tracks (common + speech). There's a brilliant, open-source, HQ sample-rate converter called SSRC. It's under the LGPL, and I even made a DLL release of this fine tool.

> > about bitrate, let me describe something :
> > up until vorbis came, people used to encode their soundtrack of movies
> > at 128kbps to 192kbps MP3. now, with Ogg, we can encode the "common"
> > track at around 100kbps vorbis, and encode each speech track at less
> > than 30kbps with speex. this gives us about 180kbps for a movie with
> > three soundtracks (english/italian/french, for instance).
> > that could make a small revolution :).
>
> I think 30 kbps is realistic. When we add VBR, the average could easily
> drop to ~16 kbps/track.

Sure thing! The speech track is supposed to have lots of silent moments, so DTX (AD/CFI) would help to drop the bitrate. The problem is: can CELP handle multiple speakers (i.e., when two people are talking at the same time), and will sound quality differ when compressing an English track compared to encoding a Russian track?

> > you can find some info about MBE over at :
> > http://www.dvsinc.com/papers/mbe.htm
>
> This info seems very biased to me...

Most probably. Still, I know of some satellite applications where AMBE is successfully used. I also know of a few projects where MELP is used over HF.

> So I'd say the first step would be to build a prototype that downsamples
> the 48 kHz stream to 16 kHz and encodes it with the current Speex
> version. Once that works, we can try making Speex work at 32/48 kHz.
> Actually, that *might* not even be necessary, as most of the energy in
> speech is in the 0-8 kHz band - and even the 4-8 kHz band can in some
> cases (speech only) be severely distorted before the ear can tell the
> difference.

The first step is:
- decide how we extract the 'common' track
- define Speex integration in Ogg
- start testing: take a multilingual title, create the 'common' track, downsample the 'lingual' tracks using ssrc.dll and mux everything into an Ogg stream; then do the reverse process and compare quality.

Jean-Marc, you have a lot of knowledge regarding speech models. Can you point out some useful sites/tools which I should check in order to implement the first stage of 'extracting the common track'? Keep in mind that I should take advantage of the fact that I have multiple soundtracks that mostly (only?) differ in the speech content.

Best Regards,
Dg.
http://DSPguru.doom9.net
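The "downsample 48 kHz to 16 kHz before handing the signal to Speex" step discussed above can be sketched in pure Python as a low-pass filter followed by decimation by three. This is a crude stand-in for illustration only, not SSRC and not anything from the Speex code: the Hamming-windowed sinc design, the 63-tap length and the 7 kHz cutoff are all assumptions picked to keep the example short.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, sample_rate_hz):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sampling rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(samples, factor, taps):
    """Apply the FIR filter, computing only every `factor`-th output sample."""
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i - k
            if j >= 0:  # samples before the start of the signal count as zero
                acc += t * samples[j]
        out.append(acc)
    return out


def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def tone(freq_hz, sample_rate_hz, n):
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


IN_RATE, OUT_RATE = 48_000, 16_000
FACTOR = IN_RATE // OUT_RATE  # 3
TAPS = lowpass_taps(num_taps=63, cutoff_hz=7_000, sample_rate_hz=IN_RATE)

# A 1 kHz tone sits inside the speech band and should pass almost unchanged;
# a 12 kHz tone lies above the new 8 kHz Nyquist limit and should be removed.
in_band = decimate(tone(1_000, IN_RATE, 4_800), FACTOR, TAPS)
out_of_band = decimate(tone(12_000, IN_RATE, 4_800), FACTOR, TAPS)

# Skip the first samples, where the filter is still filling up.
print(rms(in_band[32:]), rms(out_of_band[32:]))
```

A real converter would use a much sharper filter so that content just below 8 kHz survives without aliasing; the point here is only the order of operations, filter first and then drop samples.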
DSPguru
2002-Mar-30 08:58 UTC
[vorbis-dev] Re: Speex: Open-source, patent-free speech coding
Hi Jean-Marc,

> The only potential problem is the one where two people are talking at
> the same time. In this case, the solution could be to just boost the
> bit-rate for a couple frames.

Are you sure this will give us good results for both voiced and unvoiced sounds? I'm asking because this could raise an alternative solution. Assume we have two speech signals, s1 and s2. If Speex can encode s1+s2, it should be able to encode s1-s2 as well, right? In this case, our "common" track could be the original English (s1) soundtrack encoded at ~100 kbps Vorbis, and the German track (s2) would be encoded as Speex of s1-s2. That way we have better compression gain, and better quality for the default soundtrack.

Beware, developers: we would need to define some comment strategy for Ogg to be able to hold info about languages, so that in the future each user could choose their preferred language, and the player would be able to supply it by default (if available).

> > the first step is :
> > - decide how we extract the 'common' track
>
> I'll leave that one to you. I have no idea about the properties of the
> different tracks.

OK. Maybe anyone else on the list has ideas?

Best Regards,
Dg.
http://DSPguru.doom9.net
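The s1-s2 idea above can be tried on toy signals. In the sketch below, two pure tones stand in for voiced sounds in the two languages; that is a deliberate simplification, and real speech will behave less cleanly. It shows two things: the decoder can recover s2 from s1 and the stored difference, but when the two signals are uncorrelated the difference carries the energy of both voices, so the codec would have to code two talkers at once rather than one.

```python
import math


def tone(freq_hz, sample_rate_hz, n, amplitude=1.0):
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


def energy(xs):
    return sum(x * x for x in xs)


RATE = 16_000
N = 16_000  # one second, so both tones complete a whole number of cycles

s1 = tone(200, RATE, N)        # toy stand-in for a voiced sound in language 1
s2 = tone(310, RATE, N, 0.8)   # toy stand-in for a voiced sound in language 2

diff = [a - b for a, b in zip(s1, s2)]          # what the encoder would store
rebuilt_s2 = [a - d for a, d in zip(s1, diff)]  # what the decoder would compute

# Lossless reconstruction works, but the difference is not a "small" signal:
# its energy is the sum of the energies of s1 and s2.
print(energy(s1), energy(s2), energy(diff))
```

In a real chain both s1 and the difference would be lossy-coded, so the coding errors of both would also end up in the rebuilt s2.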
DSPguru
2002-Mar-30 15:55 UTC
[vorbis-dev] Re: Speex: Open-source, patent-free speech coding
> I believe unvoiced sounds won't be a problem, but voiced sounds could
> (especially two simultaneous vowels at different pitch and very
> different LPC), but increasing bit-rate could still work.

That's exactly what I was afraid of: two voiced sounds with different pitches :( We need to test Speex's behavior under this.

> I don't think this would work. Encoding one track with occasional
> double-talk would make sense (even if we need to triple bit-rate for the
> few double-talk instances), but continuous double-talk would cause too
> much problems.

Okay, back to the original plan. By the way, when you say "triple bit-rate", how much would that be?

> Just a thought: ICA (Independent Component Analysis) might be able to do
> it. Not sure whether it's good enough though. It has to do a perfect job
> if you don't want to end up with the problem I described above. So I'm
> not sure whether the project is feasible at all without access to the
> "original" common track.

There's a lot we can do. We should take advantage of the following facts:
- we have several soundtracks that mostly differ in the speech, or in other words (generally speaking): the spectral view mostly differs in 3 to 6 formants.
- BEFORE we downmix the 6ch (5.1) soundtrack to 2ch, we can focus our analysis on the _center_ channel of each soundtrack, which is by nature speech content.

On top of all that, we should try running ICA, karaoke, echo cancelling, etc.

Best Regards,
Dg.
http://DSPguru.doom9.net
Kevin Marks
2002-Apr-01 13:17 UTC
[vorbis-dev] Re: Speex: Open-source, patent-free speech coding
On Saturday, March 30, 2002, at 08:58 AM, DSPguru wrote:

>>> the first step is :
>>> - decide how we extract the 'common' track
>>
>> I'll leave that one to you. I have no idea about the properties of the
>> different tracks.
>
> ok. maybe anyone else in the list have ideas ?

When a TV program is edited, the Music & Effects is a separate track from the speech precisely so that different languages can be added afterwards. They are only mixed at the mastering stage. You should start from the unmixed tracks and compress them separately.