You're asking the wrong question. The question is not "why does it sound bad with Speex?", but "why does it sound good with LPC10 and MELP?". And the answer is that both are vocoders. Try dropping frames/subframes with anything else (Vorbis, MP3, G.729, u-law, ...) and it'll sound terrible as well. The only reason it sounds good with vocoders is that the codec parameters are in fact synthesizer parameters that don't have a direct connection with the signal.

Jean-Marc

Bill Cox <waywardgeek at gmail.com> wrote:
> I was able to easily hack in an option to play back at different
> speeds. For example, using "speexdec --speed 2.0 file.enc file.wav"
> plays back encoded file.enc at 2X speed. What I did was divide
> st->frameSize and st->subFrameSize by the speedup, and added a
> SPEEX_SET_SPEED decoder control for the nb_celp decoder. This
> produced speech that was 2X faster than the original.
>
> However, the quality is very poor. This is where it gets harder for
> me, as the quality is impacted by so many parts of the code. Can
> anyone guess which part of the decoder is leading to such poor quality
> when I cut the frame size in half? This hack works very well in
> LPC10, and fairly well in MELPe.
>
> I've attached two outputs from speex: the decoded playback at normal
> speed, and the 2X speed version.
>
> Thanks,
> Bill
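To make the point about waveform-style coding concrete, here is a rough sketch that uses raw PCM as a stand-in for a codec whose output tracks the signal itself. It is not Speex code. The 8 kHz rate, 160-sample frame, and 130 Hz tone are illustrative assumptions, and both helper functions are hypothetical. Playing only the first half of every frame halves the duration, but each join becomes a jump in the waveform.

```python
import math

def max_jump(samples):
    """Largest absolute difference between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

def keep_first_half_of_each_frame(samples, frame_len):
    """Crude 2x speed-up: play only the first half of every full frame."""
    half = frame_len // 2
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        out.extend(samples[start:start + half])
    return out

SAMPLE_RATE = 8000  # Hz, assumed narrowband rate
FRAME_LEN = 160     # samples per frame (20 ms), illustrative
TONE_HZ = 130       # illustrative stand-in for a voiced pitch

tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
        for n in range(50 * FRAME_LEN)]
chopped = keep_first_half_of_each_frame(tone, FRAME_LEN)

# Duration is halved, but the largest sample-to-sample step grows sharply
# because the waveform is cut at arbitrary phases.
print(len(tone), len(chopped))
print(max_jump(tone), max_jump(chopped))
```

The jumps at the frame joins are heard as clicks and roughness. A vocoder avoids them because its decoder never cuts a waveform; it only runs the synthesizer for fewer samples.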
Stuart O Anderson
2010-Oct-19 21:54 UTC
[Speex-dev] Increasing the speed of speex playback
Along those lines: what you want is to find a high-quality time-scale modification algorithm. If you're batch-processing samples offline, I believe Audacity can do this based on an STFT that includes phase information. For fast online shifting, PICOLA, as provided by the SpanDSP library, might work well for you, although I haven't tried it at an 8x speedup.

Stuart
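For readers unfamiliar with time-scale modification, the sketch below shows the simplest member of the family: plain overlap-add (OLA) on the decoded PCM. It is not PICOLA, not SpanDSP's implementation, and not what Audacity does. The function name `ola_speedup` and the 256-sample frame are assumptions made for illustration. Frames are read from the input at a larger hop than they are written to the output, which shortens the signal without resampling it.

```python
import math

def ola_speedup(samples, speed, frame_len=256):
    """Plain overlap-add time-scale modification (no pitch alignment).

    Windows are read every hop_in samples and written every hop_out
    samples, so the output is roughly len(samples) / speed long.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    hop_out = frame_len // 2                      # 50% overlap on output
    hop_in = max(1, int(round(hop_out * speed)))  # larger hop on input
    # Periodic Hann window: copies shifted by frame_len/2 sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(0, (len(samples) - frame_len) // hop_in + 1)
    if n_frames == 0:
        return []
    out = [0.0] * (hop_out * (n_frames - 1) + frame_len)
    for i in range(n_frames):
        src = i * hop_in
        dst = i * hop_out
        for n in range(frame_len):
            out[dst + n] += window[n] * samples[src + n]
    return out

# Usage: one second of a constant signal at 8 kHz, played at 2x.
fast = ola_speedup([1.0] * 8000, speed=2.0)
print(len(fast))
```

Plain OLA ignores pitch, so voiced speech picks up phase cancellation where overlapping frames do not line up. Pitch-synchronous methods in the PICOLA and WSOLA family choose the overlap position to match pitch periods, which is why they are the better fit for speech.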
Hi, Jean-Marc, and thanks for the quick reply. Let me just say I'm a huge fan of Speex and the work you've done. I barely understand what I'm reading so far in the source code and documentation, just enough to appreciate how cool the algorithms are.

LPC10 and MELP allow me to speed up speech with a simple hack on the decoder frame size. Playing fewer samples per frame speeds up the speech without affecting the excitation. It works well, but not as well as I would like. I've attached a sample of a female voice sped up with MELPe.

I fully understand basic LPC10. Simply reducing the frame size in the decoder is exactly the right way to speed up LPC10 speech without changing the pitch. I would like to figure out how to apply some of the innovations in CELP to sped-up speech. Frankly, this is the limit of my current knowledge, and I am clueless as to how to apply CELP concepts to high-speed playback.

Bill

-------------- next part --------------
A non-text attachment was scrubbed...
Name: f2x.ogg
Type: audio/ogg
Size: 34216 bytes
Desc: not available
URL: http://lists.xiph.org/pipermail/speex-dev/attachments/20101019/06ebbb7c/attachment-0001.bin
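The frame-size hack described in this message can be sketched with a toy pulse-excited all-pole synthesizer. This is not LPC10 or MELPe code. The function `synthesize`, the frame tuple layout, and all the numbers are assumptions chosen for illustration. Each frame carries only synthesizer parameters (pitch period, gain, filter coefficients), so the decoder is free to run them for any number of samples.

```python
def synthesize(frames, samples_per_frame):
    """Toy pulse-excited all-pole synthesizer.

    frames: list of (pitch_period, gain, lpc) with lpc = [a1, ..., ap];
    the filter is y[n] = e[n] - a1*y[n-1] - ... - ap*y[n-p].
    Assumes every frame uses the same filter order.
    """
    out = []
    order = len(frames[0][2]) if frames else 0
    history = [0.0] * order   # y[n-1], y[n-2], ...
    until_pulse = 0           # samples left before the next pitch pulse
    for pitch_period, gain, lpc in frames:
        for _ in range(samples_per_frame):
            if until_pulse <= 0:
                excitation = gain          # emit one pitch pulse
                until_pulse = pitch_period
            else:
                excitation = 0.0
            until_pulse -= 1
            y = excitation - sum(a * h for a, h in zip(lpc, history))
            if order:
                history = [y] + history[:-1]
            out.append(y)
    return out

# Ten identical frames: 40-sample pitch period, unit gain, one stable pole.
frames = [(40, 1.0, [-0.9])] * 10
normal = synthesize(frames, 180)  # illustrative full frame length
fast = synthesize(frames, 90)     # half the samples per frame: 2x speed
print(len(normal), len(fast))
```

The pulse counter and filter memory carry across frame boundaries, so the pitch pulses stay 40 samples apart at either speed; only the rate at which the parameters advance changes. A CELP decoder has no equivalent shortcut, because its excitation is coded to match the waveform sample by sample over a fixed subframe length.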
What you are trying to achieve is just *not* applicable to Speex or any codec I know that has a bit-rate above 4 kb/s. That's just not how it works. No amount of changes to Speex is going to help. You need to look elsewhere.

Jean-Marc