Here's one clue about whatever is causing the low-quality speech.
Speech sounds terrible at 1.01X speed, and it sounds excellent at
normal speed (1.0X).  So, the main problem is something that breaks
with any change in frame size in the decoder.  Any idea what that
might be?

Thanks,
Bill

On Tue, Oct 19, 2010 at 5:14 PM, Bill Cox <waywardgeek at gmail.com> wrote:
> I was able to easily hack in an option to play back at different
> speeds.  For example, using "speexdec --speed 2.0 file.enc file.wav"
> plays back encoded file.enc at 2X speed.  What I did was divide
> st->frameSize and st->subFrameSize by the speedup, and added a
> SPEEX_SET_SPEED decoder control for the nb_celp decoder.  This
> produced speech that was 2X faster than the original.
>
> However, the quality is very poor.  This is where it gets harder for
> me, as the quality is impacted by so many parts of the code.  Can
> anyone guess which part of the decoder is leading to such poor quality
> when I cut the frame size in half?  This hack works very well in
> LPC10, and fairly well in MELPe.
>
> I've attached two outputs from speex: the decoded playback at normal
> speed, and the 2X speed version.
>
> Thanks,
> Bill
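To make the arithmetic of the hack concrete, here is a small C sketch. It assumes Speex narrowband's layout of 160-sample frames split into four 40-sample subframes, and it assumes the division by the speed-up truncates to an integer; `frame_layout`, `scale_layout` and `layout_tiles` are hypothetical names, not part of the Speex API. It only checks whether the scaled subframes still add up to the scaled frame; it is not a diagnosis of the quality problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the sizes the hack rescales (not Speex's own struct). */
typedef struct {
    int frame_size;     /* samples per frame */
    int subframe_size;  /* samples per subframe */
    int nb_subframes;   /* subframes per frame (left unchanged by the hack) */
} frame_layout;

/* Divide both sizes by the speed-up. Converting the double quotient to int
   truncates toward zero, which is what storing it in an int field does. */
static frame_layout scale_layout(frame_layout l, double speed)
{
    assert(speed > 0.0);
    frame_layout s = l;
    s.frame_size = (int)(l.frame_size / speed);
    s.subframe_size = (int)(l.subframe_size / speed);
    return s;
}

/* True when the subframes exactly cover the frame. */
static bool layout_tiles(frame_layout l)
{
    return l.frame_size == l.subframe_size * l.nb_subframes;
}
```

With {160, 40, 4}, a speed of 2.0 gives 80 and 20, which still tile. A speed of 1.01 gives 158 and 39, and 4 × 39 = 156, so under these assumptions two samples per frame are not covered by any subframe.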
Hi Bill,

Any attempt to alter speed by simply inserting or dropping audio produces
poor results. Even if you can get it to sound smooth, the resulting pitch
shift is horrible. You really need to use a transform that alters speed
smoothly while maintaining the original pitch of the voice. If you look
in my spandsp library, you will find a module that does exactly this,
using an algorithm called PICOLA. You can speed up or slow down a voice
in fine speed steps using this module, and the resulting voice is almost
the same quality as the original. There is a test program for it, which
should serve as an example of how to call the library to initialise and
use it.

Steve

On 10/20/2010 05:21 AM, Bill Cox wrote:
> So, the main problem is something that breaks
> with any change in frame size in the decoder.  Any idea what that
> might be?
_______________________________________________
Speex-dev mailing list
Speex-dev at xiph.org
http://lists.xiph.org/mailman/listinfo/speex-dev
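For readers who want to see the core PICOLA step before opening spandsp, here is a minimal C sketch of a PICOLA-style speed-up. It is not spandsp's time_scale code: the AMDF pitch search, the 20 to 160 sample pitch range (assuming 8 kHz audio), the linear crossfade, and skipping input for rates above 2 are all assumptions of this sketch, and `amdf_pitch` and `picola_speedup` are hypothetical names. The idea is to find the local pitch period p, overlap-add two consecutive periods into one so the duration shrinks without changing the pitch, then copy (or skip) enough input to hit the target rate.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Assumed pitch search range in samples (2.5 ms to 20 ms at 8 kHz). */
#define MIN_PITCH 20
#define MAX_PITCH 160

/* Normalised AMDF pitch estimate: return the lag in [min_lag, max_lag] for
   which x[0..lag) best matches x[lag..2*lag). Needs 2 * max_lag samples. */
static size_t amdf_pitch(const float *x, size_t min_lag, size_t max_lag)
{
    size_t best_lag = min_lag;
    float best = FLT_MAX;
    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        float sum = 0.0f;
        for (size_t i = 0; i < lag; i++) {
            float d = x[i] - x[i + lag];
            sum += d < 0.0f ? -d : d;
        }
        sum /= (float)lag;   /* per-sample average, so long lags are not penalised */
        if (sum < best) {    /* strict '<' keeps the shortest of equally good lags */
            best = sum;
            best_lag = lag;
        }
    }
    return best_lag;
}

/* PICOLA-style speed-up of in[0..n_in) by `rate` (> 1). `out` must hold
   n_in samples. Returns the number of output samples. A tail shorter than
   2 * MAX_PITCH samples is dropped to keep the sketch short. */
size_t picola_speedup(const float *in, size_t n_in, float *out, float rate)
{
    assert(rate > 1.0f);
    size_t ip = 0, op = 0;
    double pending = 0.0;   /* fractional samples carried between periods */

    while (ip + 2 * MAX_PITCH <= n_in) {
        size_t p = amdf_pitch(in + ip, MIN_PITCH, MAX_PITCH);

        /* Overlap-add two pitch periods into one: fade out the first,
           fade in the second. This removes p samples without changing pitch. */
        for (size_t i = 0; i < p; i++) {
            float w = (float)i / (float)p;
            out[op++] = (1.0f - w) * in[ip + i] + w * in[ip + p + i];
        }
        ip += 2 * p;

        /* Keep (input consumed) / (output produced) equal to rate.
           rate <= 2: copy c samples, so (2p + c) / (p + c) = rate.
           rate >  2: skip s samples, so (2p + s) / p = rate (a choice made
           for this sketch, not necessarily what spandsp does). */
        if (rate <= 2.0f)
            pending += (double)p * (2.0 - rate) / (rate - 1.0);
        else
            pending += (double)p * (rate - 2.0);
        size_t n = (size_t)pending;
        pending -= (double)n;
        if (n > n_in - ip)
            n = n_in - ip;
        if (rate <= 2.0f) {
            memcpy(out + op, in + ip, n * sizeof *in);
            op += n;
        }
        ip += n;
    }
    return op;
}
```

On a signal that repeats exactly every 50 samples, a rate of 2.0 turns 8000 input samples into 3850 output samples that still repeat every 50 samples: the duration roughly halves while the pitch stays put, which is the property naive sample dropping lacks.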
Hi, Steve.

I agree with what you've said.  I'm interested in large changes in
speech speed, beyond a 2X speed-up.  I personally listen to books at
around 3.2X speed, though some blind hackers consider that slow.

In my experience so far (which is limited), the fundamental problem with
speeding up speech beyond about 2X is that glottal pulse events determine
the pitch and cannot change rate without distorting the voice, while
other aspects of speech need to change in proportion to the speech speed.
This is why I've been looking into LPC-based algorithms, which extract the
glottal excitation from the voice signal and resynthesise it in the
decoder.  So far, I've had much better luck with LPC-based algorithms than
with short-time FFT-based algorithms for large speed-up factors.

That's all I have for now...  I'm looking forward to learning about the
PICOLA algorithm.  I'll read as much as I can find on it tomorrow.

Bill

On Tue, Oct 19, 2010 at 9:37 PM, Steve Underwood <steveu at coppice.org> wrote:
> If you look
> in my spandsp library you will find a module which does exactly this,
> using an algorithm called PICOLA.
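The LPC idea Bill describes can be sketched in a few lines: LPC analysis fits an all-pole filter to a frame, and running the frame through the inverse filter leaves a residual that approximates the excitation. The C sketch below uses the autocorrelation method with the Levinson-Durbin recursion; `autocorr`, `lpc_levinson` and `lpc_residual` are hypothetical names, and real codecs such as LPC10, MELPe and Speex add windowing, quantisation and much more on top of this.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 32

/* Autocorrelation r[0..order] of x[0..n). */
static void autocorr(const double *x, size_t n, double *r, int order)
{
    for (int k = 0; k <= order; k++) {
        r[k] = 0.0;
        for (size_t i = (size_t)k; i < n; i++)
            r[k] += x[i] * x[i - k];
    }
}

/* Levinson-Durbin: find predictor coefficients a[1..order] so that x[n] is
   approximated by sum_k a[k] * x[n-k]. a[0] is unused. Returns the final
   prediction error energy. */
static double lpc_levinson(const double *r, double *a, int order)
{
    assert(order >= 1 && order <= MAX_ORDER && r[0] > 0.0);
    double tmp[MAX_ORDER + 1];
    double err = r[0];
    for (int i = 0; i <= order; i++)
        a[i] = 0.0;
    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;             /* reflection coefficient */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;
    }
    return err;
}

/* Inverse (analysis) filter: e[n] = x[n] - sum_k a[k] * x[n-k], treating
   samples before x[0] as zero. The residual e approximates the excitation. */
static void lpc_residual(const double *x, size_t n, const double *a,
                         int order, double *e)
{
    for (size_t i = 0; i < n; i++) {
        double pred = 0.0;
        for (int k = 1; k <= order && (size_t)k <= i; k++)
            pred += a[k] * x[i - k];
        e[i] = x[i] - pred;
    }
}
```

For the decaying exponential x[n] = 0.9^n, an order-2 fit gives a[1] ≈ 0.9 and a[2] ≈ 0, and the residual collapses to a single impulse at n = 0: the "excitation" that drove the one-pole filter. An LPC-based speed-up can then, in principle, handle the excitation and the filter separately, which is the separation Bill is after.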
Hi, Steve.

I tried your time_scale_tests program, and it works well!  Especially
for small speed changes, it's the best I've heard so far.  For large
speed increases, there is what sounds like static added to the output.

I've attached two sound samples of high-speed speech: a 4X speed-up of a
popular TTS voice in the blind community (voxin/Eloquence).  In one case
I sped up the voice with LPC, and in the other with time_scale_tests.
Don't worry if you can't understand these samples; many blind people
can, and I can just barely understand speech at this speed.

I guess now I need to learn about the algorithm you've used and see if I
can track down the source of the static.  I've CC'd two lists with blind
users who may be interested in very high-speed playback of voices.

Bill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1_4x_lpc.ogg
Type: audio/ogg
Size: 25076 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20101020/9382097d/attachment-0002.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1_4x_time_scale.ogg
Type: audio/ogg
Size: 25396 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20101020/9382097d/attachment-0003.bin