Hi. I'm Bill Cox, and I volunteer a bit for the Vinux project, which is Linux for people with vision impairments. Most blind users use a closed-source speech synthesis tool called voxin, as it's very easy to understand at high speed. I would like to make TTS synthesizers based on large recorded vocabularies of actual speech, but to make them useful for the blind, I need to be able to speed up the speech while maintaining excellent quality. To date, I've been playing with low-bit-rate LPC coding, and it works, which is very cool. However, the quality of the voices I speed up is too low. The blind will hate me if I try to switch them over to these low-quality voices. I've tried both basic LPC-10 and MELPe. Next, I want to try modifying Speex to see if it can generate higher-quality voice at high speed. Do you think this will be a difficult or an easy project? Do you think Speex can be modified to generate very high-quality voice at high speed? By high speed, I mean voice starting at about 2.5X speedup, all the way up to around 8X. I haven't looked at any of the code yet, so any tips would be greatly appreciated. Thanks, Bill
I was able to easily hack in an option to play back at different speeds. For example, "speexdec --speed 2.0 file.enc file.wav" plays back the encoded file.enc at 2X speed. What I did was divide st->frameSize and st->subFrameSize by the speedup, and add a SPEEX_SET_SPEED decoder control for the nb_celp decoder. This produced speech that was 2X faster than the original. However, the quality is very poor. This is where it gets harder for me, as the quality is affected by so many parts of the code. Can anyone guess which part of the decoder is causing such poor quality when I cut the frame size in half? This hack works very well in LPC-10, and fairly well in MELPe. I've attached two outputs from Speex: the decoded playback at normal speed, and the 2X speed version. Thanks, Bill
Attachment: 1x.ogg (audio/ogg, 48563 bytes) http://lists.xiph.org/pipermail/speex-dev/attachments/20101019/0cfe94e3/attachment-0002.bin
Attachment: 2x.ogg (audio/ogg, 25498 bytes) http://lists.xiph.org/pipermail/speex-dev/attachments/20101019/0cfe94e3/attachment-0003.bin
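To make the arithmetic of this hack concrete, here is a minimal sketch of how dividing the frame and subframe sizes changes the output duration. It assumes the standard Speex narrowband values (8 kHz sampling, 160-sample frames, 40-sample subframes); the function names and the truncating integer division are illustrative assumptions, since the message does not say how the patch rounds.

```python
SAMPLE_RATE_HZ = 8000    # Speex narrowband sampling rate
NB_FRAME_SIZE = 160      # samples per narrowband frame (20 ms)
NB_SUBFRAME_SIZE = 40    # samples per narrowband subframe (5 ms)


def scaled_sizes(speed):
    """Return (frame_size, subframe_size) after dividing by the speedup.

    Truncation toward zero is an assumption made for this sketch.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return int(NB_FRAME_SIZE / speed), int(NB_SUBFRAME_SIZE / speed)


def playback_seconds(num_frames, speed):
    """Duration of the decoded audio when every frame emits fewer samples."""
    frame_size, _ = scaled_sizes(speed)
    return num_frames * frame_size / SAMPLE_RATE_HZ


if __name__ == "__main__":
    for speed in (1.0, 1.01, 2.0):
        print(speed, scaled_sizes(speed), playback_seconds(100, speed))
```

The same number of encoded frames is consumed in every case; only the number of samples written per frame changes, which is what shortens the playback.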
Here's one clue about whatever is causing the low-quality speech: it sounds excellent at normal speed (1.0X), but terrible at just 1.01X. So the main problem is something that breaks with any change in frame size in the decoder. Any idea what that might be? Thanks, Bill (In reply to my message of Tue, Oct 19, 2010 at 5:14 PM.)
You're asking the wrong question. The question is not "why does it sound bad with Speex?", but "why does it sound good with LPC10 and MELP?". And the answer is that both are vocoders. Try dropping frames/subframes with anything else (Vorbis, MP3, G.729, u-law, ...) and it'll sound terrible as well. The only reason it sounds good with vocoders is that the codec parameters are in fact synthesizer parameters that don't have a direct connection with the signal. Jean-Marc
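The point about synthesizer parameters can be illustrated with a toy pitch-pulse vocoder. This is only a sketch under stated assumptions, not the LPC-10 or MELPe algorithm: each frame carries a hypothetical (pitch period, gain, filter pole) triple, and the synthesizer can render any number of samples from it. Rendering fewer samples per frame shortens the speech while the pulse spacing, and therefore the pitch, stays the same.

```python
def synthesize(frames, samples_per_frame):
    """Render a toy vocoder signal.

    frames: list of (pitch_period_samples, gain, pole) tuples. These are
    synthesizer settings, not samples of the original waveform.
    samples_per_frame: how many output samples to render from each frame.
    """
    out = []
    countdown = 0   # samples remaining until the next excitation pulse
    state = 0.0     # memory of the one-pole synthesis filter
    for pitch_period, gain, pole in frames:
        for _ in range(samples_per_frame):
            if countdown <= 0:
                excitation = gain          # emit one pulse per pitch period
                countdown = pitch_period
            else:
                excitation = 0.0
            countdown -= 1
            state = excitation + pole * state
            out.append(state)
    return out


def pulse_positions(samples, gain):
    """Indices where an excitation pulse landed (valid for this toy filter)."""
    return [i for i, s in enumerate(samples) if s >= gain]


if __name__ == "__main__":
    frames = [(50, 1.0, 0.5)] * 4           # four identical voiced frames
    normal = synthesize(frames, 160)        # full-length frames
    fast = synthesize(frames, 80)           # half-length frames, 2X speed
    print(len(normal), len(fast))
    print(pulse_positions(fast, 1.0))
```

A waveform-matching codec has no equivalent shortcut, because its decoded parameters describe a specific stretch of signal of a fixed length rather than settings for a free-running synthesizer.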