I have been thinking about what is needed to make language teaching/learning tools (like talking flash cards). The main thing needed is a low bit-rate encoding of the human voice.

At first I thought I could take one of the government standard vocoders and embed it as an Ogg stream. But:

(1) there is not one standard vocoder; there are half a dozen, at least.
(2) they are fixed bit rate, and we really don't want to waste bits while the teacher waits for the student to respond.
(3) they include error correction (not needed here because we assume the underlying storage and transport mechanism takes care of that).
(4) one might need several different quality/bit-rate options. For example, both hours of Spanish radio broadcast packed into as small a file as possible, and a high audio quality demonstration of the difference in pronunciation of a "D" in English, Spanish, Chinese, and German, using as much space as needed to make the difference sound clear.

My current thought is to filter the input down to a bandwidth of 7 kHz or 4 kHz (traditional values for high and low quality speech), decimate the samples so that the sound is sampled at, say, 44/3=14.6 kHz or 44/5=8.8 kHz, then run it through the standard Vorbis encoder. Vorbis then sees an ordinary 20 kHz bandwidth stream that sounds like a tape recording running at 3 to 5 times normal speed and encodes it as usual.

I checked the mailing list archives, and found an old thread about low bit-rate encoding that quickly degenerated into a highly bogus discussion of the proper way to decimate the sample sequence. We don't need to do that again, so assume the filtering and decimation are done properly: is there any reason this scheme could not work? Are there hooks in the Vorbis stream format to tell the decoder that this has been done, so that it will know to play back slower than normal? Do I have to write all this myself, or is it already in there if I just know the parameter to set?
--
Keith Wright <kwright@free-comp-shop.com>
Programmer in Chief, Free Computer Shop <http://www.free-comp-shop.com>
--- Food, Shelter, Source code. ---

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
On Mon, 12 Feb 2001, Keith Wright wrote:

> (1) there is not one standard vocoder; there are half a dozen, at least.

GSM is pretty standard for government use. I think LPC gets some limited use too. http://kbs.cs.tu-berlin.de/~jutta/toast.html

> (2) they are fixed bit rate, and we really don't want to waste bits
> while the teacher waits for the student to respond.

So do silence detection.

> (3) they include error correction (not needed here because
> we assume the underlying storage and transport mechanism
> takes care of that).

GSM does not.

> (4) one might need several different quality/bit-rate options.
> For example, both hours of Spanish radio broadcast packed
> into as small a file as possible, and a high audio quality
> demonstration of the difference in pronunciation of a "D"
> in English, Spanish, Chinese, and German, using as much space
> as needed to make the difference sound clear.

You can overclock GSM if you want.

-Dan
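The "do silence detection" suggestion above can be illustrated with a minimal sketch. It is not part of GSM or of any codec mentioned in this thread; it is a plain energy threshold, and the function names, the 20 ms frame length, and the threshold value are all assumptions chosen for illustration. Frames flagged as silent could be dropped or coded with very few bits while the teacher waits for the student.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples (floats in -1.0..1.0)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def silent_frames(samples, frame_len=160, threshold=0.01):
    """Return one boolean per complete frame: True where the RMS level is
    below the threshold.

    frame_len=160 corresponds to 20 ms at 8 kHz. Both defaults are
    illustrative assumptions, not values taken from any standard.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) < threshold)
    return flags


# Example: 0.1 s of a 440 Hz tone followed by 0.1 s of digital silence at 8 kHz.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
pause = [0.0] * 800
flags = silent_frames(tone + pause)
print(flags)
```

A real detector would probably want hysteresis and a noise-floor estimate so that quiet consonants and room noise are handled sensibly; this sketch only shows the basic idea.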
On Mon, Feb 12, 2001 at 07:50:25PM -0500, Keith Wright wrote:

[snip]

> My current thought is to filter the input down to a bandwidth
> of 7 kHz or 4 kHz (traditional values for high and low quality
> speech), decimate the samples so that the sound is sampled
> at, say, 44/3=14.6 kHz or 44/5=8.8 kHz, then run it through
> the standard Vorbis encoder. Vorbis then sees an ordinary
> 20 kHz bandwidth stream that sounds like a tape recording
> running at 3 to 5 times normal speed and encodes it as
> usual.

[snip]

> the sample sequence. We don't need to do that again, so
> assume the filtering and decimation are done properly:
> is there any reason this scheme could not work? Are
> there hooks in the Vorbis stream format to tell the decoder
> that this has been done, so that it will know to play back
> slower than normal? Do I have to write all this myself,
> or is it already in there if I just know the parameter to set?

Vorbis can happily take in an 8.8 kHz (or just about any other) sampling rate file and act accordingly. If you lie to Vorbis (make it think it's chipmunks) you will get HORRIBLE results, because the psychoacoustic masking is highly frequency dependent and Vorbis will get the masking all wrong.
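To make the "chipmunks" point concrete, here is a small arithmetic sketch of what the encoder would believe if a decimated stream were labeled as 44.1 kHz. The function name is hypothetical, and the 300 Hz to 3.4 kHz speech band used in the example is an assumed telephone-style range, not a figure from this thread.

```python
def apparent_frequency(true_hz, true_rate, labeled_rate):
    """Frequency a component appears to have when samples taken at
    true_rate are interpreted as if they were taken at labeled_rate."""
    return true_hz * labeled_rate / true_rate


# Decimated by 5 (44100 / 5 = 8820 Hz) but presented as a 44.1 kHz stream:
print(apparent_frequency(4000, 8820, 44100))   # 20000.0
# Decimated by 3 (44100 / 3 = 14700 Hz) but presented as a 44.1 kHz stream:
print(apparent_frequency(7000, 14700, 44100))  # 21000.0

# An assumed 300 Hz .. 3.4 kHz speech band would be treated as 1.5 .. 17 kHz:
print(apparent_frequency(300, 8820, 44100))    # 1500.0
print(apparent_frequency(3400, 8820, 44100))   # 17000.0
```

Every component is shifted up by the decimation factor, so a masking model tuned per frequency band would be applied to the wrong bands throughout.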
> My current thought is to filter the input down to a bandwidth
> of 7 kHz or 4 kHz (traditional values for high and low quality
> speech), decimate the samples so that the sound is sampled
> at, say, 44/3=14.6 kHz or 44/5=8.8 kHz, then run it through
> the standard Vorbis encoder. Vorbis then sees an ordinary
> 20 kHz bandwidth stream that sounds like a tape recording
> running at 3 to 5 times normal speed and encodes it as
> usual.

No, Vorbis would see a normal stream of data, at a sampling rate you specify. Vorbis does NOT require that input be 44.1 kHz, though that is what it has been tuned for. Notably, pre-echo will be particularly bad with lower sampling rates.

> I checked the mailing list archives, and found an old thread
> about low bit-rate encoding that quickly degenerated into
> a highly bogus discussion of the proper way to decimate
> the sample sequence. We don't need to do that again, so
> assume the filtering and decimation are done properly:
> is there any reason this scheme could not work? Are
> there hooks in the Vorbis stream format to tell the decoder
> that this has been done, so that it will know to play back
> slower than normal? Do I have to write all this myself,
> or is it already in there if I just know the parameter to set?

There is no 'slower than normal'. The stream format specifies the sampling rate, and Vorbis will encode and play back at that rate (well, playback is really for the frontend to care about; on the decode side I don't think Vorbis itself needs to worry about the sampling rate at all).

Michael
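As a rough illustration of the approach the replies converge on (decimate, then hand the encoder a file that states its true sampling rate), here is a minimal sketch using only the Python standard library. It stops at writing a WAV file and does not call any Vorbis encoder. The helper names, the 101-tap Hamming-windowed sinc filter, and the 4 kHz cutoff are assumptions for illustration; a careful design would choose the filter to suit the target rate.

```python
import io
import math
import struct
import wave


def lowpass_taps(cutoff_hz, rate, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps, normalised to unity gain at DC."""
    fc = cutoff_hz / rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(samples, factor, taps):
    """Low-pass filter, keeping only every factor-th output sample.
    Samples beyond the ends of the input are treated as zero."""
    half = len(taps) // 2
    out = []
    for centre in range(0, len(samples), factor):
        acc = 0.0
        for k, tap in enumerate(taps):
            i = centre + k - half
            if 0 <= i < len(samples):
                acc += tap * samples[i]
        out.append(acc)
    return out


def write_wav(fileobj, samples, rate):
    """Write mono 16-bit PCM; the header records the true sampling rate."""
    pcm = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


# Example: 0.1 s of a 1 kHz tone at 44.1 kHz, reduced by 5 to 8820 Hz.
source_rate = 44100
factor = 5
target_rate = source_rate // factor
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / source_rate)
        for n in range(source_rate // 10)]
reduced = decimate(tone, factor, lowpass_taps(4000, source_rate))

buf = io.BytesIO()  # an in-memory stand-in for a file on disk
write_wav(buf, reduced, target_rate)
```

Because the file header carries 8820 Hz rather than 44100 Hz, any encoder or player that honours the header works with the real frequencies, and no "play back slower" hint is needed.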