Hi,

I'd like to contribute to Vorbis, and I think this may be of some interest for low-bitrate coding. I have been experimenting with low-bitrate coding of the high band (11 kHz to 22 kHz) and, though I haven't yet started quantizing my coefficients (a gain and an LPC filter), I expect to be able to approximate the whole 11-22 kHz band with around 1000 bits/s per channel (maybe even 500 bps). Now, I don't know what bit rate is normally allocated to this band, but I expect it is greater than that. Am I right? (Can anyone give me numbers for this?)

The technique I use is inspired by an article I published recently (http://panoramix.dyndns.org/jm/scw2000.pdf) and is based on the fact that, at these frequencies, the ear is totally insensitive to the spectral fine structure. The processing also has relatively low complexity (most of it is two LPC analyses in the encoder and one in the decoder).

I have tested it with some files (including harpsichord, which is supposed to be hard to code), and the difference from the original (CD rip) is hard to hear. You can find demo files at:
ftp://freespeech.sourceforge.net/pub/freespeech/

There are 6 files:

bach10-ref.sw        : Original file (right channel from Bach's Chromatic Fantasia)
bach10-ext.sw        : Resulting file from my experiment
bach10-lp.sw         : Low-passed at 11 kHz
bach10-lame-ref.sw   : Encoded with lame (128 kbps), but original in the low band
bach10-ogg-ref.sw    : Encoded with vorbis (160 kbps), but original in the low band
bach10-ogg128-ref.sw : Encoded with vorbis (128 kbps), but original in the low band

For the last 3 files, I put back the original low band (0-11 kHz) so that only the high-band differences are present. The files are PCM, 16 bits/sample, little endian, 44.1 kHz.

Does anyone think this could be useful? Is there any interesting audio file you'd like me to process?

Jean-Marc

--
Jean-Marc Valin
Universite de Sherbrooke - Genie Electrique
valj01@gel.usherb.ca

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.

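The message above describes the high band with "a gain and an LPC filter" and says most of the cost is LPC analysis. As a rough illustration of what one such analysis involves, here is a minimal sketch of the standard autocorrelation method with the Levinson-Durbin recursion. It is not the author's code: the function names, the filter order, and the test signal are assumptions chosen only for this example.

```python
def autocorrelation(x, order):
    """Return r[0..order], the (biased, unnormalized) autocorrelation of x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for a prediction-error filter.

    Returns (a, err) with a = [1, a1, ..., a_order] such that
    e[n] = x[n] + a1*x[n-1] + ... + a_order*x[n-order],
    and err, the energy of the prediction residual.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal is perfectly predictable (or silent); stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Hypothetical test signal: the impulse response of a one-pole filter
# x[n] = 0.9 * x[n-1], so an ideal predictor has a1 = -0.9 and a2 = 0.
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(x, 2), 2)
print(a, err)
```

In a scheme like the one described, a gain could plausibly be derived from the residual energy `err`, but how the author computes and codes the gain is not stated in the thread.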
> I'd like to contribute to Vorbis and I think this may be of some interest for
> low bitrate coding. I have been experimenting with low bit-rate coding for the
> high-band (11 kHz to 22 kHz) and, though I haven't yet started quantizing my
> coefficients (a gain and an LPC filter), I expect to be able to approximate the
> whole 11-22 kHz band with around 1000 bits/s per channel (maybe even 500 bps).
> Now, I don't know what is the normal bit-rate allocated for this band, but I
> expect it is greater than that. Am I right? (can anyone give me numbers for
> this?)

Depends. It varies from zero to a few kilobits depending on what the psychoacoustic model says.

> The technique I use to do this is inspired from an article I published recently
> (http://panoramix.dyndns.org/jm/scw2000.pdf) and is based on the fact that at
> these frequencies, the ear is totally insensitive to the spectral fine
> structure.

Correct; however, the ear is extremely sensitive to preecho and to the time localization of high-frequency energy. You don't hear the pitch in the high frequencies; you hear the fact that a sharp edge was smeared (which is what aggressive quantization in the high end will cause).

> I have tested it with some files (including harpsichord, which is supposed to be
> hard to code) and the difference with the original (CD rip) is hard to hear. You
> can find demo files of this at:
> ftp://freespeech.sourceforge.net/pub/freespeech/

Harpsichord (like voice) is well suited to this technique because of its regular harmonics. Try it on violin, cymbals, and nonmusical sources.

I hear a brief, glassy preecho... what block size were you using for your experiment? I'm guessing very short... The results might be more if not used in situations where ogg/lame would be using short blocks and used overlapped 2048-sample blocks like ogg.

Monty

>> Now, I don't know what is the normal bit-rate allocated for this band, but I
>> expect it is greater than that. Am I right? (can anyone give me numbers for
>> this?)
>
> Depends. It varies from zero to a few kilobits depending on what the
> psychoacoustics model says.

A few kilobits, meaning what? In my example, can you say how many bits Vorbis puts in the 11-22 kHz band?

>> The technique I use to do this is inspired from an article I published recently
>> (http://panoramix.dyndns.org/jm/scw2000.pdf) and is based on the fact that at
>> these frequencies, the ear is totally insensitive to the spectral fine
>> structure.
>
> Correct, however, the ear is extremely sensitive to preecho and
> time-localization of high frequency energy. You don't hear the pitch
> in the high frequencies, you hear the fact that a sharp edge was
> smeared (what aggressive quantization in the high end will cause).

The process I used is not subject to pre-echo. I extend the residue by simply upsampling the LP residue, which causes spectral folding (unlike in my article, where I use a non-linear function). The time localization is thus preserved. For voice, I have even obtained very good results when starting the extension at 3.5 kHz.

>> I have tested it with some files (including harpsichord, which is supposed to be
>> hard to code) and the difference with the original (CD rip) is hard to hear. You
>> can find demo files of this at:
>> ftp://freespeech.sourceforge.net/pub/freespeech/
>
> Harpsichord (like voice) is well suited to this technique because of
> regular harmonics. Try it on violin, cymbals, and nonmusical sources.

I have added a violin file in the same directory (ftp://freespeech.sourceforge.net/pub/freespeech/) with the "vi4-" prefix. I think it works a bit better than the harpsichord. I don't have files with cymbals, but if you have some, please send them to me. As I said earlier, the ear is totally insensitive to the spectral fine structure at these frequencies. It cannot even tell noise from harmonics. The only reason I didn't just put noise is that upsampling preserves the time localization within a frame.

> I hear a brief, glassy preecho ... what block size were you using for
> your experiment? I'm guessing very short.... The results might be
> more if not used in situations where ogg/lame would be using short
> blocks and used overlapped 2048-sample blocks like ogg.

I'm using 1024-sample frames, and my LPC filter is calculated on a 2048-sample window. Anyway, the whole point of this was very-low-bitrate modes, where you cannot afford many bits for the high band but could still afford 500 bps. I think I could go as low as that using vector quantization and prediction. Right now the system is not optimal; I still need to play with the window size and the LPC regularization parameters (noise floor, pre-emphasis, bandwidth expansion/lag windowing).

What I'd like to know is whether you think this could potentially be interesting.

Jean-Marc

P.S. Please also reply directly to me, as my subscription to vorbis-dev doesn't seem to work.

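The "spectral folding" mentioned in the message above is what happens when a signal is upsampled by inserting zeros without any interpolation filter: the low-band spectrum reappears mirrored in the new upper band, while every original sample keeps its position in time. The following is a minimal sketch of that effect alone, not the author's implementation; the frame length `N`, the test-tone bin `K`, and the function names are assumptions made for the illustration, and the shaping of the folded residue by the high-band LPC filter and gain is left out.

```python
import cmath
import math


def upsample_zero_insert(x):
    """Double the sampling rate by inserting a zero after every sample (no filter)."""
    y = []
    for s in x:
        y.extend((s, 0.0))
    return y


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..len(x)//2, using a naive O(N^2) transform."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


N = 64  # hypothetical frame length at the low sampling rate
K = 5   # hypothetical DFT bin of a test tone in the low band

low = [math.cos(2 * math.pi * K * t / N) for t in range(N)]
wide = upsample_zero_insert(low)

# After upsampling, the DFT has 2N points; bin N is the new Nyquist frequency
# and bin N/2 is the old one. The tone shows up at bin K and, mirrored around
# the old Nyquist frequency, at bin N - K.
mags = dft_magnitudes(wide)
peaks = sorted(k for k, m in enumerate(mags) if m > 1.0)
print(peaks)  # → [5, 59]
```

Because `wide[2 * n] == low[n]` for every `n`, a transient in the low-rate residue stays at the same instant after folding, which is the time-localization property the message relies on.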
OK, I have just finished quantizing my coefficients, and the result is better than I had expected: 345 bps for the whole 11-22 kHz band. The audio files are still at ftp://freespeech.sourceforge.net/pub/freespeech/

bach10-diffquant.sw  : The high band is quantized with 8 bits/frame, thus 345 bps
bach10-diffquant2.sw : I used shorter frames to get better results, 690 bps

To get these bitrates, I used differential vector quantization in the cepstral domain. There are two things left to try:

1) Use intra-frame LPC interpolation (in the LSF domain)
2) Predict the high-band envelope from the LSP masking curve, and further reduce the bit rate

By the way, what I'm proposing here is probably not something that would go in the 64 kbits/channel modes, but in very low bitrate modes, when there are (almost) no bits left for the high band.

Jean-Marc

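The 345 bps figure is consistent with 8 bits per frame at the 1024-sample frame size and 44.1 kHz sampling rate mentioned earlier in the thread. A quick check of that arithmetic follows. The 512-sample frame used for the second figure is an assumption: the message says only "shorter frames", and half-length frames at the same 8 bits/frame are simply the choice that lands near the quoted 690 bps. The function name is also made up for this example.

```python
def side_info_bitrate(bits_per_frame, frame_size, sample_rate=44100):
    """Bits per second when each frame of `frame_size` samples costs `bits_per_frame` bits."""
    return bits_per_frame * sample_rate / frame_size


# 1024-sample frames, as stated earlier in the thread.
print(round(side_info_bitrate(8, 1024)))  # → 345

# 512-sample frames (assumed; the message only says "shorter frames").
print(round(side_info_bitrate(8, 512)))   # → 689
```

The exact values are 344.53 and 689.06 bits/s, so the quoted 690 bps looks like twice the rounded 345 rather than an independently rounded number.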
Just curious: our project was hoping to use 56k coding for a mono sound, and maybe even 36k or 28k if they sound good enough. Right now, the smallest rate for a mono source is 64k.

Thanks,
Spanky