Hi. I'm an undergraduate linguistics major and computer science minor at the University of Colorado in Boulder, and I'm taking a couple of classes this semester that give me the opportunity to do a research project: one on introductory acoustics in the physics department, and one in the linguistics department on phonetics and phonology. I've got an idea, but I'd like to hear from anyone here who could help me refine my project so that it's useful to you folks in some way. I should mention that I don't have the math background to really understand the Fourier transform (much less more complicated beasts); the highest math class I've taken so far is Calculus 1, so this is a major caveat.

My idea so far is to record several speakers producing minimal pairs (such as 'zip' and 'sip') and to compress the sound under different compression schemes (Ogg, MP3, GSM, etc) at varying levels of compression, then play them back on decent audio equipment for listening tests to see if listeners can still distinguish important parts of the sounds in the recordings. In particular, I'm interested in looking at the nature of degradation when the compression ratio is particularly high: what phonemes become more difficult to distinguish soonest, as the compression ratio goes up? And then, if possible, I'd like to come up with an analysis of *why* those particular sounds are poorly recreated, as opposed to others. My guess is that fricative sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain larger amounts of white noise, which is often poorly handled by compression.

A couple other people in my classes are interested in working with me on the project - one has more math background, one is willing to administer perception tests on a group of people. I have the linguistics background. We have until the end of April to do the project, and the exact idea should be decided on by the end of this week.
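One way to put a number on the "white noise" part of the fricative hypothesis is spectral flatness: the geometric mean of a frame's power spectrum divided by its arithmetic mean. It reaches 1 only for a perfectly flat spectrum and falls toward 0 when a few peaks dominate, as with a steady vowel's harmonics; a single short frame of white noise typically lands around 0.5, because individual DFT bins fluctuate randomly. This is a hypothetical illustration, not part of the proposal as written. It uses a naive pure-Python DFT (so no Fourier-transform library is needed, but it is slow for long frames), and synthetic signals stand in for recorded speech. Real fricatives such as /s/ are shaped noise rather than white noise, so measured values would likely be lower than the white-noise figure.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 1..N/2-1 (DC and Nyquist skipped).

    O(N^2), so only suitable for short frames (a few hundred samples).
    """
    n = len(frame)
    if n < 4:
        raise ValueError("frame must have at least 4 samples")
    powers = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        powers.append(abs(acc) ** 2)
    return powers


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum, in (0, 1]."""
    powers = [p + eps for p in power_spectrum(frame)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


if __name__ == "__main__":
    random.seed(0)
    n = 256
    # Synthetic stand-ins (assumption): Gaussian noise for a fricative-like
    # frame, a pure sine for a strongly periodic, vowel-like frame.
    noise = [random.gauss(0.0, 1.0) for _ in range(n)]
    tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
    print("noise flatness:", round(spectral_flatness(noise), 3))
    print("tone flatness: ", round(spectral_flatness(tone), 6))
```

Applied frame by frame to the original and compressed recordings of a minimal pair, a measure like this could show whether the noisy segments are the ones whose spectra change most under each codec.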
I'd like to know if there's something I could do that would be more helpful than academic. So... got any ideas for a project within this scope that would directly benefit Ogg Vorbis development? Changing topics is possible, though it would be preferable for it to have a strong linguistic element so I can use the project for both classes.

--
Chris Riddoch | epistemological
socket@peakpeak.com | humility

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different

Those are not minimal combinations; minimal combinations are "two letter" sounds, like "zi" and "ip".

> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings. In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up? And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others. My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.

In my experience, Vorbis is worst with plosive sounds. But different codecs have their own problems, and it varies with the listener, too.

> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people. I have the

A group of people is a good idea.

> linguistics background. We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.
>
> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.

Vorbis is not a speech codec, so this won't directly help Vorbis.
But it could certainly help by giving further insight into what we do and don't do well right now.

Good luck with your research, and have fun!

Segher
> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings.

I did a related paper last semester on MP3 compression effects. I put the paper up at http://poplar.seitz.com/~ross/mp3compression.ps.gz. It's not a mathematical or extremely technical paper; it was written from the angle of doing basic frequency analysis on post-encoding speech samples. The goal of my project was to determine whether speech sounds that had been lossily compressed should be considered rigorous data. My results imply the answer is clearly "YES!" down to at least 64kbps CBR. Interestingly enough, if you read my paper, you'll see that 64kbps sometimes outperforms 256kbps on my raw frequency analysis tests.

Unfortunately, it was an undergrad-level class, and as such, some of the software issues I ran into couldn't be solved (time constraints are a bitch, as are the other four classes I had to pass... :-). These included the oggenc crash with short files, so no Vorbis testing could happen either. I think that if my other ideas for analysis could be worked out, a lot more raw data on frequency distortion in the domain of speech sounds could be obtained. I originally intended to revisit the research as a self-study credit; my professor was very excited to have a student interested in working in this area.

What's the scope and time frame for your research? Your findings may induce me to continue my analysis. When you're finished, please post a copy of your paper on the web; I'd really love to read it, as would my prof. Feel free to contact me at any time!
Thanks,
Ross Vandegrift
ross@willow.seitz.com
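Ross's paper compared speech before and after encoding using basic frequency analysis. His message doesn't spell out the exact measurements, so the following is only a hypothetical sketch of one common way to score such a comparison: the log-spectral distance, the root-mean-square difference in decibels between the power spectra of an original frame and its decoded counterpart. It assumes the compressed file has already been decoded back to PCM and time-aligned with the original (encoders and decoders add delay), and it uses a naive pure-Python DFT; the noisy "decoded" frame in the demo is a synthetic stand-in, not real codec output.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 1..N/2-1 (DC and Nyquist skipped)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(1, n // 2)
    ]


def log_spectral_distance(reference, decoded, eps=1e-12):
    """RMS difference, in dB, between two equal-length frames' power spectra."""
    if len(reference) != len(decoded):
        raise ValueError("frames must have the same length (align them first)")
    ref_p = power_spectrum(reference)
    dec_p = power_spectrum(decoded)
    if not ref_p:
        raise ValueError("frame must have at least 4 samples")
    # eps keeps silent bins from producing log(0) or division by zero.
    diffs = [10 * math.log10((r + eps) / (d + eps)) for r, d in zip(ref_p, dec_p)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    random.seed(1)
    original = [random.gauss(0.0, 1.0) for _ in range(256)]
    # Synthetic stand-in for codec output (assumption): original plus small error.
    decoded = [x + random.gauss(0.0, 0.1) for x in original]
    print("LSD (dB):", round(log_spectral_distance(original, decoded), 3))
```

A larger distance means the decoded spectrum strays further from the original. Computing it separately for fricative, plosive, and vowel segments at each bitrate would be one way to line up a physical measurement against the listening-test results.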
Sorry for the late reply, all; I'm a bit lagged these days.

> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings. In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up? And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others. My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.

You're going to be hitting this issue from an angle, then, so expect to have some problems. Let me clarify. All of the compression systems you mentioned, except GSM, are music codecs and are not tuned for speech. Expect them to break down on isolated speech. I would even suggest that your time with them may be better spent elsewhere.

If you do limit your work to speech codecs, you're going to run into the LPC-10 family, where, at low bit rates, the speaker variations are stripped off and the output starts to sound like a robot. These codecs were designed to work at very low bit rates (normally over encrypted phone links) and are just intended to get the data across, not to sound nice. :)

So, if I were you, that would lead me to refine my thesis question a bit. Maybe take your samples, degrade them in controlled ways (add noise, quantize frequency, frequency shift, etc.), and test how well they're recognized.
That could be done with some normal .WAV editing tools on a PC without much problem.

> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people. I have the
> linguistics background. We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.

Oops, looks like I was too late. Well, maybe next semester. :)

> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.

Vorbis is a music codec, so maybe run your tests with the words sung and with music in the background, and test how the intelligibility of the voice degrades with different backgrounds? Just a thought.

Cheers,
David
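David's suggestion of degrading the recordings in controlled ways can also be scripted rather than done by hand in a .WAV editor, which makes each degradation level exactly repeatable across speakers and word pairs. This is a minimal sketch that assumes 16-bit mono PCM WAV input and covers only one of the degradations he lists: additive Gaussian white noise at a chosen signal-to-noise ratio (SNR). Quantization and frequency shifting are not implemented here. It uses only Python's standard library (`wave`, `array`), and the file names in the usage comment are hypothetical.

```python
import array
import math
import random
import sys
import wave


def add_noise_at_snr(samples, snr_db, rng):
    """Return samples plus Gaussian white noise at the requested SNR (dB)."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def degrade_wav(in_path, out_path, snr_db, seed=0):
    """Write a noisy copy of a 16-bit mono PCM WAV file."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        raw = src.readframes(params.nframes)
    samples = array.array("h", raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    noisy = add_noise_at_snr(samples, snr_db, random.Random(seed))
    # Round and clip back into the signed 16-bit range.
    out = array.array("h", (max(-32768, min(32767, round(x))) for x in noisy))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(out.tobytes())


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python degrade.py zip_speaker1.wav zip_speaker1_snr10.wav 10
    degrade_wav(sys.argv[1], sys.argv[2], float(sys.argv[3]))
```

Fixing the random seed means the same noise pattern is added each time, so listeners hearing "zip" and "sip" at a given SNR get identical degradations rather than a fresh random draw per file.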