thr3ads.net - Vorbis dev - [vorbis-dev] Psycho-acoustics research [Mar 2002]

Chris Riddoch

2002-Mar-19 18:25 UTC

[vorbis-dev] Psycho-acoustics research

Hi.

I'm an undergraduate linguistics major and computer science minor at
the University of Colorado in Boulder, and am taking a couple classes
this semester which give me the opportunity to do a research project -
one on introductory acoustics in the physics department, and one in
the linguistics department on phonetics and phonology. I've got an
idea, but I'd like to hear from anyone here who could help me refine my
project to be useful to you folks in some way.

I should mention that I don't have the math background to really
understand the Fourier transform (much less more complicated beasts),
since the highest math class I've taken so far is calculus 1, so this
is a major caveat.

My idea so far is to record several speakers producing minimal pairs
(such as 'zip' and 'sip') and to compress the sound under different
compression schemes (Ogg, MP3, GSM, etc) at varying levels of
compression, then play them back on decent audio equipment for
listening tests to see if listeners can still distinguish important
parts of the sounds in the recordings. In particular, I'm interested
in looking at the nature of degradation when the compression ratio is
particularly high: what phonemes become more difficult to distinguish
soonest, as the compression ratio goes up?  And then, if possible, I'd
like to come up with an analysis of *why* those particular sounds are
poorly recreated, as opposed to others.  My guess is that fricative
sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
larger amounts of white noise, which is often poorly handled by
compression.
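
As a rough illustration of the bookkeeping such a listening test would need, here is a minimal Python sketch that tallies how often listeners identified each word correctly under each codec and bitrate. The function name, the tuple layout, and the sample responses are all assumptions made up for illustration; they are not real results.

```python
from collections import defaultdict

def accuracy_by_condition(trials):
    """Fraction of correct identifications per (codec, kbps, spoken word).

    trials: iterable of (codec, kbps, spoken, heard) tuples (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for codec, kbps, spoken, heard in trials:
        key = (codec, kbps, spoken)
        counts[key][1] += 1
        if heard == spoken:
            counts[key][0] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}

# Made-up responses purely to show the data shape; not real measurements.
trials = [
    ("vorbis", 32, "zip", "zip"),
    ("vorbis", 32, "zip", "sip"),
    ("vorbis", 32, "sip", "sip"),
    ("mp3", 32, "zip", "zip"),
]
scores = accuracy_by_condition(trials)
```

Sorting the resulting scores by bitrate for each word would show which member of a minimal pair stops being identified first as compression increases.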

A couple other people in my classes are interested in working with me
on the project - one has more math background, one is willing to
administer perception tests on a group of people.  I have the
linguistics background.  We have until the end of April to do the
project, and the exact idea should be decided on by the end of this
week.

I'd like to know if there's something I could do that would be more
helpful than academic. So... got any ideas for a project within this
scope that would directly benefit Ogg Vorbis development? Changing
topics is possible, though it would be preferable for it to have a
strong linguistic element so I can use the project for both classes.


-- 
Chris Riddoch       | epistemological
socket@peakpeak.com | humility

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Segher Boessenkool

2002-Mar-19 23:41 UTC

[vorbis-dev] Psycho-acoustics research

> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
Those are not minimal combinations; minimal combinations are
"two letter" sounds, like "zi" and "ip".
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings. In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up?  And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others.  My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.
In my experience, Vorbis is worst with plosive sounds.  But different
codecs have their own problems.  And it varies with listener, too.
> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people.  I have the
A group of people is a good idea.
> linguistics background.  We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.
> 
> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.
Vorbis is not a speech codec, so this won't directly help Vorbis.
But it might certainly be helpful by giving further insight into what
we do and don't do well, right now.

Good luck with your research, and have fun!

Segher


Ross Vandegrift

2002-Mar-20 18:11 UTC

[vorbis-dev] Psycho-acoustics research

> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings.
I did a related paper last semester on MP3 compression effects.  I put
the paper up at http://poplar.seitz.com/~ross/mp3compression.ps.gz.
It's not a mathematical or extremely technical paper.  It was written
from the angle of doing basic frequency analysis on post-encoding speech
samples.
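
To make "basic frequency analysis" concrete, here is a minimal Python sketch that compares the magnitude spectrum of an original frame with a decoded one, bin by bin. It is not the method used in the paper; the function names and the synthetic two-tone signal are assumptions chosen only for illustration, and the naive DFT is practical only for short frames.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for short frames)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def spectral_difference_db(original, decoded, floor=1e-9):
    """Per-bin level change in dB between two equal-length frames."""
    a = magnitude_spectrum(original)
    b = magnitude_spectrum(decoded)
    # The small floor avoids log(0) for empty bins.
    return [20 * math.log10((y + floor) / (x + floor)) for x, y in zip(a, b)]

# Synthetic stand-ins: a frame with two tones, and a "decoded" copy
# in which the higher tone has been lost entirely.
N = 64
low = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
high = [0.5 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
original = [l + h for l, h in zip(low, high)]
decoded = low
diff = spectral_difference_db(original, decoded)
```

Bins where the difference is strongly negative mark frequencies the encoder dropped; with real recordings the frames would come from the original and decoded WAV files instead of synthetic tones.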

The goal of my project was to determine if speech sounds that had been
lossily compressed should be considered rigorous data.  My results imply
the answer is clearly "YES!" down to at least 64kbps CBR.  Interestingly
enough, if you read my paper, you'll see that 64kbps will sometimes
outperform 256kbps on my raw frequency analysis tests.

Unfortunately, it was an undergrad-level class, and as such, some of the
software issues I ran into couldn't be solved (time constraints are a
bitch - as are the other four classes I had to pass.... ::-).  I think
that if my other ideas for analysis could be worked out, a lot more raw
data on frequency distortion could be obtained in the domain of speech
sounds.  (The software issues included the oggenc crash with short files,
so no Vorbis testing could happen either.)

I originally intended to revisit the research as a self-study credit -
my professor was very excited to have a student interested in working
in this area.  What's the scope and time frame for your research?  Your
findings may induce me to continue my analysis.

When you're finished, please post a copy of your paper on the web - I'd
really love to read it, as would my prof.

Feel free to contact me at any time!

Thanks,
Ross Vandegrift
ross@willow.seitz.com

> In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up?  And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others.  My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.
> 
> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people.  I have the
> linguistics background.  We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.
> 
> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.
> 
> -- 
> Chris Riddoch       | epistemological
> socket@peakpeak.com | humility
> 

David Willmore

2002-Apr-01 08:14 UTC

[vorbis-dev] Psycho-acoustics research

Sorry for the late reply, all, I'm a bit lagged these days.
> My idea so far is to record several speakers producing minimal pairs
> (such as 'zip' and 'sip') and to compress the sound under different
> compression schemes (Ogg, MP3, GSM, etc) at varying levels of
> compression, then play them back on decent audio equipment for
> listening tests to see if listeners can still distinguish important
> parts of the sounds in the recordings. In particular, I'm interested
> in looking at the nature of degradation when the compression ratio is
> particularly high: what phonemes become more difficult to distinguish
> soonest, as the compression ratio goes up?  And then, if possible, I'd
> like to come up with an analysis of *why* those particular sounds are
> poorly recreated, as opposed to others.  My guess is that fricative
> sounds (/f/, /v/, /s/, /z/) will "degrade" first because they contain
> larger amounts of white noise, which is often poorly handled by
> compression.
You're going to be hitting this issue from an angle, then--expect
to have some problems.  Let me clarify.  All compression systems
that you mentioned, except GSM, are music codecs and are not tuned
for speech.  Expect them to break down on isolated speech.  I would
even suggest that your time with them may be better spent elsewhere.

If you do limit your work to speech codecs, you're going to run into
the LPC-10 kind of family where, at low bit rates, the speaker
variations are stripped off and the output starts to sound like
a robot.  These codecs were designed to work at very low bit
rates (normally encrypted phone links) and are just intended to
get the data across, not sound nice. :)

So, if I were you, that would lead me into refining my thesis
question a bit.  Maybe take your samples, degrade them in controlled
ways--add noise, quantize frequency, frequency shift, etc.  And
test how well they're recognized.  That could be done with some
normal .WAV editing tools on a PC without much problem.
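
One of these controlled degradations, adding noise, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the samples are plain floats rather than data read from a real .WAV file, and the 200 Hz tone is a stand-in for a recorded word.

```python
import math
import random

def add_white_noise(samples, snr_db, seed=0):
    """Return a copy of samples with Gaussian white noise at the given SNR."""
    rng = random.Random(seed)  # fixed seed so a test condition is repeatable
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]

def measured_snr_db(clean, noisy):
    """SNR actually achieved, computed from the clean and noisy signals."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

# Stand-in for a recorded word: one second of a 200 Hz tone at 8 kHz.
RATE = 8000
clean = [math.sin(2 * math.pi * 200 * t / RATE) for t in range(RATE)]
noisy = add_white_noise(clean, snr_db=10)
```

Stepping `snr_db` downward in fixed increments gives a series of stimuli whose degradation is known exactly, which is the advantage of this approach over comparing codecs whose internal behavior is opaque.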
> A couple other people in my classes are interested in working with me
> on the project - one has more math background, one is willing to
> administer perception tests on a group of people.  I have the
> linguistics background.  We have until the end of April to do the
> project, and the exact idea should be decided on by the end of this
> week.
Oops, looks like I was too late.  Well, maybe next semester. :)
> I'd like to know if there's something I could do that would be more
> helpful than academic. So... got any ideas for a project within this
> scope that would directly benefit Ogg Vorbis development? Changing
> topics is possible, though it would be preferable for it to have a
> strong linguistic element so I can use the project for both classes.
Vorbis is a music codec, maybe run your tests with the words sung
and music in the background.  Test how the intelligibility of the
voice degrades with different backgrounds?  Just a thought.

Cheers,
David
