Hi,
I think I see what you mean, though I haven't been able to listen to
your wma file (not everyone has a wma decoder). The problem probably
only lies in the VBR tuning for wideband which hasn't received much work
yet. One way to check that is to encode in constant bit-rate and see
what the results are. I'm pretty sure you'll notice the problem appears
only at (CBR) quality 5 or below. 
        Jean-Marc
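
A rough way to script the CBR check suggested above. This is only a sketch: the input name `s.wav` (the mp3 decoded back to wav) and the `s-cbr-N.spx` output names are hypothetical, and the helper functions are made up for illustration. It relies only on the `--quality` and `--vbr` options of speexenc that appear in this thread; leaving out `--vbr` is assumed to give constant bit-rate encoding.

```python
import shlex


def speexenc_command(wav_path, out_path, quality, vbr=False):
    """Build a speexenc argument list; without --vbr the encode is constant bit-rate."""
    args = ["speexenc"]
    if vbr:
        args.append("--vbr")
    args += ["--quality", str(quality), wav_path, out_path]
    return args


def cbr_sweep(wav_path, stem, qualities=range(2, 10)):
    """One CBR command per quality level (output file names are hypothetical)."""
    return [speexenc_command(wav_path, f"{stem}-cbr-{q}.spx", q) for q in qualities]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in cbr_sweep("s.wav", "s"):
        print(shlex.join(cmd))
```

Listening to the resulting files in order should show at which CBR quality the ess sounds start to degrade, which can then be compared with the VBR encodes at the same quality setting.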
On Fri, 05/12/2003 at 12:56, Olav wrote:
> thanks for getting back to me,
> 
> i have uploaded a zip file containing some sound files that
> demonstrates the issue.
> 
>   http://www.bogus.net/~olav/ess.zip
> 
> this contains
> 
>   s.mp3   original wav file (mono) converted to top-quality mp3 (370K)
>   s.wma   windows media encoder with 19khz voice compression    ( 62K)
>   s-2.spx speexenc --vbr --quality 2 on the wav file            ( 63K)
>   s-9.spx --quality 9                                           (197K)
> 
> plus quality 3, 4, 5, 6, 7 and 8.
> 
> the contents of the file is a norwegian sentence from a record
> containing a lot of ess sounds, repeated 10 times or so, just to get
> some file size so file size comparison makes sense.
> 
> one may argue on which compression the ess sounds become
> acceptable. after listening MANY times between the original and the
> spx file, i decided that going under quality 9 means you start to hear
> "computerish" ess sounds.
> 
> as for the speex VS windows media encoder issue, compare speex quality
> 2 with the wma file. they are equally sized and should therefore be of
> equal quality, but in my ears the wma file is quite a lot better. it
> may have less treble, but the spx file sounds very synthetic.
> 
> note: if i have used speexenc incorrectly please let me know.
> 
> the wav file was 2MB so i didn't want to include that, but simply use
> lameenc etc to decode the mp3 file into wav if you want to do testing.
> 
> i hope to hear from you soon. i find this issue very interesting.
> 
> olav
> 
> > From: "Tony & Amanda Benik" <benikajal@mcihispeed.net>
> > Date: Thu, 4 Dec 2003 23:47:39 -0600
> > 
> > Representative of Olav,
> > 
> > >like if you say "someone said the sun is shining", there is a lot of
> > >ess sounds, and these will sound "computer-ish" at vbr qualities below
> > >9.
> > 
> >   I don't mean to be rude but what bit rate is windows media encoder
> > encoding at and what encoder (type) are you using...  Unless it's low
> > (32kbps-8kbps) it doesn't compare to speex (spx).  The "ess" sounds
> > you are hearing are most likely generated because the entire frame
> > (bit of sound) has been stripped of all but its most mathematically
> > pure and simplest (smallest) representation.
> > 
> >   I know a bit about text2speech and speech2text, and though a de-ess
> > filter on the speex decoder would be 'pleasant' to the human ear
> > (if one finds pure tones unpleasant rather than unhuman), it would
> > make subsequent mixing and encoding of speex streams (VoIP phone
> > lines) less effective and more costly in a resource sense.
> > 
> >   It is a good idea, though I would consider it a luxury filter; that's
> > just me being overly assertive.
> > ||
> > \/
> > 
> >   If anyone is interested, from my knowledge of speech recognition all
> > human phonemes when converted from power vs. time to power vs. freq
> > exhibit 2 characteristic spikes.  The primary spike defines the base
> > for recognizing the phoneme and the next highest spike's relative
> > location and power give a program a good probability match as to
> > which phoneme it is.
> > 
> > Humanizing spx audio derived solely from pure human voices could be
> > accomplished by reconstructing the secondary peak but would introduce
> > a minimum latency far larger than several frame sizes (i.e. the length
> > of a human phoneme, e.g. a vowel or consonant).
> > 
> > The filter also will most likely foul up the speech a little because
> > like most voice recognition software it can guess wrong and
> > reconstruct the wrong secondary peak onto the frames.  (I'm guessing)
> > 
> > The filter also will most likely eat up a lot of cpu power like most
> > voice recognition software.  (I'm guessing)
> > 
> > =>
> > To conclude:
> >   I may be very wrong so please correct me but I am diligent about
> >   keeping up on these things.
> > 
> > -- Benikus Rex
> > --- >8 ----
> > List archives:  http://www.xiph.org/archives/
> > Ogg project homepage: http://www.xiph.org/ogg/
> > To unsubscribe from this list, send a message to 'speex-dev-request@xiph.org'
> > containing only the word 'unsubscribe' in the body.  No subject is needed.
> > Unsubscribe messages sent to the list will be ignored/filtered.
> > 
> 
-- 
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
> Date: Fri, 05 Dec 2003 13:22:53 -0500
> From: Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
>
> I think I see what you mean, though I haven't been able to listen to
> your wma file (not everyone has a wma decoder). The problem probably
> only lies in the VBR tuning for wideband which hasn't received much work
> yet. One way to check that is to encode in constant bit-rate and see
> what the results are. I'm pretty sure you'll notice the problem appears
> only at (CBR) quality 5 or below.
>
>         Jean-Marc

i have done further testing, and even at constant bitrates, wma is far
superior from an "overall listening experience" point of view. it seems
that:

speex maintains the crispness/treble of the recording, but at the cost
of computer-ish background noise, like tuning into russian radio
stations on the am band, if you get my drift. ess sounds are
particularly fragile to this.

wma removes all high-freq/treble of the voice, and makes it "round" and
dark, but there is no evidence of the computer-bleep bleep effects made
by speex. the human voices sound like human voices, only blunter, in a
way. ess sounds do not become embarrassing.

to accomplish the same file size (or bit rate) with wma and spx, the spx
quality turns out so poor it is not usable (quality 2).

i tried to downsample my wav files from 44100 to 32000 to meet with spx
optimizations, but this did not help the situation really.

it would be fantastic if spx could compress voice to the extent wma
does, but maintain the crispness and treble of the original voice, as
far as this is possible.

wma also seems less tolerant to music between voices than spx, which is
good. a voice encoder should not accept music at all, it should just
make garble or silence of it, to the extent that this is possible to
detect. wma seems to do this to a certain extent.

i am a programmer, but i do not know sound compression algorithms, so i
may be talking on wrong grounds, but i would just like to find the best
voice compression program on earth, and just now i have to choose
between large/good spx files and small/blunt wma files. it would be
great if this could be improved in coming releases.

olav
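
To put the file sizes from this thread on a bit-rate footing, here is a back-of-the-envelope sketch. It rests on several assumptions that the thread does not confirm: the original wav is taken to be 16-bit mono at 44100 Hz, "2MB" is read as 2 MiB and "K" as KiB, and header/container overhead is ignored. The function names are made up for this example.

```python
def wav_duration_seconds(wav_bytes, sample_rate=44100, channels=1, bytes_per_sample=2):
    """Approximate duration of uncompressed PCM audio (header size ignored)."""
    return wav_bytes / (sample_rate * channels * bytes_per_sample)


def average_kbps(file_bytes, duration_s):
    """Average bit-rate of an encoded file in kbit/s, container overhead included."""
    return file_bytes * 8 / duration_s / 1000


if __name__ == "__main__":
    # Assumption: the "2MB" wav is 2 MiB of 16-bit mono samples at 44100 Hz.
    duration = wav_duration_seconds(2 * 1024 * 1024)
    print(f"estimated duration: {duration:.1f} s")
    # File sizes as listed for ess.zip, read as KiB.
    for name, kib in [("s.wma", 62), ("s-2.spx", 63), ("s-9.spx", 197), ("s.mp3", 370)]:
        print(f"{name}: about {average_kbps(kib * 1024, duration):.0f} kbit/s")
```

Under these assumptions the wma file and the quality 2 spx file land at nearly the same average rate, so the comparison is between codecs at matched bit-rate rather than at matched quality setting.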
I have started to update the ACM codec to Speex 1.0.3. Before a release
I will also address some issues/bugs that have been brought to my
attention. Following that I will try to port that code to PocketPC,
making use of the unstable 1.1.3 version as an integer build. There is
probably little or no need for continuing work on my own integer version
(called "Sphinx").

Christian