david feldman
2008-May-01 17:31 UTC
[Speex-dev] Suitability of speex for use with noisy, non-voice source material?
Hello Jean-Marc,

I have completed some very basic testing with unexpectedly excellent results. I am posting this to the reflector to encourage others to try similar experiments.

My experiment consisted of processing a 200-second sample taken from a ham radio shortwave receiver. The sample contained a variety of signals. Some were strong and some were weak (weaker signals carry more noise, and that noise is roughly "white noise" rather than artifacts of other digital signal processing), and some were in Morse code. The test was very unscientific, but it was based on a decent cross-section of the signals I would normally find on shortwave ham radio.

The receiver's audio was fed into the line input of a generic PC sound card on a Linux (Ubuntu) machine and recorded with the "arecord" utility at 8k samples/second, 8 bits/sample, to a WAV file. To test Speex, I used the sample command-line Windows binaries to encode with speexenc and then decode back to WAV with speexdec. Playback of the decoded audio was on a Windows machine using Windows Media Player 9. As a side note, the decoded file was always 2x the byte count of the original source file. I did not investigate why, since it was unimportant for this test. I tried quality levels from 0 to 8 and gave speexenc no other command-line parameters. speexdec was run with no command-line parameters.
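The 64k bits/sec raw figure below follows from 8000 samples/s x 8 bits/sample x 1 channel. One likely explanation for the decoded file being twice the size, assuming speexdec writes 16-bit PCM regardless of the source's sample width, is that each decoded sample takes two bytes instead of one. The sketch uses Python's standard `wave` module to build two silent 200-second mono WAV files in memory. These are hypothetical stand-ins for the arecord capture and the decoder output, not real recordings. It then compares their bit rates and sizes.

```python
import io
import wave


def make_silent_wav(rate: int, sampwidth: int, seconds: float) -> bytes:
    """Build an in-memory mono WAV file of silence (hypothetical test input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        # 8-bit WAV samples are unsigned (silence = 0x80); wider ones are signed (silence = 0)
        silence = b"\x80" if sampwidth == 1 else b"\x00" * sampwidth
        w.writeframes(silence * int(rate * seconds))
    return buf.getvalue()


def raw_bitrate_bps(wav_bytes: bytes) -> int:
    """Uncompressed PCM bit rate = sample rate * bits per sample * channels."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getframerate() * w.getsampwidth() * 8 * w.getnchannels()


# Stand-in for the arecord capture: 8000 samples/s, 8 bits (1 byte) per sample
source = make_silent_wav(8000, 1, 200)
# Stand-in for a decoder that writes 16-bit (2-byte) samples at the same rate (assumption)
decoded = make_silent_wav(8000, 2, 200)

print(raw_bitrate_bps(source))     # 64000
print(raw_bitrate_bps(decoded))    # 128000
print(len(decoded) / len(source))  # slightly under 2: the 44-byte header is not doubled
```

If the assumption holds, the sample data is exactly twice as large and only the fixed-size header keeps the ratio a hair below 2. Running speexdec's output through `raw_bitrate_bps` would confirm or rule this out.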
Source material - 64k bits/sec raw

Quality 0 - about 2.8k bits/sec encoded - unintelligible (I could tell voice was present, but no words were readable). FYI, I did not try to optimize this mode by further manipulating the source material or adjusting other encoder parameters.
Quality 1 - about 4.3k bits/sec encoded - significant distortion, but almost all test signals intelligible; Morse code almost fully readable (could live with this)
Quality 2 - about 6.3k bits/sec encoded - mild distortion; all test signals intelligible; Morse code fully readable
Quality 4 - about 8.3k bits/sec encoded - artifacts very mild, noticeable only when I was listening for them
Quality 6 - about 11.3k bits/sec encoded - barely any artifacts (I can't say "none", but practically none)
Quality 8 - about 16k bits/sec encoded - no artifacts I could hear

Overall, these results are far better than I had expected or could have hoped. This holds both for the audio quality achieved at data rates suitable for the envisioned 14.4 kbps IP/PPP experimental dial-up link and for the highly granular control I found when adjusting the quality parameter of speexenc. That granularity gives me a lot of flexibility as I try to implement the end-to-end setup. Next steps involve getting the code compiled on the Linux machine and working out streaming mechanisms for delivery over the dial-up link.

Thanks so much for this tool - I am more than encouraged!!

Dave

> Message: 1
> Date: Mon, 28 Apr 2008 07:27:46 +1000
> From: Jean-Marc Valin
> Subject: Re: [Speex-dev] Suitability of speex for use with noisy,
> non-voice source material?
> To: david feldman
> Cc: speex-dev at xiph.org
> Message-ID:
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Dave,
>
> Sounds like Speex would be appropriate for your application. The best
> way to check would be to actually try it with the stock encoder and
> decoder (speexenc/speexdec).
> The conditions you list are not ideal, but
> they'll affect any speech codec. Plus, in terms of free codecs, Speex is
> definitely the only one that can do the job.
>
> Jean-Marc
>
> david feldman wrote:
>> Question from new subscriber -
>>
>> I'm working on a project to remotely connect to a
>> short-wave receiver via a dial-up PPP/IP circuit. It turns out the
>> dial-up circuit is only stable (useful) up to 14.4 kbps (faster modem
>> training produces so many link errors that the net circuit quality is
>> unusable - one end is in a remote, rural location), so I'm looking for a
>> codec that can fit within this circuit minus PPP/IP overhead
>> (probably 10 kbps net based on testing so far). Latency is a
>> consideration, so I'm looking at voice-type encoding vs. streaming
>> MP3. I was going to try G.726, but it's not configured below 16
>> kbps, hence my resumed search for a codec.
>>
>> In my initial searching I found Speex, but before I try to engineer
>> the solution, I'd like to get any advice on using Speex with the
>> expected source material, which is likely to be noisy (static and
>> other stuff mixed in with the source audio). The source audio (monaural)
>> will be pre-filtered to fit a 300-3000 Hz passband (can be
>> slightly narrower if need be), and may not always be a single voice
>> (that is, there may be >1 voice interfering within the audio passband,
>> or even non-voice signals such as tones and other stuff that would
>> appear in the passband of the receiver). So based on this, should I
>> avoid Speex or proceed to experimentation? By the way, this is just
>> for a personal project, no commercial intent.
>>
>> Very tks,
>>
>> Dave wb0gaz at hotmail.com
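The quality-versus-rate results reported above can be checked against the roughly 10 kbps net budget estimated in the original question. The sketch below is a minimal Python version of that check. It assumes the approximate rates measured in this test, which are empirical figures from one recording and not nominal Speex bit rates, and it does not model any transport overhead beyond the budget you pass in. The table name `MEASURED_KBPS` and the function `best_quality` are hypothetical.

```python
# Approximate encoded rates (kbit/s) per speexenc quality level, as measured in this test
MEASURED_KBPS = {0: 2.8, 1: 4.3, 2: 6.3, 4: 8.3, 6: 11.3, 8: 16.0}


def best_quality(budget_kbps: float, measured: dict = MEASURED_KBPS):
    """Return the highest tested quality whose measured rate fits the budget, or None."""
    fitting = [q for q, kbps in measured.items() if kbps <= budget_kbps]
    return max(fitting) if fitting else None


print(best_quality(10.0))  # 4  (8.3 kbit/s; quality 6 at 11.3 kbit/s does not fit)
print(best_quality(14.4))  # 6  (raw modem rate, ignores PPP/IP overhead entirely)
print(best_quality(2.0))   # None
```

With the 10 kbps estimate, quality 4 leaves about 1.7 kbps of headroom. That margin is worth keeping, since the net capacity figure was itself only "based on testing so far".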