Roman Imankulov
2009-Jun-07 18:16 UTC
[Speex-dev] Speex quality estimation in lossless media
Hi,

There are many Speex quality estimations. One such comparison is even available on the official site <http://speex.org/comparison/>. I'd like to present yet another one, and I thought the Speex-dev mailing list would be the best place for it. Feedback and criticism are welcome. If the Speex authors would like to make some parts of this work publicly available on the official site or elsewhere, I'll be happy. I'm ready to answer any questions about this experiment on the mailing list or personally.

Below is a more or less detailed description of the work.

Motivation
--------------

We are currently doing research whose main purpose is to develop an adaptive algorithm that tunes speech encoder parameters depending on the state of the network (Speex was chosen for this work because of its wide range of tuning options). To implement its logic correctly we need reliable speech quality estimations. For ITU (G.729, etc.) and GSM codecs these estimations have been performed and can be obtained from the ITU-T official page. Although there are some experiments that allow an objective comparison between Speex and other codecs, unfortunately we couldn't find anything reliable enough for our purposes.

That's why we performed yet another comparative experiment, whose results contain Speex quality estimations along with those of other popular codecs. Not only the results but also the source data and the source code of all testing tools are available in a public repository [1]. Due to this "open source nature" of the experiments we believe the results are sufficiently reliable, reproducible and thus objective.

Experiment source data and experiment description
----------------------------------------------------

During the experiment the set of source speech samples goes through a simulation model which reproduces the voice distortion introduced by the encoding and decoding processes (in effect, the samples are passed through the codec).
After that, the source speech samples are compared with the degraded ones using the PESQ algorithm as defined in ITU-T Recommendation P.862. The comparison is performed with the ITU pesq utility.

The source speech samples are about 8-15 seconds long. The samples contain male and female voices, and all sentences are spoken in English. All speech samples are taken from Internet podcast interviews; some of them have insignificant noise artifacts. Every speech sample has at least 0.5 seconds of silence at the beginning and at the end. Most of the speech samples have almost no pauses inside. PESQ estimation is performed with 8 kHz samples; the resulting value is MOS-LQO as defined in P.862.1.

Experiment results and further work
-------------------------------------

Anyone should be able to reproduce the experiment results using the given source data (see link [1]). But because deploying the environment is non-trivial, plots with the main results are attached. These plots show the mean value over a set of experiments with a given codec, with a 95% confidence interval. Note that the bitrate ("X") scale is logarithmic.

Currently we propose no interpretation of these data. We plan to complement these experiments with ones describing how voice quality depends on network losses for different codecs.

[1] : https://github.com/imankulov/speex_quality_evaluation/

--
Roman Imankulov
roman at netangels.ru

-------------- next part --------------
A non-text attachment was scrubbed...
Name: english_male_gsm.eps
Type: image/x-eps
Size: 22515 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20090608/77e56ff8/attachment-0002.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: english_male_itu.eps
Type: image/x-eps
Size: 23134 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20090608/77e56ff8/attachment-0003.bin
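
For readers who want to see how the plotted values (mean MOS-LQO with a 95% confidence interval) could be obtained from per-sample scores, here is a minimal Python sketch. It is an illustration only and not necessarily how the tools in the repository do it. The function names are hypothetical, the example scores are made up and are not measured data, and the interval uses a normal approximation, which is an assumption (for a small number of samples a Student's t quantile would give a somewhat wider interval). The mapping function uses the logistic formula from P.862.1; it is included only to show what "MOS-LQO" means relative to the raw P.862 score.

```python
import math
from statistics import NormalDist, mean, stdev


def raw_pesq_to_mos_lqo(raw_score: float) -> float:
    """Map a raw P.862 PESQ score to MOS-LQO with the P.862.1 logistic mapping."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


def mean_with_ci(scores, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    The normal approximation is an assumption of this sketch.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate an interval")
    # Two-sided quantile of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width


if __name__ == "__main__":
    # Hypothetical MOS-LQO values for one codec at one bitrate (NOT measured data).
    example_scores = [3.41, 3.55, 3.38, 3.62, 3.47]
    center, half_width = mean_with_ci(example_scores)
    print(f"MOS-LQO = {center:.3f} +/- {half_width:.3f}")
```

Each point on a plot would then correspond to one such (mean, half-width) pair for a given codec and bitrate.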