Matt Robinson
2005-Oct-28 14:14 UTC
[Speex-dev] Use voice onset timing to identify voiceless
Regarding the increased computation time and bit rate with VBR: Has anyone considered implementing the standard audiological recognition technique of using the duration of zero energy (voice onset timing) to identify the presence of voiceless sounds?

I would like a means of determining whether a given window will be full of voiced speech.

Matt
--
The swallow may fly south with the sun or the house martin or the plover may seek warmer climes in winter yet these are not strangers to our land.
Jean-Marc Valin
2005-Oct-28 20:37 UTC
[Speex-dev] Use voice onset timing to identify voiceless
> Has anyone considered implementing the standard audiological
> recognition technique of using the duration of zero energy (voice
> onset timing) to identify the presence of voiceless sounds?
>
> I would like a means of determining whether or not a given window will
> be full of voiced speech or not.

I think pitch estimation works well enough.

Jean-Marc
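As an illustration of the pitch-based approach mentioned in the reply, here is a minimal sketch of a voiced/unvoiced decision driven by the peak of the normalized autocorrelation over plausible pitch lags. It is not taken from the Speex source; the function names `voicing_score` and `is_voiced`, the 60-400 Hz pitch range, and the 0.5 threshold are all assumptions chosen for the example.

```python
import math


def voicing_score(frame, sample_rate=8000, f_min=60.0, f_max=400.0):
    """Return (score, lag): the peak normalized autocorrelation over the
    lags that correspond to pitches between f_min and f_max (assumed range)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove any DC offset
    if sum(s * s for s in x) == 0.0:
        return 0.0, 0                      # silent frame: no periodicity
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_score, best_lag = 0.0, 0
    for lag in range(lag_min, lag_max + 1):
        a, b = x[:n - lag], x[lag:]        # frame and its delayed copy
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0 and num / den > best_score:
            best_score, best_lag = num / den, lag
    return best_score, best_lag


def is_voiced(frame, threshold=0.5, **kwargs):
    """Classify a frame as voiced when it is strongly periodic.
    The threshold is an illustrative value, not a tuned one."""
    score, _ = voicing_score(frame, **kwargs)
    return score >= threshold
```

A strongly periodic frame (for instance a sustained vowel) scores close to 1, while noise-like voiceless sounds and silence score low, so a single threshold on the score gives a rough per-window voicing flag.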
Steve Gibson
2005-Oct-28 21:13 UTC
[Speex-dev] To CELP or not to CELP ... at higher bitrates
Jean-Marc,

I am building a tool for producing the highest possible quality Internet interviews for "podcasting" applications. The goal is to produce a perfect recording of an interview or conference -- and giving the participants a glitch-free experience is secondary.

My approach, therefore, is to build a Windows "wave" file asynchronously by using a streaming retransmission protocol to request the retransmission of any lost packets from a short-term history buffer maintained by each sender, thus "filling in any gaps" after the fact. In this way we can guarantee that a perfect recording will always result.

Being able to tolerate lost or delayed packets in the interactive exchange (since they will be filled in later) means that the system can operate with a shallower jitter buffer to minimize the interactive delay, and the professional participants can simply ignore any late or lost frames since they will know that those will be handled in the final recording.

With that bit of background, my question is: Since the resulting recorded quality is the ONLY concern, and since we can stipulate that all parties will always have access to broadband-scale connectivity, does using a CELP-based codec such as Speex make the most sense in this application?

I would be running Speex in ultra-wideband with 32 or 48 kHz sampling and its bitrate completely open-ended and upwards of 44 kbps ... but higher bit rates (of several hundred kbps) would also be readily available to this application.

Given that we really don't need the compression levels offered by advanced CELP speech encoding, does it still make the most sense to use Speex, or would we be better served to use some other codec -- perhaps such as mp2 or mp3 -- at higher bitrates?

Do you have any guidelines you could share?

Thanks!
______________________________________________________________________
Steve.
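The retransmission scheme described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual protocol: the class names `SenderHistory` and `Recorder`, the per-packet sequence numbers, and the 512-packet history size are all hypothetical, and the network itself is left out.

```python
from collections import OrderedDict


class SenderHistory:
    """Sender side: numbers each packet and keeps a short-term history
    so that lost packets can be retransmitted on request."""

    def __init__(self, capacity=512):      # history size is an assumed value
        self.capacity = capacity
        self._packets = OrderedDict()
        self._next_seq = 0

    def send(self, payload):
        seq = self._next_seq
        self._next_seq += 1
        self._packets[seq] = payload
        if len(self._packets) > self.capacity:
            self._packets.popitem(last=False)   # drop the oldest packet
        return seq, payload

    def resend(self, seq):
        """Return the stored payload, or None if it has aged out."""
        return self._packets.get(seq)


class Recorder:
    """Recording side: stores packets by sequence number and reports gaps."""

    def __init__(self):
        self._received = {}
        self._highest = -1

    def receive(self, seq, payload):
        self._received[seq] = payload
        self._highest = max(self._highest, seq)

    def missing(self):
        """Sequence numbers not yet received, up to the highest one seen."""
        return [s for s in range(self._highest + 1) if s not in self._received]

    def assemble(self):
        """Concatenate payloads in order once every gap has been filled."""
        if self.missing():
            raise ValueError("gaps remain; request retransmission first")
        return b"".join(self._received[s] for s in range(self._highest + 1))
```

In use, the recorder would periodically call `missing()` and ask the sender to `resend()` those sequence numbers while they are still in the history buffer; the live playback path can ignore the gaps entirely.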
Jean-Marc,

Would it be a big deal to make the use of SSE instructions a runtime rather than a build-time option? Are there any plans to allow that?
______________________________________________________________________
Steve.