thr3ads.net - Speex dev - [Speex-dev] Use voice onset timing to identify voiceless [Oct 2005]


Matt Robinson

2005-Oct-28 14:14 UTC

[Speex-dev] Use voice onset timing to identify voiceless

On the increased computation time and bit rate with VBR:

Has anyone considered implementing the standard audiological
recognition technique of using the duration of zero energy (voice
onset timing) to identify the presence of voiceless sounds?

I would like a means of determining whether a given window will be full
of voiced speech.

Matt
--
The swallow may fly south with the sun or the house martin or the
plover may seek warmer climes in winter yet these are not strangers to
our land.
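A minimal sketch of the "duration of zero energy" measurement Matt describes, assuming 16-bit PCM input. It splits the signal into fixed-length frames, computes each frame's mean energy, and reports the longest run of frames below a near-silence threshold. The function names, frame length, and threshold are hypothetical illustrations, not part of Speex. A practical detector would use a tuned threshold rather than literal zero energy.

```c
#include <stddef.h>

/* Hypothetical helper: mean energy of one frame of 16-bit samples. */
static double frame_energy(const short *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)x[i];
    return n ? acc / (double)n : 0.0;
}

/*
 * Hypothetical detector: returns the length, in frames, of the longest run
 * of consecutive frames whose mean energy is below silence_thresh.
 * frame_len and silence_thresh are illustrative assumptions, not Speex
 * parameters. Multiply the result by frame_len / sample_rate for seconds.
 * A trailing partial frame is ignored.
 */
size_t longest_silent_run(const short *x, size_t total, size_t frame_len,
                          double silence_thresh)
{
    size_t best = 0, run = 0;
    if (frame_len == 0)
        return 0;
    for (size_t start = 0; start + frame_len <= total; start += frame_len) {
        if (frame_energy(x + start, frame_len) < silence_thresh) {
            run++;              /* extend the current quiet run */
            if (run > best)
                best = run;
        } else {
            run = 0;            /* energy returned: the run is over */
        }
    }
    return best;
}
```

For example, with 8 kHz audio and 160-sample (20 ms) frames, a return value of 3 corresponds to a 60 ms gap.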

Jean-Marc Valin

2005-Oct-28 20:37 UTC


[Speex-dev] Use voice onset timing to identify voiceless

> Has anyone considered implementing the standard audiological
> recognition technique of using the duration of zero energy (voice
> onset timing) to identify the presence of voiceless sounds?
> 
> I would like a means of determining whether or not a given window will
> be full of voiced speech or not.
I think pitch estimation works well enough.

	Jean-Marc
> _______________________________________________
> Speex-dev mailing list
> Speex-dev@xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev
>

Steve Gibson

2005-Oct-28 21:13 UTC


[Speex-dev] To CELP or not to CELP ... at higher bitrates

Jean-Marc,

I am building a tool for producing the highest possible quality Internet
interviews for "podcasting" applications.  The goal is to produce a perfect
recording of an interview or conference -- and giving the participants a
glitch-free experience is secondary.

My approach, therefore, is to build a Windows "wave" file asynchronously by
using a streaming retransmission protocol to request the retransmission of
any lost packets from a short-term history buffer maintained by each
sender, thus "filling in any gaps" after the fact.  In this way we can
guarantee that a perfect recording will always result.
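The per-sender short-term history buffer could be as simple as a ring buffer indexed by packet sequence number, so that serving a retransmission request is a constant-time lookup. This is a sketch under stated assumptions. `history_t`, `history_store`, `history_lookup`, the 256-slot depth, and the 1500-byte packet limit are hypothetical choices, not part of Speex or any particular protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIST_SLOTS 256  /* hypothetical depth: about 5 s of 20 ms frames */
#define MAX_PKT    1500 /* hypothetical maximum packet size, in bytes */

typedef struct {
    uint32_t seq[HIST_SLOTS];          /* sequence number held in each slot */
    uint16_t len[HIST_SLOTS];          /* payload length of each slot */
    uint8_t  valid[HIST_SLOTS];        /* nonzero once a slot has been filled */
    uint8_t  data[HIST_SLOTS][MAX_PKT];
} history_t;

/* Keep a copy of an outgoing packet, overwriting the packet that used this
 * slot HIST_SLOTS sequence numbers earlier. Returns 0 on success, -1 if the
 * packet is too large. */
int history_store(history_t *h, uint32_t seq, const uint8_t *pkt, size_t len)
{
    size_t slot = seq % HIST_SLOTS;
    if (len > MAX_PKT)
        return -1;
    memcpy(h->data[slot], pkt, len);
    h->seq[slot] = seq;
    h->len[slot] = (uint16_t)len;
    h->valid[slot] = 1;
    return 0;
}

/* Find a packet to retransmit; returns NULL if it has already aged out. */
const uint8_t *history_lookup(const history_t *h, uint32_t seq, size_t *len)
{
    size_t slot = seq % HIST_SLOTS;
    if (!h->valid[slot] || h->seq[slot] != seq)
        return NULL;
    *len = h->len[slot];
    return h->data[slot];
}
```

The depth bounds how late a gap can still be repaired. With 20 ms frames, 256 slots hold about 5 seconds of audio.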

Being able to tolerate lost or delayed packets in the interactive exchange
(since they will be filled in later) means that the system can operate with
a shallower jitter buffer to minimize the interactive delay, and the
professional participants can simply ignore any late or lost frames since
they will know that those will be handled in the final recording.
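On the receiving side, tolerating late or lost frames amounts to remembering which sequence numbers have arrived and periodically asking the sender for the rest. This minimal sketch assumes 32-bit sequence numbers and a tracking window no deeper than the sender's history. `gap_tracker_t` and its functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW 256 /* hypothetical: must not exceed the sender's history depth */

typedef struct {
    uint32_t base;        /* lowest sequence number not yet received in order */
    uint8_t  got[WINDOW]; /* got[s % WINDOW] != 0 once frame s has arrived */
} gap_tracker_t;

/* Record an arriving frame (on time or late); frames outside the window
 * are ignored. Unsigned subtraction handles sequence-number wraparound. */
void gap_mark(gap_tracker_t *t, uint32_t seq)
{
    if (seq - t->base < WINDOW)
        t->got[seq % WINDOW] = 1;
}

/* List missing sequence numbers in [base, end), where end is one past the
 * highest number seen; writes at most cap entries and returns the count.
 * These are the frames to re-request from the sender. */
size_t gap_missing(const gap_tracker_t *t, uint32_t end,
                   uint32_t *out, size_t cap)
{
    size_t n = 0;
    for (uint32_t s = t->base; s != end && n < cap; s++) {
        if (s - t->base >= WINDOW)
            break;
        if (!t->got[s % WINDOW])
            out[n++] = s;
    }
    return n;
}

/* Slide the window past the contiguous run of frames already received,
 * clearing their slots for reuse. */
void gap_advance(gap_tracker_t *t)
{
    while (t->got[t->base % WINDOW]) {
        t->got[t->base % WINDOW] = 0;
        t->base++;
    }
}
```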

With that bit of background, my question is:

Since the resulting recorded quality is the ONLY concern, and since we can 
stipulate that all parties will always have access to broadband-scale 
connectivity, does using a CELP-based codec such as Speex make the most 
sense in this application?

I would be running Speex in ultra-wideband mode with 32 or 48 kHz sampling
and its bitrate completely open-ended, upwards of 44 kbps ... but higher bit
rates (of several hundred kbps) would also be readily available to this
application.

Given that we really don't need the compression levels offered by advanced 
CELP speech encoding, does it still make the most sense to use Speex, or 
would we be better served to use some other codec -- perhaps such as mp2 or 
mp3 -- at higher bitrates?

Do you have any guidelines you could share?

Thanks!
______________________________________________________________________
Steve.

Steve Gibson

2005-Oct-30 15:19 UTC


[Speex-dev] Choosing SSE at runtime?

Jean-Marc,

Would it be a big deal to make the use of SSE instructions a run-time
rather than a build-time option?  Are there any plans to allow that?
______________________________________________________________________
Steve.
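One common way to make SIMD a run-time choice is to compile both a scalar and an SSE version of a kernel, check the CPU once at startup, and call through a function pointer. This sketch assumes GCC or Clang on x86 (`__builtin_cpu_supports` and the `target` function attribute) and falls back to the scalar path elsewhere. `inner_prod_c`, `inner_prod_sse`, and `select_inner_prod` are hypothetical names, not Speex functions.

```c
#include <stddef.h>

/* Scalar reference version of a hypothetical inner-product kernel. */
static float inner_prod_c(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <xmmintrin.h>

/* SSE version; the target attribute lets this one function use SSE even
 * if the rest of the program is built without -msse. */
__attribute__((target("sse")))
static float inner_prod_sse(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i),
                                         _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)              /* leftover elements */
        sum += a[i] * b[i];
    return sum;
}
#endif

typedef float (*inner_prod_fn)(const float *, const float *, size_t);

/* Choose an implementation once, based on the CPU the program runs on. */
inner_prod_fn select_inner_prod(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse"))
        return inner_prod_sse;
#endif
    return inner_prod_c;
}
```

The SSE version sums in a different order, so its floating-point results may differ from the scalar version in the last bits. Code that needs bit-exact output across machines would have to account for that.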
