thr3ads.net - Speex dev - [Speex-dev] speex on TI C5x fixed-point DSP [Nov 2004]

If this information is useful, please help other people find it:
Share via:

Jamey Hicks

2004-Nov-01 11:47 UTC

[Speex-dev] speex on TI C5x fixed-point DSP

Jean-Marc Valin wrote:
>>I have the encoder and decoder running now and have verified that the 
>>encoder is bit-exact wrt to the fixed-point code running on x86 for the 
>>same 30-second audio sample.  Encode and decode together run in 
>>real-time for 8KHz data, complexity=3, on 120MHz C5509 when code and 
>>data are all in on-chip SRAM.  I have not tested the wideband codec yet.
>>    
>>
>
>Cool! Just curious, how much of the DSP does it take to do that?
>  
>It took all of the DSP at that speed.  The C5509 will go up to 200MHz, 
so I guess it's 2/3 of the DSP.  We're going to profile the code to see 
if we can speed it up.
>
>I looked at the patch. Seems mostly OK, though there are some minor
>issues. First, I don't think it's right to say spx_int_32_t=long
except
>for alpha. This will likely cause problems on Itanium, Power64, SPARC64
>and probably x86-64. I suggest either an explicit check for C55xx (and
>later others) or a test in configure.ac.
>
>Also, I'm not sure I get your use of "+#ifdef UNUSED" in the
SpeexBits
>code. Finally, there's a few changes which I think should be removed,
>like for speex_alloc, speex_warning, ... same as the "#if 0" for
>wideband and other debug stuff. Could you clean the patch and send it
>back to me so I can apply it? 
>  
>I'll clean up the patch as suggested.  I meant to remove all the 
extraneous changes but I guess I missed some.  A test in configure.ac 
for sizeof(long) makes more sense than the ifdef I put in there.

Jamey

Jean-Marc Valin

2004-Nov-01 20:34 UTC

head link

[Speex-dev] speex on TI C5x fixed-point DSP

> It took all of the DSP at that speed.  The C5509 will go up to 200MHz, 
> so I guess it's 2/3 of the DSP.  We're going to profile the code to
see
> if we can speed it up.
I would think it's possible to make it much faster. The first thing I'd
look is the fixed-point macros, like MUL16_16 and make sure they don't
do something much more complicated than what they're supposed to do
(e.g. compute on 32 bit operands), because there's no way to tell C to
do things like "multiply two 16-bit values that will have a result in 32
bits". Also, Speex uses lots of 16x32 multiplications, which your DSP
may do more efficiently in other ways. Of course, after that, I can also
point out several routines that would benefit the most from
optimizations (mainly the filters).

	Jean-Marc

Jamey Hicks

2004-Nov-02 18:42 UTC

head link

[Speex-dev] speex on TI C5x fixed-point DSP

Jean-Marc Valin wrote:
>>It took all of the DSP at that speed.  The C5509 will go up to 200MHz, 
>>so I guess it's 2/3 of the DSP.  We're going to profile the code
to see
>>if we can speed it up.
>>    
>>
>
>I would think it's possible to make it much faster. The first thing
I'd
>look is the fixed-point macros, like MUL16_16 and make sure they don't
>do something much more complicated than what they're supposed to do
>(e.g. compute on 32 bit operands), because there's no way to tell C to
>do things like "multiply two 16-bit values that will have a result in
32
>bits". 
>I have most of these right.  TI's C compiler generates a 16x16=>32 
multiply from the following idiom:

  int32_t z = (int32_t)(int16_t)x * (int32_t)(int16_t)y;

But I see there are still some calls to the 32x32 multiply routine in 
one of the filters.  It looks like the 16x32 multiplies to which you 
refer below.
>Also, Speex uses lots of 16x32 multiplications, which your DSP
>may do more efficiently in other ways. Of course, after that, I can also
>point out several routines that would benefit the most from
>optimizations (mainly the filters).
>  
>That would be helpful, if there are some key ones to start on.

Jamey

Jamey Hicks

2004-Nov-03 07:33 UTC

head link

[Speex-dev] speex on TI C5x fixed-point DSP

Jean-Marc Valin wrote:
>Well, I guess the first thing to look is whether your DSP can actually
>do either 16x32=>48 or 16x32=>32 (keeping the MSBs), which is what the
>smulwb does on ARM. If that's the case, you can gain a lot of speed (use
>one instruction for 16x32 instead of three). Otherwise, replacing the
>32x32 multiplies by 16x16 is probably a good thing.
>  
>One thing I've noticed so far in the filter_mem2 code is the calls to 
SATURATE(x, 805306368).  805306368 is 0x30000000.  I was expecting that 
to be on a bit boundary, say 0x3fffffff?  In which case the arithmetic 
saturation logic could be used.

Jamey

Jean-Marc Valin

2004-Nov-03 20:19 UTC

head link

[Speex-dev] speex on TI C5x fixed-point DSP

> One thing I've noticed so far in the filter_mem2 code is the calls to 
> SATURATE(x, 805306368).  805306368 is 0x30000000.  I was expecting that 
> to be on a bit boundary, say 0x3fffffff?  In which case the arithmetic 
> saturation logic could be used.
I don't think it would make that big of a difference, since the
saturation is outside of the inner loop. If it's that critical, you
could probably remove the saturation completely and just make sure your
signals are scaled properly (i.e. not too close to saturation).

	Jean-Marc

Speex dev - Nov 2004 - speex on TI C5x fixed-point DSP

[Speex-dev] speex on TI C5x fixed-point DSP

[Speex-dev] speex on TI C5x fixed-point DSP

[Speex-dev] speex on TI C5x fixed-point DSP

[Speex-dev] speex on TI C5x fixed-point DSP

[Speex-dev] speex on TI C5x fixed-point DSP