thr3ads.net - Speex dev - [Speex-dev] Major internal changes, TI DSP build change [Apr 2006]

If this information is useful, please help other people find it:
Share via:

Jean-Marc Valin

2006-Apr-21 20:56 UTC

[Speex-dev] Major internal changes, TI DSP build change

> The C5x and C6x output diverges in build 10143, which has log message
"lpc
> floor converted to fixed-point."  Also, the measured SNR changed from
11.05
> in builds 9854-10141 to 9.22 and 9.24 in 10143.
Actually, build 10143 introduced another bug, that was the reason for
the 1.1.11.1 release. 
> There is just four lines in modes.c which declare the constant, and one
line
> changed in nb_celp.c and sb_celp.c which use the constant.  Looking at 
> nb_mode, QCONT16(.0002,15) evaluates to 0x3FF9 on the C55 and 0x4006 on the
> C6x.  When I patch the value 0x4006 into the C55 build, the output matches 
> the C6x.  The problem is that 2^15 evaluates to -32768 on the C55 and 32768
> on the C6x.
Right on!
> Applying our friend EXTEND32 causes the constant to evaluate correctly.  In
> fixed_generic.h,
> #define QCONST16(x,bits) 
>
((spx_word16_t)((x)*((EXTEND32(1))<<(bits))+((EXTEND32(1))<<((bits)-1))))
Actually, this is a case for a simple cast to (spx_word32_t) because
QCONST can be used in a static initialization and EXTEND32 *can* be
defined as a function (e.g. for fixed-point debug). 
> Later I will check if this change makes these two builds match in the
latest
> SVN code.
I fixed it in svn. Could you check that?
> The MIPs are not a problem for me, and the C55 does very well on 32x16 
> multiplies, so I have not played with PRECISION16 since last year.
Does the C55 have a 32x16 multiplier or do you mean it handles my
emulation of it well?

	Jean-Marc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Ceci est une partie de message
	=?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e?Url :
http://lists.xiph.org/pipermail/speex-dev/attachments/20060422/067dce30/attachment.pgp

Jim Crichton

2006-Apr-22 06:34 UTC

head link

[Speex-dev] Major internal changes, TI DSP build change

Jean Marc,
>> The C5x and C6x output diverges in build 10143, which has log message 
>> "lpc
>> floor converted to fixed-point."  Also, the measured SNR changed
from
>> 11.05
>> in builds 9854-10141 to 9.22 and 9.24 in 10143.
>
>Actually, build 10143 introduced another bug, that was the reason for
>the 1.1.11.1 release.
>
>> There is just four lines in modes.c which declare the constant, and one
>> line
>> changed in nb_celp.c and sb_celp.c which use the constant.  Looking at
>> nb_mode, QCONT16(.0002,15) evaluates to 0x3FF9 on the C55 and 0x4006 on
>> the
>> C6x.  When I patch the value 0x4006 into the C55 build, the output 
>> matches
>> the C6x.  The problem is that 2^15 evaluates to -32768 on the C55 and 
>> 32768
>> on the C6x.
>
>Right on!
>
>> Applying our friend EXTEND32 causes the constant to evaluate correctly.
>> In
>> fixed_generic.h,
>> #define QCONST16(x,bits)
>>
((spx_word16_t)((x)*((EXTEND32(1))<<(bits))+((EXTEND32(1))<<((bits)-1))))
>
>Actually, this is a case for a simple cast to (spx_word32_t) because
>QCONST can be used in a static initialization and EXTEND32 *can* be
>defined as a function (e.g. for fixed-point debug).
>
>> Later I will check if this change makes these two builds match in the 
>> latest
> SVN code.
>
>I fixed it in svn. Could you check that?
Now all platforms match again.  Note that the measured SNR for this test 
sample is lower than with the broken code (10.87 vs 11.10), but of course 
this is no way to judge the real quality.
>> The MIPs are not a problem for me, and the C55 does very well on 32x16
>> multiplies, so I have not played with PRECISION16 since last year.
>
>Does the C55 have a 32x16 multiplier or do you mean it handles my
>emulation of it well?
I has two ALUs with 17x17 bit MACs, and it has an instruction that does 
this:
ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem))))

I never quite understood this, so I went of and looked at the manuals.  It 
can multiply the low half in one cycle, then shift and add it to the high 
half in a second cycle.  And, in a type loop the parallel ALUs would allow 
one 32x16 multiply per cycle.

The C54x cannot do this, and uses library calls for 32x16 multiplies.  The 
changes that you have made since 1.1.8 are most dramatic for the 54x, which 
dropped from 184 (unusable in real time, the fastest parts are 160 MHz) to 
79 MIPs.  The C55x dropped from 41.5 to 29.4 MIPs (mixed 16/32 bit 
capability), and the C6x dropped slightly from 36 to 34.5 MIPs (32bit 
machine).

- Jim

Jean-Marc Valin

2006-Apr-22 06:56 UTC

head link

[Speex-dev] Major internal changes, TI DSP build change

> >I fixed it in svn. Could you check that?
> 
> Now all platforms match again.  Note that the measured SNR for this test 
> sample is lower than with the broken code (10.87 vs 11.10), but of course 
> this is no way to judge the real quality.
SNR, especially on a single sample, can be very misleading. Yet, could
you just check that the DSP results match what you get on a PC?
> >Does the C55 have a 32x16 multiplier or do you mean it handles my
> >emulation of it well?
> 
> I has two ALUs with 17x17 bit MACs, and it has an instruction that does 
> this:
> ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem))))
> 
> I never quite understood this, so I went of and looked at the manuals.  It 
> can multiply the low half in one cycle, then shift and add it to the high 
> half in a second cycle.  And, in a type loop the parallel ALUs would allow 
> one 32x16 multiply per cycle.
Just one thing I'd like to understand. Did you do some tricks and/or
assembly to implement the MULT16_32_Q* routines with these instructions
or does the compiler figure them out by itself?
> The C54x cannot do this, and uses library calls for 32x16 multiplies.  
Why is that? By default all the 32x16 multiplies are computed using only
16x16 multiplies (see fixed_generic.h).
> The 
> changes that you have made since 1.1.8 are most dramatic for the 54x, which
> dropped from 184 (unusable in real time, the fastest parts are 160 MHz) to 
> 79 MIPs.  The C55x dropped from 41.5 to 29.4 MIPs (mixed 16/32 bit 
> capability), and the C6x dropped slightly from 36 to 34.5 MIPs (32bit 
> machine).
Glad it makes such a difference. I'm just surprised that the C6x
complexity is that high.

	Jean-Marc

Reasonably Related Threads

Search for more seemingly similar threads

Speex dev - Apr 2006 - Major internal changes, TI DSP build change

[Speex-dev] Major internal changes, TI DSP build change

[Speex-dev] Major internal changes, TI DSP build change

[Speex-dev] Major internal changes, TI DSP build change

Reasonably Related Threads