Jean-Marc Valin
2006-Apr-21 20:56 UTC
[Speex-dev] Major internal changes, TI DSP build change
> The C5x and C6x output diverges in build 10143, which has log message "lpc > floor converted to fixed-point." Also, the measured SNR changed from 11.05 > in builds 9854-10141 to 9.22 and 9.24 in 10143.Actually, build 10143 introduced another bug, that was the reason for the 1.1.11.1 release.> There is just four lines in modes.c which declare the constant, and one line > changed in nb_celp.c and sb_celp.c which use the constant. Looking at > nb_mode, QCONT16(.0002,15) evaluates to 0x3FF9 on the C55 and 0x4006 on the > C6x. When I patch the value 0x4006 into the C55 build, the output matches > the C6x. The problem is that 2^15 evaluates to -32768 on the C55 and 32768 > on the C6x.Right on!> Applying our friend EXTEND32 causes the constant to evaluate correctly. In > fixed_generic.h, > #define QCONST16(x,bits) > ((spx_word16_t)((x)*((EXTEND32(1))<<(bits))+((EXTEND32(1))<<((bits)-1))))Actually, this is a case for a simple cast to (spx_word32_t) because QCONST can be used in a static initialization and EXTEND32 *can* be defined as a function (e.g. for fixed-point debug).> Later I will check if this change makes these two builds match in the latest > SVN code.I fixed it in svn. Could you check that?> The MIPs are not a problem for me, and the C55 does very well on 32x16 > multiplies, so I have not played with PRECISION16 since last year.Does the C55 have a 32x16 multiplier or do you mean it handles my emulation of it well? Jean-Marc -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Ceci est une partie de message =?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e?Url : http://lists.xiph.org/pipermail/speex-dev/attachments/20060422/067dce30/attachment.pgp
Jim Crichton
2006-Apr-22 06:34 UTC
[Speex-dev] Major internal changes, TI DSP build change
Jean Marc,>> The C5x and C6x output diverges in build 10143, which has log message >> "lpc >> floor converted to fixed-point." Also, the measured SNR changed from >> 11.05 >> in builds 9854-10141 to 9.22 and 9.24 in 10143. > >Actually, build 10143 introduced another bug, that was the reason for >the 1.1.11.1 release. > >> There is just four lines in modes.c which declare the constant, and one >> line >> changed in nb_celp.c and sb_celp.c which use the constant. Looking at >> nb_mode, QCONT16(.0002,15) evaluates to 0x3FF9 on the C55 and 0x4006 on >> the >> C6x. When I patch the value 0x4006 into the C55 build, the output >> matches >> the C6x. The problem is that 2^15 evaluates to -32768 on the C55 and >> 32768 >> on the C6x. > >Right on! > >> Applying our friend EXTEND32 causes the constant to evaluate correctly. >> In >> fixed_generic.h, >> #define QCONST16(x,bits) >> ((spx_word16_t)((x)*((EXTEND32(1))<<(bits))+((EXTEND32(1))<<((bits)-1)))) > >Actually, this is a case for a simple cast to (spx_word32_t) because >QCONST can be used in a static initialization and EXTEND32 *can* be >defined as a function (e.g. for fixed-point debug). > >> Later I will check if this change makes these two builds match in the >> latest > SVN code. > >I fixed it in svn. Could you check that?Now all platforms match again. Note that the measured SNR for this test sample is lower than with the broken code (10.87 vs 11.10), but of course this is no way to judge the real quality.>> The MIPs are not a problem for me, and the C55 does very well on 32x16 >> multiplies, so I have not played with PRECISION16 since last year. > >Does the C55 have a 32x16 multiplier or do you mean it handles my >emulation of it well?I has two ALUs with 17x17 bit MACs, and it has an instruction that does this: ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem)))) I never quite understood this, so I went of and looked at the manuals. It can multiply the low half in one cycle, then shift and add it to the high half in a second cycle. And, in a type loop the parallel ALUs would allow one 32x16 multiply per cycle. The C54x cannot do this, and uses library calls for 32x16 multiplies. The changes that you have made since 1.1.8 are most dramatic for the 54x, which dropped from 184 (unusable in real time, the fastest parts are 160 MHz) to 79 MIPs. The C55x dropped from 41.5 to 29.4 MIPs (mixed 16/32 bit capability), and the C6x dropped slightly from 36 to 34.5 MIPs (32bit machine). - Jim
Jean-Marc Valin
2006-Apr-22 06:56 UTC
[Speex-dev] Major internal changes, TI DSP build change
> >I fixed it in svn. Could you check that? > > Now all platforms match again. Note that the measured SNR for this test > sample is lower than with the broken code (10.87 vs 11.10), but of course > this is no way to judge the real quality.SNR, especially on a single sample, can be very misleading. Yet, could you just check that the DSP results match what you get on a PC?> >Does the C55 have a 32x16 multiplier or do you mean it handles my > >emulation of it well? > > I has two ALUs with 17x17 bit MACs, and it has an instruction that does > this: > ACy = M40(rnd((ACx >> #16) + (uns(Xmem) * uns(Ymem)))) > > I never quite understood this, so I went of and looked at the manuals. It > can multiply the low half in one cycle, then shift and add it to the high > half in a second cycle. And, in a type loop the parallel ALUs would allow > one 32x16 multiply per cycle.Just one thing I'd like to understand. Did you do some tricks and/or assembly to implement the MULT16_32_Q* routines with these instructions or does the compiler figure them out by itself?> The C54x cannot do this, and uses library calls for 32x16 multiplies.Why is that? By default all the 32x16 multiplies are computed using only 16x16 multiplies (see fixed_generic.h).> The > changes that you have made since 1.1.8 are most dramatic for the 54x, which > dropped from 184 (unusable in real time, the fastest parts are 160 MHz) to > 79 MIPs. The C55x dropped from 41.5 to 29.4 MIPs (mixed 16/32 bit > capability), and the C6x dropped slightly from 36 to 34.5 MIPs (32bit > machine).Glad it makes such a difference. I'm just surprised that the C6x complexity is that high. Jean-Marc