Jean-Marc,>> > It's odd that it "almost" works with the fixed_generic.h. The easiest >> > thing would be to gradually replace routines and see which one causes >> > problem. It's most likely (though I'm not 100% sure) that somewhere in >> > the code, I have a 16-bit value that gets sent to a function/macro that >> > expects a 32-bit value. In that case, Stuart's fixed_c55x.h file would >> > work because the inline function would first force the conversion to 32 >> > bits. My initial guess would be with the SH[RL]32 and ADD32 functions.Nice call. The culprit is SHRL32. This is not used in many places, and in most of those, the operand comes from EXTEND32. The only really suspicious instances is in lsp.c (lsp_interpolate): spx_word16_t tmp = DIV32_16(SHL32(1 + subframe,14),nb_subframes); (subframe is an int)> I just checked in changes that may fix some bad behavior on 16-bit > architectures. You might want to try if it fixes the problem. As I said, > it's not clear yet whether the problem is in fixed_generic.h or if it's > somewhere else and Stuart's fixed_c55.h just happens to work around it.I see that your changes include adding an EXTEND32 to the line above. I made ONLY this change to the 1.1.8 base that I am working from, and the problem is gone. I will go back to using fixed_generic.h for now, but it may still be worthwhile to make a custom version that takes advantage of the compiler intrinsics, which include 32-bit shifts, 16x16=32 and 32x16=32 MPY, and 32+16x16=32 MAC (with and without rounding). The multiply arithmetic all returns saturated results. It seems to me that can't hurt. Do you see any problem with using saturated MPY and MACs? By the way, the lsp.c change also fixes the TI C54x build, as expected. - Jim Crichton
Jean-Marc Valin
2005-May-26 08:03 UTC
[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
> Nice call. The culprit is SHRL32. This is not used in many places, and in > most of those, the operand comes from EXTEND32. The only really suspicious > instances is in lsp.c (lsp_interpolate): > > spx_word16_t tmp = DIV32_16(SHL32(1 + subframe,14),nb_subframes); > (subframe is an int) > ... > I see that your changes include adding an EXTEND32 to the line above. I > made ONLY this change to the 1.1.8 base that I am working from, and the > problem is gone.Good. So I guess you are still having your low MIPS count, right? How much is that?> I will go back to using fixed_generic.h for now, but it may still be > worthwhile to make a custom version that takes advantage of the compiler > intrinsics, which include 32-bit shifts, 16x16=32 and 32x16=32 MPY, and > 32+16x16=32 MAC (with and without rounding). The multiply arithmetic all > returns saturated results. It seems to me that can't hurt. Do you see any > problem with using saturated MPY and MACs?Saturation never hurts. That's actually what I would have used if all platforms supported it (which is at least not the case when ARMv4). If you send me a file with only the macros you want to override from fixed_generic.h, I can include it in the tree.> By the way, the lsp.c change also fixes the TI C54x build, as expected.What do you mean? Before that, it worked on C55, but not C54? Jean-Marc -- Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca> Universit? de Sherbrooke
>> Nice call. The culprit is SHRL32. This is not used in many places, and >> in >> most of those, the operand comes from EXTEND32. The only really >> suspicious >> instances is in lsp.c (lsp_interpolate): >> >> spx_word16_t tmp = DIV32_16(SHL32(1 + subframe,14),nb_subframes); >> (subframe is an int) >> ... >> I see that your changes include adding an EXTEND32 to the line above. I >> made ONLY this change to the 1.1.8 base that I am working from, and the >> problem is gone. > > Good. So I guess you are still having your low MIPS count, right? How > much is that?About 42 MIPs peak for encode/decode, complexity 1, 8kbps, no preprocessor or jitter buffer or echo canceller (yet).>> I will go back to using fixed_generic.h for now, but it may still be >> worthwhile to make a custom version that takes advantage of the compiler >> intrinsics, which include 32-bit shifts, 16x16=32 and 32x16=32 MPY, and >> 32+16x16=32 MAC (with and without rounding). The multiply arithmetic all >> returns saturated results. It seems to me that can't hurt. Do you see >> any >> problem with using saturated MPY and MACs? > > Saturation never hurts. That's actually what I would have used if all > platforms supported it (which is at least not the case when ARMv4). If > you send me a file with only the macros you want to override from > fixed_generic.h, I can include it in the tree.Good, but I will not get around to this for a while. My next step is getting a multichannel version working on C64x.>> By the way, the lsp.c change also fixes the TI C54x build, as expected. > > What do you mean? Before that, it worked on C55, but not C54?No, they have always produced the same results. The C54x seems to take about 5x as many cycles to do the task. I can see that 32x32 multiplies are being done by library calls rather than inline, so I will play with this a bit more. Others have used C54x successfully, so significant improvement is possible. - Jim Crichton