Jean-Marc,>> I incorporated Stuarts fixed_c55x.h file, and that eliminated the >> artifacts, >> at the expense of a ~30% increase in MIPs. Now the male.wav file through >> encoder/decoder produces a bit-exact match with the C64x test that I did >> earlier. I will do some more testing to isolate the, but it may be a few >> days before I get to this task. As Jean-Marc says, fixed_generic should >> work, unless the compiler becomes hopelessly confused by something. >> Maybe >> this is a compiler bug. > > It's odd that it "almost" works with the fixed_generic.h. The easiest > thing would be to gradually replace routines and see which one causes > problem. It's most likely (though I'm not 100% sure) that somewhere in > the code, I have a 16-bit value that gets sent to a function/macro that > expects a 32-bit value. In that case, Stuart's fixed_c55x.h file would > work because the inline function would first force the conversion to 32 > bits. My initial guess would be with the SH[RL]32 and ADD32 functions.That is my plan.>> I noticed that in Jamey Hicks original 1.1.6 patch, he had test code with >> inline function (and some counters for measuring macro use), but I got >> the >> same results in this build with the inline functions or the macros. So >> something is different in 1.1.8. > > What do you mean here?The macros are the same, but the results are different. So it is probably a change in one of the macro calls.>> I believe that Code Composer provides very good support for inline >> assembly, >> although I have not used it myself, and hope not to. > > Well, you can always try to do assembly versions of the macros, but I'm > sure you'll have better results by re-writing some of the time-critical > routines directly (see what's been done for ARM in the _arm4.h files).It already runs fast enough on 55x for my immediate needs, if the simulator is telling the truth. But you make a good point.>> Unless there is some reason to do otherwise, it seems like all C55x >> specific >> options should be turned on by one flag. This can be decided after the >> current issues have been investigated further. > > That's already the case, except that this part relies on the configure > script. I realize that not everyone is using configure, so perhaps there > should be another way. > >> I think that Stuart is refering to the big jump in MIPs I saw between the >> C54x and C55x DSPs. I have not sorted that out yet. I would like to >> take a >> look at why Stuart's inline functions change the results, since there is >> a >> significant performance penalty in using them. > > I'm sure you'll still get a low MIPS count after we find the fix to > fixed_generic.h. > >> Yes, that's right. speex_bits_insert_terminator keeps calling >> speex_pack_bits forever, because bitPtr never reaches 15. Eventually the >> bit buffer runs out of room. > > Good, that explains other problems I've seen.- Jim Crichton
Jean-Marc Valin
2005-May-25 14:29 UTC
[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
> > It's odd that it "almost" works with the fixed_generic.h. The easiest > > thing would be to gradually replace routines and see which one causes > > problem. It's most likely (though I'm not 100% sure) that somewhere in > > the code, I have a 16-bit value that gets sent to a function/macro that > > expects a 32-bit value. In that case, Stuart's fixed_c55x.h file would > > work because the inline function would first force the conversion to 32 > > bits. My initial guess would be with the SH[RL]32 and ADD32 functions. > > That is my plan.I just checked in changes that may fix some bad behavior on 16-bit architectures. You might want to try if it fixes the problem. As I said, it's not clear yet whether the problem is in fixed_generic.h or if it's somewhere else and Stuart's fixed_c55.h just happens to work around it. Jean-Marc -- Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca> Universite de Sherbrooke
Jean-Marc,>> > It's odd that it "almost" works with the fixed_generic.h. The easiest >> > thing would be to gradually replace routines and see which one causes >> > problem. It's most likely (though I'm not 100% sure) that somewhere in >> > the code, I have a 16-bit value that gets sent to a function/macro that >> > expects a 32-bit value. In that case, Stuart's fixed_c55x.h file would >> > work because the inline function would first force the conversion to 32 >> > bits. My initial guess would be with the SH[RL]32 and ADD32 functions.Nice call. The culprit is SHRL32. This is not used in many places, and in most of those, the operand comes from EXTEND32. The only really suspicious instances is in lsp.c (lsp_interpolate): spx_word16_t tmp = DIV32_16(SHL32(1 + subframe,14),nb_subframes); (subframe is an int)> I just checked in changes that may fix some bad behavior on 16-bit > architectures. You might want to try if it fixes the problem. As I said, > it's not clear yet whether the problem is in fixed_generic.h or if it's > somewhere else and Stuart's fixed_c55.h just happens to work around it.I see that your changes include adding an EXTEND32 to the line above. I made ONLY this change to the 1.1.8 base that I am working from, and the problem is gone. I will go back to using fixed_generic.h for now, but it may still be worthwhile to make a custom version that takes advantage of the compiler intrinsics, which include 32-bit shifts, 16x16=32 and 32x16=32 MPY, and 32+16x16=32 MAC (with and without rounding). The multiply arithmetic all returns saturated results. It seems to me that can't hurt. Do you see any problem with using saturated MPY and MACs? By the way, the lsp.c change also fixes the TI C54x build, as expected. - Jim Crichton