Stuart, Jean-Marc,>> 1) We need our own "fixed_xx.h" header file. I don't know why, and >> haven't >> had time to investigate, but there is a definite improvement when I use >> the >> attached fixed_c55x.h file which has turned all the maths into inline >> functions. > > Did you try with fixed_generic.h or just with fixed_debug.h? > fixed_debug.h uses int and short directly, so I know it won't work with > the C5x. However, I think fixed_generic.h should work and has all the > operators defined as macros anyway, so inlining isn't a problem.I incorporated Stuarts fixed_c55x.h file, and that eliminated the artifacts, at the expense of a ~30% increase in MIPs. Now the male.wav file through encoder/decoder produces a bit-exact match with the C64x test that I did earlier. I will do some more testing to isolate the, but it may be a few days before I get to this task. As Jean-Marc says, fixed_generic should work, unless the compiler becomes hopelessly confused by something. Maybe this is a compiler bug. I noticed that in Jamey Hicks original 1.1.6 patch, he had test code with inline function (and some counters for measuring macro use), but I got the same results in this build with the inline functions or the macros. So something is different in 1.1.8.>> Some optimisation or something is probably possible here to >> reduce code size and inline the functions, as by default the C55x >> compiler >> does not seem to inline them (perhaps due to debugging mode). This can be >> enabled with a C55X_ASM definition following the ARM fixed point math >> definition convention, and some it could be converted to assembler in the >> future. > > The assembly definitions for the operators are only useful if you have > gcc-like inline assembly. Otherwise, the explicit register loads will > make it worse.I believe that Code Composer provides very good support for inline assembly, although I have not used it myself, and hope not to.>> 2) Proper definitions for the speex types are required in the >> speex_types.h >> file - I did this and you can enable it via a __C55X__ definition. File >> attached. My definition follows the convention of other defines in this >> file. It could be covered by the C55X_ASM define above, but the content >> of >> this file is not going to have assembler in it. I leave it up to you if >> think this is wrong or right - just tell me and I'll follow! > > I didn't include the C5x in speex_types.h because I was under the > impression that autoconf would be used, but I can add it.Unless there is some reason to do otherwise, it seems like all C55x specific options should be turned on by one flag. This can be decided after the current issues have been investigated further.>> 3) There seem to be further int/long on a C55X issues in nb_celp.c for >> the >> nb_encoder_ctl and nb_decoder_ctl functions. I think that all the >> (*(int*)ptr) or *((int*)ptr) should be (*(long*)ptr) for a C55X. I don't >> know why. What I presume was happening was that the data passed to the >> function in void *ptr was being lost in the upper or lower half of a >> 32-bit >> word. So it didn't matter what you passed as a setting for >> SPEEX_SET_QUALITY >> and SPEEX_SET_COMPLEXITY, the (*(int*)ptr) always = 0. For quality this >> doesn't matter I don't think due to the default settings, but the >> st->complexity always ended up as zero, and the decoded bit-stream ends >> up >> sounding very ropy. > > Actually, I don't see why long/int would be a problem here since you're > also passing an (int*) and the values are never higher than 32767.This suggests that the problem is in the calling routine.>> I think this the issue Jim alluded to in his question 2 >> regarding 'artifacts'. I don't think it was artifacts but perhaps this >> issue >> I have described. In nb_celp.c I changed all those effected (*(int*)ptr) >> to >> long, but perhaps it would be better to have a more platform independant >> resolution? Does ptr need to be a void *? Can't we use an int * or, >> better >> still, one of the nice Speex types so we know what we are expecting to >> arrive at the function? > > It has to be a (void*) because the type is not always int. Some calls > use floats, some even use a struct.As indicated above, the inline functions cured the step and impulse artifacts. I will test the control calls later.>> I can make the changes and submit, I just want to >> know how best to do a platform independant change. (I attach my nb_celp.c >> for Jim and just so you can see what I did). Jim: Note that this will >> push >> your MIPS up again as now the encode is actually doing a decent job on >> the >> decoding - I think this was the reason for the huge performance increase. > > I don't see what you're talking about.I think that Stuart is refering to the big jump in MIPs I saw between the C54x and C55x DSPs. I have not sorted that out yet. I would like to take a look at why Stuart's inline functions change the results, since there is a significant performance penalty in using them.>> 4) Jamie's CONFIG_TI_C55X definition doesn't seem to work for me. >> Whenever I >> compile using this option, I get a "Buffer too small to apck bits" and >> then >> "Could not resize input buffer: not packing" messages. I haven't had time >> to >> figure out what that means or what causes it, but will if I get time. > > I just fixed a but in bits.c (see Jim's previous email), maybe that was > the cause.Yes, that's right. speex_bits_insert_terminator keeps calling speex_pack_bits forever, because bitPtr never reaches 15. Eventually the bit buffer runs out of room.>> 5) Re: stack memory allocations, yes, a compile-time option would be >> great >> to reduce the stack sizes. Ideally what is needed is the minimum stack >> usage >> so that us embedded developers can support streaming audio, for either >> compression or decompression. For nb mode, for example, this would mean >> receiving a 160 byte raw data frame, compressing and storing or sending, >> then dealing with the next frame. This is my opinion anyway, and I think >> this usage model is a good target model to keep in mind during >> development. >> For an embedded system it is likely that developers would do only >> compression, or only decompression in their application, so splitting the >> encoder and decoder into seperate files would also be a good idea in the >> future. > > I might add an option to disable the decoder or the encoder, but a lot > of the common code will still be there. BTW, have you tried removing the > preprocessor, the echo canceller and the jitter buffer if you want to > save space? > >> Whether you'll want all that stuff put >> back into the main Speex release I don't know, but if you do that is fine >> with me too. > > Unless it's really ugly, I don't see why it couldn't go in the main > releases. > > Jean-Marc > > -- > Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca> > Universit? de Sherbrooke > > _______________________________________________ > Speex-dev mailing list > Speex-dev@xiph.org > http://lists.xiph.org/mailman/listinfo/speex-dev >
Jean-Marc Valin
2005-May-25 12:46 UTC
[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
> I incorporated Stuarts fixed_c55x.h file, and that eliminated the artifacts, > at the expense of a ~30% increase in MIPs. Now the male.wav file through > encoder/decoder produces a bit-exact match with the C64x test that I did > earlier. I will do some more testing to isolate the, but it may be a few > days before I get to this task. As Jean-Marc says, fixed_generic should > work, unless the compiler becomes hopelessly confused by something. Maybe > this is a compiler bug.It's odd that it "almost" works with the fixed_generic.h. The easiest thing would be to gradually replace routines and see which one causes problem. It's most likely (though I'm not 100% sure) that somewhere in the code, I have a 16-bit value that gets sent to a function/macro that expects a 32-bit value. In that case, Stuart's fixed_c55x.h file would work because the inline function would first force the conversion to 32 bits. My initial guess would be with the SH[RL]32 and ADD32 functions.> I noticed that in Jamey Hicks original 1.1.6 patch, he had test code with > inline function (and some counters for measuring macro use), but I got the > same results in this build with the inline functions or the macros. So > something is different in 1.1.8.What do you mean here?> I believe that Code Composer provides very good support for inline assembly, > although I have not used it myself, and hope not to.Well, you can always try to do assembly versions of the macros, but I'm sure you'll have better results by re-writing some of the time-critical routines directly (see what's been done for ARM in the _arm4.h files).> Unless there is some reason to do otherwise, it seems like all C55x specific > options should be turned on by one flag. This can be decided after the > current issues have been investigated further.That's already the case, except that this part relies on the configure script. I realize that not everyone is using configure, so perhaps there should be another way.> I think that Stuart is refering to the big jump in MIPs I saw between the > C54x and C55x DSPs. I have not sorted that out yet. I would like to take a > look at why Stuart's inline functions change the results, since there is a > significant performance penalty in using them.I'm sure you'll still get a low MIPS count after we find the fix to fixed_generic.h.> Yes, that's right. speex_bits_insert_terminator keeps calling > speex_pack_bits forever, because bitPtr never reaches 15. Eventually the > bit buffer runs out of room.Good, that explains other problems I've seen. Jean-Marc -- Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca> Universite de Sherbrooke
Jean-Marc,>> I incorporated Stuarts fixed_c55x.h file, and that eliminated the >> artifacts, >> at the expense of a ~30% increase in MIPs. Now the male.wav file through >> encoder/decoder produces a bit-exact match with the C64x test that I did >> earlier. I will do some more testing to isolate the, but it may be a few >> days before I get to this task. As Jean-Marc says, fixed_generic should >> work, unless the compiler becomes hopelessly confused by something. >> Maybe >> this is a compiler bug. > > It's odd that it "almost" works with the fixed_generic.h. The easiest > thing would be to gradually replace routines and see which one causes > problem. It's most likely (though I'm not 100% sure) that somewhere in > the code, I have a 16-bit value that gets sent to a function/macro that > expects a 32-bit value. In that case, Stuart's fixed_c55x.h file would > work because the inline function would first force the conversion to 32 > bits. My initial guess would be with the SH[RL]32 and ADD32 functions.That is my plan.>> I noticed that in Jamey Hicks original 1.1.6 patch, he had test code with >> inline function (and some counters for measuring macro use), but I got >> the >> same results in this build with the inline functions or the macros. So >> something is different in 1.1.8. > > What do you mean here?The macros are the same, but the results are different. So it is probably a change in one of the macro calls.>> I believe that Code Composer provides very good support for inline >> assembly, >> although I have not used it myself, and hope not to. > > Well, you can always try to do assembly versions of the macros, but I'm > sure you'll have better results by re-writing some of the time-critical > routines directly (see what's been done for ARM in the _arm4.h files).It already runs fast enough on 55x for my immediate needs, if the simulator is telling the truth. But you make a good point.>> Unless there is some reason to do otherwise, it seems like all C55x >> specific >> options should be turned on by one flag. This can be decided after the >> current issues have been investigated further. > > That's already the case, except that this part relies on the configure > script. I realize that not everyone is using configure, so perhaps there > should be another way. > >> I think that Stuart is refering to the big jump in MIPs I saw between the >> C54x and C55x DSPs. I have not sorted that out yet. I would like to >> take a >> look at why Stuart's inline functions change the results, since there is >> a >> significant performance penalty in using them. > > I'm sure you'll still get a low MIPS count after we find the fix to > fixed_generic.h. > >> Yes, that's right. speex_bits_insert_terminator keeps calling >> speex_pack_bits forever, because bitPtr never reaches 15. Eventually the >> bit buffer runs out of room. > > Good, that explains other problems I've seen.- Jim Crichton