Jean-Marc Valin
2005-May-24  22:38 UTC
[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
Hi Jim,

Thanks a lot for helping track problems with Speex.

> There is a bit of work remaining to get the memory usage down for a
> multichannel application. There have been some good posts over the
> last couple of months about reducing memory usage.

I think 1.1.8 incorporates all memory reductions proposed. Let me know
otherwise.

> Also, to nominally comply with the TI XDAIS algorithm standard, it is
> necessary to extract all of the memory allocation from the code,
> organize it into blocks, and provide a table to the application host
> with the size and scratch/persistent nature of each block. The host
> then does the memory allocating, and provides the pointers back to the
> application.

I'm not familiar with XDAIS, but I would think you could just overload
the speex_alloc() and speex_free() functions, right?

> The C5x has been another matter altogether. There were some build
> issues:
>
> 1. The C54x compiler does not recognize "long long" and in fact does
> not support any 64-bit integer types. For the moment I am using
> "double" which maps to a 32-bit floating point on the C54x and C55x.
> The C55x compiler does support "long long".
>
> Question 1: Is there anything wrong with using a 32-bit float for
> spx_word64_t (other than MIPs)? This type is used only in two places
> in ltp.c.

No problem replacing with a float. The reason for the 64 bits is not the
precision but only the range. A 40-bit accumulator would work too.
Eventually, this could probably be made to fit in a 32-bit int, but I
haven't done that yet.

> 2. I am not using the configure tools, so I needed to create
> speex_config_types.h (short and int are 16, long is 32 on C5x).

That, or you could add an entry in the speex_types.h file (which is used
for all platforms that don't use autoconf).

> 3. And, of course, the internal stack memory allocations in
> nb_encoder_init and nb_decoder_init had to be cut down to fit within
> the available data memory space. It would be useful to parameterize
> the working stack allocation size for those folks who cannot use the
> new VAR_ARRAYS and USE_ALLOCA stuff.

Would a compile-time option be OK (so I don't need to change the API)?
If so, I'll put that on the TODO list.

> After these changes the code built, but the encoder never returned.
> In bits.c, one of the changes from Jamey Hicks' Speex 1.1.6 patch was
> lost in his 1.1.7 patch, and thus is missing in the 1.1.8 release.
> This causes an infinite loop when the code tries to pad the frame to
> the next char boundary. In the while loop in speex_bits_pack (line
> 246):...

Oops! It's now fixed in SVN.

> With this change, the codec ran, but the encoded data is garbage.
> Eventually I realized that because the char size on the C5x is 16
> bits, the fread and fwrite routines are using only the least
> significant 8 bits of each word. A little packing and unpacking
> later, the encoder/decoder loop was producing intelligible sound.
> However, there are some anomalies. Using the sample file
> male.wav, the output has a positive step at 0.1 sec (rapid ramp from 0
> to ~20000 sample value, with decay back to zero by time 0.112 sec),
> another positive step at 2.940 sec (amplitude about 3000, decaying in
> 12 ms again), and a rail-to-rail impulse at 4.600 sec (also decaying
> within a few msec). This is a simulator, so there are no "real world"
> effects at play. The C6x simulation does not show the artifacts. The
> encoded bits are the same for the first frame, but then they diverge.

That's odd, definitely worth investigating.

> After some fumbling, I was able to extract the changed files from
> Jamey Hicks' original 1.1.6 patch, and this did not show the
> artifacts. However, the MIPs are too high for 1.1.6 (~285) and 1.1.8
> (~225), far exceeding the 160 MHz instruction rate of the C54x processor,
> so I may have to abandon this part.

I remember sending to the list a corrected (so it still works on
"normal" machines) modified version of Jamey's patch a couple of months
ago. Can you check if it has the artifact? Note that the bits.c bug is
probably in that patch too. IIRC, it applied to SVN just before 1.1.7.

> After some more fumbling, I got Speex running under the C55x
> simulator, and this produces the same (bit-exact) results as C54x for
> 1.1.8 and 1.1.6patch, although the MIPs are much better (56 for 1.1.6,
> 42 for 1.1.8).

I didn't understand. On the C55x, are you always getting good results or
bad results?

> Question 2: Does anyone have any suggestions about where to start
> tracking down the artifacts? Speex does not use circular buffers, so
> I do not suspect a pointer wrap. I am fairly certain that the ALLOC
> stack is not running out. That would seem to leave arithmetic
> overflow, or some kind of unsigned-signed conversion problem. There
> have been many changes between 1.1.6 and 1.1.8, and nothing is leaping
> out at me from the Diff so far. The MIPs difference is significant,
> so it looks like some processing may have been simplified too much.

Actually, there's been some major optimization between 1.1.6 and 1.1.8.
Most of that was just before 1.1.7. As for a possible cause, it's hard
to tell. Here are a couple of things I would suggest:

- Try defining FIXED_DEBUG and see if you have overflows (you'll
  probably have to tweak fixed_debug.h first because it assumes int=32
  bits).
- Try adding a "#define int long" or something like that in the C files.
- Since you're saying that the decoder is having the problem too (when
  the file is correctly encoded), I suggest you focus on the decoder
  only, since it's much simpler.
- One way to track differences would be to look at the exc signal at
  different places (e.g. after the ltp_unquant, before and after the
  comb_filter).

I'm pretty sure it has to do with the fact that an int is 16 bits. Let
me know if you have any questions or if you find something.

Jean-Marc

--
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke
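
To make the 16-bit int suspicion concrete, here is a minimal sketch of how a 16x16 multiply behaves when the product is kept in 16 bits versus widened to 32 bits first. The function names mult16_narrow and mult16_wide are hypothetical and are not Speex macros. The narrow version only mimics typical wraparound on a 16-bit-int target; in standard C, signed overflow is undefined, so the real compiler may behave differently.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two 16-bit values after widening to
 * 32 bits, which is what fixed-point code needs on a 16-bit-int target. */
static int32_t mult16_wide(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Hypothetical helper: mimic what is commonly observed when the same
 * product is evaluated in a 16-bit int, i.e. only the low 16 bits of the
 * result survive (two's-complement wraparound is an assumption here). */
static int16_t mult16_narrow(int16_t a, int16_t b)
{
    /* Full product, reinterpreted as unsigned so masking is well defined */
    uint32_t full = (uint32_t)mult16_wide(a, b);
    uint16_t low = (uint16_t)(full & 0xFFFFu);

    /* Map the low 16 bits back to a signed value without relying on
     * implementation-defined narrowing conversions */
    if (low >= 0x8000u)
        return (int16_t)((int32_t)low - 65536);
    return (int16_t)low;
}
```

With these helpers, 200 * 200 gives 40000 when widened but -25536 when truncated to 16 bits, which is the kind of sign flip that would show up as a step or impulse in the decoded signal.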
>> There is a bit of work remaining to get the memory usage down for a
>> multichannel application. There have been some good posts over the
>> last couple of months about reducing memory usage.
>
> I think 1.1.8 incorporates all memory reductions proposed. Let me know
> otherwise.

For the persistent storage, the only change that I have made is to
MAX_CHARS_PER_FRAME, which is set to 2000 in bits.c. I changed bits.c
to set this value only if it was not already defined, and then put my
own, much smaller value in config.h. For the scratch stack, I replaced
the fixed values in nb_encoder_init and nb_decoder_init with constants
that I defined in config.h.

Jamey Hicks' original C5x patch had some test code in stack_alloc.h to
detect working stack overflow. Maybe something similar could be done to
measure the peak stack usage, enabled by a debug switch. Then, for a
space-critical application, it would be easy to measure the stack
requirement for a given operating mode, and set the size (manually)
accordingly.

>> Also, to nominally comply with the TI XDAIS algorithm standard, it is
>> necessary to extract all of the memory allocation from the code,
>> organize it into blocks, and provide a table to the application host
>> with the size and scratch/persistent nature of each block. The host
>> then does the memory allocating, and provides the pointers back to the
>> application.
>
> I'm not familiar with XDAIS, but I would think you could just overload
> the speex_alloc() and speex_free() functions, right?

According to this standard, an allocate call is made to an algorithm,
and the algorithm fills in a table of required blocks (size, alignment,
and scratch/persistent type). The system allocates these blocks, and
calls the algorithm init function, with the same memory table, now
including the base addresses.

Now, these addresses have to get into Speex somehow. Since I did not
want to change the API, I have resorted to the kludge of declaring
global variables, which I initialize based on the allocated memory
blocks. My alloc routines then look at the global variables, similar to
the way calloc works.

This does not solve the problem of distinguishing persistent and scratch
storage. To do this, I added a speex_alloc_scratch routine, which uses
a different memory block than speex_alloc. This does force a change to
nb_encoder_init, etc. At the moment, the code looks like this:

    #if defined(VAR_ARRAYS) || defined (USE_ALLOCA)
       st = (EncState*)speex_alloc(sizeof(EncState));
       if (!st)
          return NULL;
       st->stack = NULL;
    #elif defined(SCRATCH_ALLOC)
       st = (EncState*)speex_alloc(sizeof(EncState));
       if (!st)
          return NULL;
       st->stack = (char*)speex_alloc_scratch(SPEEXENC_SCRATCH_STACK_SIZE);
    #else
       st = (EncState*)speex_alloc(sizeof(EncState)+8000*sizeof(spx_sig_t));
       if (!st)
          return NULL;
       st->stack = ((char*)st) + sizeof(EncState);
    #endif

Note that I also moved the "if (!st)" check to before st->stack is set,
since a write to a bad location would occur otherwise.

>> Question 1: Is there anything wrong with using a 32-bit float for
>> spx_word64_t (other than MIPs)? This type is used only in two places
>> in ltp.c.
>
> No problem replacing with a float. The reason for the 64 bits is not the
> precision but only the range. A 40-bit accumulator would work too.
> Eventually, this could probably be made to fit in a 32-bit int, but I
> haven't done that yet.

The C55x uses a 40-bit long long (as Stuart Cording pointed out), so
this should be fine here.

>> 3. And, of course, the internal stack memory allocations in
>> nb_encoder_init and nb_decoder_init had to be cut down to fit within
>> the available data memory space. It would be useful to parameterize
>> the working stack allocation size for those folks who cannot use the
>> new VAR_ARRAYS and USE_ALLOCA stuff.
>
> Would a compile-time option be OK (so I don't need to change the API)?
> If so, I'll put that on the TODO list.

I am using a compile option, as shown above.

>> With this change, the codec ran, but the encoded data is garbage.
>> Eventually I realized that because the char size on the C5x is 16
>> bits, the fread and fwrite routines are using only the least
>> significant 8 bits of each word. A little packing and unpacking
>> later, the encoder/decoder loop was producing intelligible sound.
>> However, there are some anomalies. Using the sample file
>> male.wav, the output has a positive step at 0.1 sec (rapid ramp from 0
>> to ~20000 sample value, with decay back to zero by time 0.112 sec),
>> another positive step at 2.940 sec (amplitude about 3000, decaying in
>> 12 ms again), and a rail-to-rail impulse at 4.600 sec (also decaying
>> within a few msec). This is a simulator, so there are no "real world"
>> effects at play. The C6x simulation does not show the artifacts. The
>> encoded bits are the same for the first frame, but then they diverge.
>
> That's odd, definitely worth investigating.

Stuart Cording's change to replace the math macros with inline functions
cures the problem. I will continue to look at this.

- Jim Crichton
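
The global-variable scheme described above, with separate persistent and scratch blocks handed over by the host, can be sketched roughly as follows. This is only an illustration under stated assumptions: mem_block, block_init, block_alloc, persist_block and scratch_block are hypothetical names, the speex_alloc(int) signature is assumed from the 1.1.x sources, and the host-supplied base addresses are assumed to be suitably aligned already. It is not the code used in the port.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one host-supplied block
 * (not a Speex or XDAIS type). */
typedef struct {
    char  *base;  /* start address provided by the host */
    size_t size;  /* total size of the block, in chars */
    size_t used;  /* amount already handed out */
} mem_block;

/* Globals filled in by the init code from the host's memory table. */
static mem_block persist_block;  /* state kept between frames */
static mem_block scratch_block;  /* working stack, reusable by others */

static void block_init(mem_block *b, void *base, size_t size)
{
    b->base = (char *)base;
    b->size = size;
    b->used = 0;
}

/* Bump allocation: carve the next chunk out of the block, rounded up
 * to sizeof(long) (assumed to be a power of two), and zero it so the
 * result behaves like calloc. Returns NULL when the block is too small. */
static void *block_alloc(mem_block *b, size_t size)
{
    const size_t align = sizeof(long);
    size_t start = (b->used + align - 1) & ~(align - 1);

    if (start > b->size || size > b->size - start)
        return NULL;
    b->used = start + size;
    memset(b->base + start, 0, size);
    return b->base + start;
}

/* Replacement for the library allocator (signature is an assumption). */
void *speex_alloc(int size)
{
    if (size < 0)
        return NULL;
    return block_alloc(&persist_block, (size_t)size);
}

/* Scratch counterpart, as in the SCRATCH_ALLOC branch shown above. */
void *speex_alloc_scratch(int size)
{
    if (size < 0)
        return NULL;
    return block_alloc(&scratch_block, (size_t)size);
}
```

A matching speex_free would be a no-op in this scheme, since the host owns the blocks and reclaims them as a whole.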